Open doujiang-zheng opened 3 years ago
Hi, I really have no idea, because I did not develop this Java tool. It took me several days to process the Amazon and Yelp datasets used in our CIKM'20 paper. Because processing was so slow, I even removed users and items with fewer than 20 records. Maybe you could read the original documentation to see whether there is a solution. Please do let me know if you fix it. Thanks!
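For reference, the kind of filtering mentioned above (dropping users and items with fewer than 20 records) could be sketched roughly as follows. This is only an illustration, not the script used for the paper: the function names `read_reviews` and `filter_min_records` are hypothetical, and it assumes the input is a gzipped JSON-lines file whose records carry the `reviewerID` and `asin` fields found in the Amazon review dumps. It does a single pass and holds all reviews in memory.

```python
import gzip
import json
from collections import Counter


def read_reviews(path):
    """Yield one review dict per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def filter_min_records(reviews, min_records=20,
                       user_key="reviewerID", item_key="asin"):
    """Keep reviews whose user AND item each appear at least min_records times.

    Counts are taken once, on the unfiltered data (a single pass), so a few
    surviving users or items may end up below the threshold afterwards.
    """
    reviews = list(reviews)  # materialize so we can count, then filter
    user_counts = Counter(r[user_key] for r in reviews)
    item_counts = Counter(r[item_key] for r in reviews)
    return [
        r for r in reviews
        if user_counts[r[user_key]] >= min_records
        and item_counts[r[item_key]] >= min_records
    ]
```

A call such as `filter_min_records(read_reviews("reviews_Electronics_5.json.gz"))` would then produce the reduced review list to pass on to the tool. If a strict threshold is needed, the filter can be reapplied until the result stops shrinking.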
I ran experiments on a 24-core Ubuntu server with 128 GB of memory. The default reviews_Musical_Instruments_5.json.gz contains 10K lines of reviews and takes around 17 minutes on my server. I then tried reviews_Electronics_5.json.gz, another category of the Amazon dataset with 1 million lines of reviews. That run is stuck at the POS (part-of-speech) stage and has already been running for 19 hours. I found that the process spawns many subprocesses but only uses a single core. Could you please help me? Thanks for reading.