somestudies / DCCL


yelp dataset #7

Open jhliu0807 opened 1 year ago

jhliu0807 commented 1 year ago

Hello, on the public Yelp dataset, following the preprocessing described in the paper, I ended up with 93537 users with more than 10 interactions, 53347 items with more than 10 interactions, and 2533759 interactions between them. I deduplicated the interaction records before filtering. These numbers differ greatly from the statistics reported in the paper. What might cause this discrepancy? Could you release the preprocessed Yelp data, or the preprocessing code, so that the results in the paper are easier to reproduce?
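For reference, here is a minimal sketch of the filtering as I understand it, not the authors' actual code. The function name `filter_interactions`, the `min_count` threshold, and the `iterative` flag are my own assumptions. One ambiguity it exposes: filtering once versus repeating the filter until nothing changes can give different user, item, and interaction counts, and the description alone does not say which variant applies.

```python
from collections import Counter

def filter_interactions(pairs, min_count=10, iterative=True):
    """Deduplicate (user, item) pairs, then keep only users and items
    that have more than `min_count` interactions.

    iterative=False applies the filter once; iterative=True repeats it
    until no pair is removed. Both are assumptions, since the thread
    does not say which variant the paper used.
    """
    kept = set(pairs)  # deduplicate before filtering
    while True:
        user_counts = Counter(u for u, _ in kept)
        item_counts = Counter(i for _, i in kept)
        filtered = {
            (u, i) for u, i in kept
            if user_counts[u] > min_count and item_counts[i] > min_count
        }
        if not iterative or len(filtered) == len(kept):
            return filtered
        kept = filtered
```

Comparing the output of both variants, and of `>` versus `>=` on the threshold, against the paper's table might narrow down where the counts diverge.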

user683 commented 8 months ago

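Hello, my guess is that the large gap from the statistics in the paper's table comes from using a different time span of the data. The authors may have used only 2 or 3 years of data.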
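A minimal sketch of how such a time restriction could be tested, assuming each review record is a dict with `user_id`, `business_id`, and an ISO-style `date` string. The function name `interactions_in_window` and the window boundaries are hypothetical, and the paper's actual time span is not known from this thread.

```python
from datetime import datetime

def interactions_in_window(records, start, end):
    """Return (user_id, business_id) pairs for reviews dated in [start, end).

    `start` and `end` are ISO-style date strings; any specific window
    is a guess to be checked against the paper's reported statistics.
    """
    lo = datetime.fromisoformat(start)
    hi = datetime.fromisoformat(end)
    return [
        (r["user_id"], r["business_id"])
        for r in records
        if lo <= datetime.fromisoformat(r["date"]) < hi
    ]
```

Trying a few candidate windows and re-running the filtering on each would show whether any of them lands near the reported counts.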

user683 commented 8 months ago

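Hello, have you managed to reproduce the results? The code does not document the format of its input data, so I do not know what several of the key files are supposed to look like.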