Closed: danielegrattarola closed this issue 7 years ago
Hi Daniele,
I cannot reproduce the problem. I downloaded the dataset just now via two different channels (https://recsys.xing.com and via scp from the machines that actually host the data) and unzipped the files on Ubuntu 16.04 (using UnZip 6.00) and Mac OS 10.10.5 (using UnZip 5.52), and both work, e.g. on Mac OS:
$ unzip -v
UnZip 5.52 ....
$ unzip data_2017.zip
Archive: data_2017.zip
inflating: interactions.csv
inflating: items.csv
inflating: targetItems.csv
inflating: targetUsers.csv
inflating: users.csv
$ ls -alh
... 1.4G Mar 3 22:53 data_2017.zip
... 8.4G Mar 2 15:50 interactions.csv
... 226M Mar 1 21:29 items.csv
... 341K Mar 2 18:01 targetItems.csv
... 550K Mar 3 16:02 targetUsers.csv
... 82M Mar 1 21:27 users.csv
$ wc -l interactions.csv
322776003 interactions.csv
$ head interactions.csv
recsyschallenge_v2017_interactions_final_anonym_training_export.user_id recsyschallenge_v2017_interactions_final_anonym_training_export.item_id recsyschallenge_v2017_interactions_final_anonym_training_export.interaction_type recsyschallenge_v2017_interactions_final_anonym_training_export.created_at
2082156 80 1 1484299172
1934123 140 1 1486388563
1320213 240 1 1479409825
297303 310 1 1484817366
1635596 310 1 1486370081
857319 340 1 1485121421
324595 350 1 1484591946
510320 350 1 1484841341
499620 390 1 1479387826
I'm not sure why it does not work for you. @danielegrattarola, if the problem remains (e.g. after you have tried downloading the data again) or if anyone else has the same problem, please let @dkohlsdorf or me know by commenting on this issue. Thank you!
What you should see once you have downloaded the data and unzipped it (unzip data_2017.zip) are the following files:
File | Size | Number of lines | Description
---|---|---|---
interactions.csv | ca. 8.5G | 322776003 | interactions between users and items
users.csv | ca. 82M | 1497021 | details about users
items.csv | ca. 226M | 1306055 | details about items
targetUsers.csv | ca. 550K | 74841 | IDs of users to whom item recommendations can be pushed
targetItems.csv | ca. 340K | 46559 | IDs of items for which users (from targetUsers.csv) should be identified that may be interested in the item
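
The line counts in the table above can also be checked without `wc`, which may help on Windows. The following is a minimal sketch in Python, assuming the files were extracted into a single directory. `EXPECTED_LINES`, `count_lines` and `check_line_counts` are illustrative names that are not part of the challenge material; the counts are copied from the table.

```python
import os

# Expected line counts, copied from the table above.
EXPECTED_LINES = {
    "interactions.csv": 322776003,
    "users.csv": 1497021,
    "items.csv": 1306055,
    "targetUsers.csv": 74841,
    "targetItems.csv": 46559,
}


def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, reading in binary chunks (like `wc -l`)."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def check_line_counts(directory="."):
    """Return {filename: (expected, actual)} for files that are missing or differ.

    `actual` is None when the file does not exist in `directory`.
    """
    mismatches = {}
    for name, expected in EXPECTED_LINES.items():
        path = os.path.join(directory, name)
        actual = count_lines(path) if os.path.exists(path) else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

Calling `check_line_counts(".")` in the extraction directory should return an empty dictionary if every file has the expected number of lines. Reading in binary chunks keeps memory use flat even for the multi-gigabyte interactions.csv.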
For more details about the dataset, see: Dataset description. We will also try to publish some stats about the dataset soon.
Cheers, fabian
Hi Fabian, thanks for the reply. It must be something related to our PCs then. We'll try again and let you know if the problem persists, but I guess we should be able to solve this if it's just on our end.
I'll leave the issue open just in case somebody else has this problem. Thanks again, Daniele
The data looks okay for me. I got the same number of lines and file sizes that @fabianabel posted.
The MD5 sum is 28cbf5dad71582e9a204c43afdd86cfc
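
To compare a local download against this checksum on any OS, something like the following could be used. It is a minimal sketch with Python's standard `hashlib` module. It assumes the checksum refers to data_2017.zip, which the comment does not state explicitly; `POSTED_MD5`, `md5_of_file` and `matches_posted_md5` are illustrative names.

```python
import hashlib

# MD5 value as posted in the comment above.
# Assumption: it was computed over data_2017.zip.
POSTED_MD5 = "28cbf5dad71582e9a204c43afdd86cfc"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, hashing it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_md5(path):
    """True if the file's MD5 digest equals the value posted in this thread."""
    return md5_of_file(path) == POSTED_MD5
```

For example, `matches_posted_md5("data_2017.zip")` returning `False` would suggest that the download itself differs from the one hashed above, rather than the extraction step.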
Hi,
For me it's OK on Ubuntu, but I get errors on Windows 7 (the same errors as originally reported, for interactions.csv; a problem with the CRC?).
Best Regards.
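
Whether the archive itself passes its CRC checks can be tested independently of the unzip tool in use. The following is a minimal sketch using Python's standard `zipfile` module; `check_archive` is an illustrative name, and the sketch only reports what `ZipFile.testzip()` finds without explaining why a particular tool fails.

```python
import zipfile


def check_archive(archive_path):
    """Return None if every member passes its CRC check, else a short message."""
    try:
        with zipfile.ZipFile(archive_path) as archive:
            # testzip() reads every member and returns the first bad name, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as error:
        # Raised when the file cannot be opened as a zip archive at all.
        return f"not a readable zip archive: {error}"
    if bad_member is not None:
        return f"CRC or header check failed for {bad_member}"
    return None
```

If `check_archive("data_2017.zip")` returns `None` on the same machine where extraction fails, the downloaded file is probably intact and the extraction tool would be the next thing to look at.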
Hi,
Sorry to open an issue about 5 milliseconds after the competition started, but we tried to download the dataset zip on different PCs and OSs and found that the file seems to be corrupted somehow. The problem seems to be related to the interactions.csv file, which is cut off at about the 16 millionth line and weighs 400 MB once extracted (1.4 GB compressed).
Is this a problem on our end or does anyone else have this problem?
Thanks, Daniele
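
A cut-off extraction like the one described here can be detected by comparing each extracted file's size with the uncompressed size recorded inside the archive. The following is a minimal sketch using Python's standard `zipfile` module; `truncated_members` is an illustrative name, and the sketch assumes the archive's directory of members is itself readable.

```python
import os
import zipfile


def truncated_members(archive_path, extracted_dir="."):
    """Return {name: (recorded_size, size_on_disk)} for members whose sizes differ.

    `size_on_disk` is None when the extracted file is missing.
    """
    problems = {}
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = os.path.join(extracted_dir, info.filename)
            on_disk = os.path.getsize(target) if os.path.exists(target) else None
            # info.file_size is the uncompressed size stored in the archive.
            if on_disk != info.file_size:
                problems[info.filename] = (info.file_size, on_disk)
    return problems
```

An empty result from `truncated_members("data_2017.zip", ".")` means every extracted file has the size the archive says it should have; an entry for interactions.csv would confirm that extraction stopped early.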