where to get the data? - GitHub issues

SeekPoint commented 7 years ago

rzai@rzai00:~/prj/GRU4Rec/examples/rsc15$ python run_rsc15.py
Using gpu device 0: GeForce GTX 1080 (CNMeM is disabled, cuDNN 5105)
Traceback (most recent call last):
  File "run_rsc15.py", line 20, in <module>
    data = pd.read_csv(PATH_TO_TRAIN, sep='\t', dtype={'ItemId':np.int64})
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 470, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 246, in _read
    parser = TextFileReader(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 562, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 699, in _make_engine
    self._engine = CParserWrapper(self.f, self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1066, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3163)
  File "pandas/parser.pyx", line 583, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5779)
IOError: File /path/to/rsc15_train_full.txt does not exist
rzai@rzai00:~/prj/GRU4Rec/examples/rsc15$
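The error only says that PATH_TO_TRAIN still holds the placeholder '/path/to/...' value from the example script. A minimal Python 3 sketch of failing early with a clearer message before pandas opens the file is below; `require_file` is a hypothetical helper, not part of GRU4Rec, and the hint text assumes the file is generated by the example's preprocessing step.

```python
import os


def require_file(path, hint="generate it with the preprocessing script first"):
    """Hypothetical helper: raise a readable error if `path` is not an existing file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} does not exist ({hint})")
    return path
```

Usage would be something like `pd.read_csv(require_file(PATH_TO_TRAIN), sep='\t', ...)`.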

hidasib commented 7 years ago

http://bit.ly/2hv5UGQ

loretoparisi commented 7 years ago

The files are here: http://2015.recsyschallenge.com/challenge.html, so you can download and extract them with:

curl -Lo yoochoose-data.7z https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
7z x yoochoose-data.7z

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,8 CPUs)

Processing archive: yoochoose-data.7z

Extracting  yoochoose-buys.dat
Extracting  yoochoose-clicks.dat
Extracting  yoochoose-test.dat
Extracting  dataset-README.txt

Everything is Ok

Files: 4
Size:       1914111754
Compressed: 287211932

The training files are yoochoose-clicks.dat and yoochoose-buys.dat, while yoochoose-test.dat is the test file.
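A minimal Python 3 sketch of reading yoochoose-clicks.dat with only the standard library. It assumes the layout described in dataset-README.txt as I recall it (comma-separated rows, no header: session ID, ISO-8601 timestamp, item ID, category) and treats timestamps as UTC; `read_clicks` and `parse_timestamp` are hypothetical helpers, so check the README before relying on them.

```python
import calendar
import csv
from datetime import datetime


def parse_timestamp(ts):
    """Convert e.g. '2014-04-07T10:51:09.277Z' to whole Unix seconds, treated as UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return calendar.timegm(dt.timetuple())  # drops the milliseconds


def read_clicks(lines):
    """Yield (session_id, item_id, unix_time) from assumed clicks rows.

    Assumed columns: session_id, timestamp, item_id, category (category is ignored).
    """
    for row in csv.reader(lines):
        session_id, ts, item_id = row[0], row[1], row[2]
        yield int(session_id), int(item_id), parse_timestamp(ts)
```

Usage: `with open('yoochoose-clicks.dat', newline='') as f: events = list(read_clicks(f))`. On the full file this builds a large in-memory list, so streaming over the generator is preferable.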

Now, the scripts have:

PATH_TO_TRAIN = '/path/to/rsc15_train_full.txt'
PATH_TO_TEST = '/path/to/rsc15_test.txt'
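A sketch of pointing these two constants at real files, assuming the preprocessing step writes rsc15_train_full.txt and rsc15_test.txt into a single directory as tab-separated files with a SessionId/ItemId/Time header row. `processed_paths`, `read_processed`, and whatever directory you pass in are hypothetical; the actual run script reads these files with pandas.

```python
import csv
import os


def processed_paths(data_dir):
    """Hypothetical: build the two script paths inside `data_dir`."""
    return {
        "PATH_TO_TRAIN": os.path.join(data_dir, "rsc15_train_full.txt"),
        "PATH_TO_TEST": os.path.join(data_dir, "rsc15_test.txt"),
    }


def read_processed(path):
    """Read an assumed tab-separated file with a SessionId, ItemId, Time header."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [(int(r["SessionId"]), int(r["ItemId"]), float(r["Time"])) for r in reader]
```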

@hidasib I'm not completely sure which of the available dataset files these training and test paths correspond to.

hidasib commented 7 years ago

... https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py
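Roughly, as I read preprocess.py: it drops sessions with a single click, drops items with fewer than 5 clicks, drops sessions that became too short, and then puts sessions ending in the last day of data into the test set (test clicks on items never seen in training are removed, as are test sessions left with fewer than 2 clicks). The plain Python 3 sketch below mirrors that logic on (session, item, time) tuples; the function names and defaults are my assumptions, so check the script itself for the exact behavior.

```python
from collections import Counter


def _session_lengths(events):
    return Counter(s for s, _, _ in events)


def filter_events(events, min_session_len=2, min_item_support=5):
    """Drop short sessions, then rare items, then sessions that became too short."""
    lengths = _session_lengths(events)
    events = [e for e in events if lengths[e[0]] >= min_session_len]
    support = Counter(i for _, i, _ in events)
    events = [e for e in events if support[e[1]] >= min_item_support]
    lengths = _session_lengths(events)
    return [e for e in events if lengths[e[0]] >= min_session_len]


def split_last_day(events, window=86400):
    """Send sessions that end within `window` seconds of the last event to the test set."""
    cutoff = max(t for _, _, t in events) - window
    session_end = {}
    for s, _, t in events:
        session_end[s] = max(session_end.get(s, t), t)
    train = [e for e in events if session_end[e[0]] < cutoff]
    test = [e for e in events if session_end[e[0]] >= cutoff]
    # Keep only test clicks on items seen in training, then drop short test sessions.
    train_items = {i for _, i, _ in train}
    test = [e for e in test if e[1] in train_items]
    lengths = _session_lengths(test)
    test = [e for e in test if lengths[e[0]] >= 2]
    return train, test
```

Applying the same split again to the full training set would give a train/validation pair in the same way.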

loretoparisi commented 7 years ago

@hidasib OK, thank you, I was able to preprocess the dataset:

root@d842fc00a358:~/GRU4Rec/examples/rsc15# python preprocess.py 

Full train set
    Events: 31637239
    Sessions: 7966257
    Items: 37483
Test set
    Events: 71222
    Sessions: 15324
    Items: 6751
Train set
    Events: 31579006
    Sessions: 7953885
    Items: 37483
Validation set
    Events: 58233
    Sessions: 12372
    Items: 6359
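These counts are consistent with a session-level split: Train plus Validation gives the Full train set (31579006 + 58233 = 31637239 events; 7953885 + 12372 = 7966257 sessions). A small Python 3 sketch that prints counts in the same tab-indented format from (session, item, time) tuples; `summarize` is a hypothetical helper, not the script's own code.

```python
def summarize(name, events):
    """Format event, distinct-session, and distinct-item counts for a split."""
    sessions = {s for s, _, _ in events}
    items = {i for _, i, _ in events}
    return (f"{name}\n"
            f"\tEvents: {len(events)}\n"
            f"\tSessions: {len(sessions)}\n"
            f"\tItems: {len(items)}")
```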

hidasib/GRU4Rec, issue #6: where to get the data?