To generate a fake dataset, I used the following command sequence:
$ cd $WORKSPACE/inference/v0.5/recommendation
$ python -m pip install numpy --user
$ cd tools/
$ ./make_fake_criteo.sh terabyte0875
$ mv fake_criteo/ ../
Here, make_fake_criteo.sh calls quickgen.py, which is also in the tools/ directory, hence the need to descend there first.
However, all the make_fake_criteo.sh script does is create a new directory if it doesn't exist, check that its only argument is one of kaggle|terabyte0875|terabyte, and pass that argument to quickgen.py.
Actually, it even gets in the way because it forcibly uses python rather than python3 (the two are different interpreters on e.g. Ubuntu 16.04). For example, I didn't have NumPy installed for Python 2, so I had to install it first.
This functionality can be folded into quickgen.py itself.
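As a rough sketch of what that folding could look like, the wrapper's two jobs (validating the single argument and creating the output directory) map onto argparse and os.makedirs. I have not checked this against the actual quickgen.py interface: the names parse_args, ensure_output_dir, main and the --output-dir option are hypothetical, and the fake_criteo default is only inferred from the directory the commands above move.

```python
import argparse
import os

# The three profile names accepted by make_fake_criteo.sh.
PROFILES = ("kaggle", "terabyte0875", "terabyte")


def parse_args(argv=None):
    """Validate the command line the way the shell wrapper does."""
    parser = argparse.ArgumentParser(description="Generate a fake Criteo dataset.")
    # choices= rejects anything other than the three known profiles.
    parser.add_argument("profile", choices=PROFILES, help="dataset profile to imitate")
    # Hypothetical option; the default is an assumption, not taken from quickgen.py.
    parser.add_argument("--output-dir", default="fake_criteo", help="where to write the files")
    return parser.parse_args(argv)


def ensure_output_dir(path):
    """Create the output directory if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


def main(argv=None):
    args = parse_args(argv)
    out_dir = ensure_output_dir(args.output_dir)
    # The existing generation logic of quickgen.py would be invoked here (not shown).
    return args.profile, out_dir
```

With something like this in place, calling main() from the script's entry point would make a direct invocation such as python3 quickgen.py terabyte0875 sufficient, and the interpreter would be whichever one the user runs the script with.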