For step 3, to download on one job:

python download.py --csv_path results_2M_train.csv --partitions 1 --part 0 --data_dir ./data --processes 8

You can split the download across N concurrent jobs by passing --partitions N and running each job with a different --part $idx. You can also set the number of worker processes with --processes; one process per CPU is recommended. Instead of setting the partition ID manually, you can use MPI to spawn multiple processes, with each process's rank serving as its partition ID.
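As a sketch of the N-job split described above (the paths and the N=4 job count are assumptions; this uses echo so it is safe to run before download.py and the CSV exist):

```shell
#!/bin/sh
# Hypothetical sketch: run N=4 download jobs concurrently, one partition each.
# `echo` prints the command instead of running it; drop the echo to launch for real.
N=4
idx=0
while [ "$idx" -lt "$N" ]; do
  echo python download.py --csv_path results_2M_train.csv \
    --partitions "$N" --part "$idx" --data_dir ./data --processes 8 &
  idx=$((idx + 1))
done
wait  # block until all background jobs finish
```

On a cluster you would typically submit each (partitions, part) pair as its own job instead of backgrounding them on one machine.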
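The MPI variant can be sketched without any MPI library by reading the launcher's rank from the environment, so that mpirun -n N runs one partition per process. The variable names below are Open MPI's and are an assumption about your launcher (under Slurm's srun you would read SLURM_PROCID/SLURM_NTASKS instead):

```python
import os
import subprocess  # used only if you uncomment the launch line below

# Hypothetical sketch: map MPI rank -> --part and world size -> --partitions,
# so `mpirun -n N python launch.py` downloads one partition per process.
# OMPI_COMM_WORLD_RANK/SIZE are Open MPI's variables (an assumption about
# your launcher); outside MPI this falls back to a single partition.
rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))

cmd = [
    "python", "download.py",
    "--csv_path", "results_2M_train.csv",
    "--partitions", str(size),
    "--part", str(rank),
    "--data_dir", "./data",
    "--processes", "8",
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually launch the download
```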