m-bain / webvid

Large-scale text-video dataset. 10 million captioned short videos.

support downloading videos with MPI #5

Closed: kundaMwiza closed this issue 2 years ago

kundaMwiza commented 2 years ago

For step 3:

To download on one job: python download.py --csv_path results_2M_train.csv --partitions 1 --part 0 --data_dir ./data --processes 8. You can split this across N concurrent jobs by choosing --partitions N partitions and running each job with different --part $idx. You can also specify the number of processes, recommended one per cpu.

Users could use MPI to spawn multiple processes, with each process's ID (its MPI rank) serving as the partition ID, rather than setting the partition ID manually for each job. That may be useful.
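A rough sketch of what I mean, not a tested patch. It assumes mpi4py is installed and that download.py is in the working directory; the wrapper itself and the helper name build_download_command are made up for illustration. Only the download.py flags come from the README. The MPI rank becomes --part and the number of ranks becomes --partitions:

```python
import subprocess
import sys


def build_download_command(rank, size, csv_path, data_dir, processes=8):
    """Build the download.py argv for one MPI rank (hypothetical helper).

    The rank is used as --part and the world size as --partitions,
    so nobody has to set the partition index by hand.
    """
    if not 0 <= rank < size:
        raise ValueError(f"rank {rank} is outside 0..{size - 1}")
    return [
        sys.executable, "download.py",
        "--csv_path", csv_path,
        "--partitions", str(size),
        "--part", str(rank),
        "--data_dir", data_dir,
        "--processes", str(processes),
    ]


def main(argv):
    # Imported here so the helper above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    cmd = build_download_command(
        rank=comm.Get_rank(),
        size=comm.Get_size(),
        csv_path=argv[0],
        data_dir=argv[1],
    )
    # Each rank launches the existing script on its own partition.
    return subprocess.call(cmd)


# Only runs when given exactly two arguments: <csv_path> <data_dir>.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, mpi_download.py (name is arbitrary), this could be launched with something like: mpirun -n 4 python mpi_download.py results_2M_train.csv ./data. That would start four ranks, each downloading one of four partitions.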

m-bain commented 2 years ago

yea worthy upgrade, thanks!