snap-research / Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
https://snap-research.github.io/Panda-70M/

Single Instance? #27

Closed kuno989 closed 3 months ago

kuno989 commented 3 months ago

Did you download the dataset on a single instance, or in parallel using Spark or something similar? video2dataset offers many ways to do this, and I'd be interested in your thoughts before I try to download it.

tsaishien-chen commented 3 months ago

Hi @kuno989, as this line in the config file shows, the current codebase uses Python's built-in multiprocessing for parallel downloading. The video2dataset tool also supports other ways to parallelize downloads; please check here. Hope this answers your question!
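
For readers unfamiliar with the approach mentioned above, here is a minimal sketch of single-machine parallelism with Python's built-in `multiprocessing` module. It is not the Panda-70M or video2dataset code: `simulated_download`, `parallel_map`, and the example.com URLs are hypothetical names made up for illustration, and the worker only sleeps instead of fetching anything.

```python
import multiprocessing
import time


def simulated_download(url):
    """Hypothetical stand-in for a real per-video download step."""
    time.sleep(0.01)  # pretend to wait on the network
    return (url, "ok")


def parallel_map(worker, items, processes=4):
    """Apply `worker` to every item using a pool of worker processes.

    Results come back in the same order as `items`.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(worker, items)


if __name__ == "__main__":
    # Placeholder URLs (assumption); a real run would read them from the dataset CSV.
    urls = [f"https://example.com/video_{i}.mp4" for i in range(8)]
    for url, status in parallel_map(simulated_download, urls, processes=4):
        print(url, status)
```

A process pool like this scales only to the cores and bandwidth of one machine, which is why a distributed backend such as Spark may be worth considering for the full dataset.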