Closed bryant1410 closed 2 years ago
I'm closing this because it's not an actual issue, just something I wanted to share.
Btw, happy to see a script was shared! It simplifies users' lives.
Yeah, I think my way is defo not the simplest or the fastest :') -- yours is a neat one-liner. I'm looking into img2dataset atm too.
Yeah, I saw you shared img2dataset. Sounds interesting for starting without even pre-downloading the dataset!
@bryant1410 @m-bain I used your script and scaled it to run simultaneously on 1,000 VMs in a GCP batch job: https://github.com/RyanMarten/distributed_gcp_youtube_download. It downloads WebVid10m in 10 minutes.
Thanks for your contribution, it makes downloading the videos much quicker.
Just saw you released a download script. FWIW, this is what I used to download the 2M version, just wanted to share it. I think it's simpler (it uses csvkit and `parallel`) but maybe it has fewer features:
It downloads with 8 jobs in parallel (the flag `-j 8`).
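For anyone who prefers Python, here is a rough sketch of the same idea: read the URLs from the CSV and download them 8 at a time. This is not the csvkit/`parallel` script mentioned above, just an illustration. The column names `contentUrl` and `videoid`, the `.mp4` extension, and the helper names `fetch` and `download_all` are assumptions, so adjust them to your CSV.

```python
import csv
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch(url, dest):
    """Download one URL to dest, skipping files that already exist."""
    dest = Path(dest)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # is not mistaken for a finished file on the next run.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.rename(dest)
    return dest


def download_all(csv_path, out_dir, jobs=8,
                 url_col="contentUrl", id_col="videoid", fetch_fn=fetch):
    """Download every row of the CSV using `jobs` worker threads.

    Column names are assumptions about the CSV layout.
    Returns a list of (url, exception) pairs for failed rows.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    def task(row):
        dest = Path(out_dir) / f"{row[id_col]}.mp4"
        try:
            fetch_fn(row[url_col], dest)
            return None
        except Exception as exc:  # keep going if a single video fails
            return (row[url_col], exc)

    # max_workers plays the role of the 8 parallel jobs.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return [r for r in pool.map(task, rows) if r is not None]
```

Calling `download_all("results_2M_train.csv", "videos", jobs=8)` would then roughly mirror the 8-job setup; the CSV file name here is only a placeholder.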