Open vedantroy opened 7 months ago
Hi @vedantroy, thanks for your interest in this dataset! Unfortunately, this is quite a common issue. You can check some discussions like this one. The best solution is to use a VPN and switch to a different IP once you detect that your IP is banned. If you don't have a VPN, you can try to slow down the download by reducing processes_count and thread_count in the config file, and also by adding a sleep counter that pauses after every few download steps. Hope this information is helpful!
@tsaishien-chen I have been troubled by this IP block issue for quite some time. Is there a template available for implementing a 'sleep counter' after a few download steps?
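For reference, a "sleep counter" of the kind suggested above could look roughly like the sketch below. This is not taken from the dataset's download script: the function name `download_with_sleep_counter`, its parameters, and the default values are all hypothetical, and `download_fn` stands in for whatever per-video download call you already use.

```python
import random
import time
from typing import Callable, Iterable, List


def download_with_sleep_counter(
    items: Iterable[str],
    download_fn: Callable[[str], None],
    sleep_every: int = 50,              # hypothetical default, tune for your setup
    base_sleep_seconds: float = 60.0,   # hypothetical default, tune for your setup
    jitter_seconds: float = 0.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call download_fn on each item, pausing after every `sleep_every` attempts.

    Returns the items whose download raised an exception, so they can be retried.
    """
    if sleep_every < 1:
        raise ValueError("sleep_every must be at least 1")
    failed = []
    for count, item in enumerate(items, start=1):
        try:
            download_fn(item)
        except Exception:
            # Keep going; failed items are collected for a later retry pass.
            failed.append(item)
        if count % sleep_every == 0:
            # Pause for the base duration plus optional random jitter.
            sleep_fn(base_sleep_seconds + random.uniform(0.0, jitter_seconds))
    return failed
```

You would call it with your list of video IDs or URLs and your existing download function, e.g. `failed = download_with_sleep_counter(urls, my_download, sleep_every=20, base_sleep_seconds=120, jitter_seconds=30)`. The numbers are placeholders, and slowing down this way only reduces the request rate; it does not guarantee that the IP will not be blocked.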
I wrote a downloader using youtube-dlp, but a lot of the IPs get blocked after roughly 10K downloads. I'm surprised people are successfully downloading the dataset with the provided download script on a single machine, as I would strongly expect YouTube to block an IP after a few gigabytes of data have been downloaded.
Are there any proxies, tools, or tricks for downloading the entire dataset without being blocked by YouTube?