Open qihao067 opened 4 months ago
I think this is also the normal behavior of the upstream video2dataset tool: this is the error recorded in the stats.json file regardless of why the download failed. I'd like to see this improved to work more like its img2dataset counterpart, which records the actual failure reason, such as HTTP 403.
With large datasets that use public sources like YouTube, a 100% success rate is not really possible. When you stop the download, some temporary files are left behind. Since they use unique names based on UUIDs, I don't think they will be reused, so you can clean them up before starting a new attempt.
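For the cleanup step, here is a minimal sketch rather than anything shipped with video2dataset. It assumes the leftovers sit directly in `/tmp` and are named `<uuid>.<extension>`, as in the error messages reported in this thread. The function names `find_stale_temp_files` and `remove_files` and the one-hour age threshold are hypothetical choices made for illustration.

```python
import re
import time
from pathlib import Path

# Matches names such as a7191cb8-d948-48b1-ba70-1e890953518b.mp4
UUID_FILE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.\w+$"
)


def find_stale_temp_files(tmp_dir="/tmp", min_age_seconds=3600):
    """Return UUID-named files in tmp_dir not modified for min_age_seconds."""
    cutoff = time.time() - min_age_seconds
    stale = []
    for path in Path(tmp_dir).iterdir():
        if not path.is_file() or not UUID_FILE.match(path.name):
            continue
        # Skip recently modified files: a running download may still own them.
        if path.stat().st_mtime <= cutoff:
            stale.append(path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files. With dry_run=True, nothing is deleted."""
    handled = []
    for path in paths:
        if not dry_run:
            path.unlink()
        handled.append(path)
    return handled
```

Running `remove_files(find_stale_temp_files())` first with the default `dry_run=True` lists what would be deleted. Only do the real deletion while no download is running, since the age check is just a heuristic.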
Thanks for the swift response. But the problem is that if the code thinks the videos were already downloaded and looks for them in /tmp, then even when that fails, it will not download the videos again. (Not sure.)
Specifically, the first time I downloaded the videos, everything was fine. But the second time, most of the videos were not downloaded. The data file looks like this:
This is the json file:
Is it possible that my IP has been rate-limited by YouTube?
I encountered the same problem. Have you solved it?
Me too. All the downloaded files are JSON and txt files.
Hi, I encountered the same problem. How did you solve it?
Same error here. Once the error happens, the corresponding shard will stop downloading and the program hangs there. Is there any solution @tsaishien-chen ? Thank you.
Hi @lzhangbj, thanks for your interest in the Panda-70M dataset! I am guessing your server's IP might be blocked. Where is your server? When you click the link in the CSV file, can you access the video source? Have you tried downloading the videos with a different proxy/IP?
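One rough way to run that check from the server itself is sketched below using only the Python standard library. `probe_url` is a hypothetical helper, not part of video2dataset, and the browser-like `User-Agent` header is an assumption. A successful response only shows that the page is reachable from that IP. It does not prove that the video stream itself can be downloaded.

```python
import urllib.error
import urllib.request


def probe_url(url, proxy=None, timeout=10.0):
    """Return (ok, detail) for a GET request to url, optionally via a proxy."""
    handlers = []
    if proxy:
        # Route both http and https requests through the given proxy URL.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 429.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return False, f"connection error: {exc}"
```

Calling it once without `proxy` and once with a proxy URL on a few links from the CSV file gives a quick comparison between the two network paths.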
The first time I ran the code, it worked fine, but it was slow, so I stopped it and changed 'processes_count' in the cfg file. However, when I executed the video2dataset command a second time, it started giving me this error, and these videos were skipped, with only the JSON and txt files being saved. I believe these videos were downloaded the first time I ran the code, but I have since cleaned up the tmp files. I cannot find the code to redownload these videos. Can you help me fix this? Thanks.
error:
[Errno 2] No such file or directory: '/tmp/a7191cb8-d948-48b1-ba70-1e890953518b.mp4'
[Errno 2] No such file or directory: '/tmp/446b7a87-a878-4b06-8df0-88facffb3d24.mp4'
[Errno 2] No such file or directory: '/tmp/a866306e-8405-428b-9455-3584856f5fbb.mp4'
[Errno 2] No such file or directory: '/tmp/2e93b9f5-a683-4b09-b2a7-13631b7def87.mp4'
[Errno 2] No such file or directory: '/tmp/609b1ad1-4eee-4e0e-a6e1-91eb853d95b8.mp4'
.....
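To see how many samples fail with this message and which temporary paths are involved, output like the above can be tallied with a short script. This is only a sketch: it assumes the error text has already been collected into one string (how to extract it from the logs or stats files is not covered here), and `summarize_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches entries such as: [Errno 2] No such file or directory: '/tmp/x.mp4'
ERROR_ENTRY = re.compile(r"\[Errno (\d+)\] ([^:\[]+): '([^']*)'")


def summarize_errors(text):
    """Count (errno, message) pairs and collect the paths they mention."""
    counts = Counter()
    paths = []
    for errno, message, path in ERROR_ENTRY.findall(text):
        counts[(int(errno), message.strip())] += 1
        paths.append(path)
    return counts, paths
```

If every entry turns out to be `Errno 2` on a `/tmp` path, the failure is happening when the temporary file is read back, which by itself does not reveal why the download produced no file.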