Open jianzongwu opened 7 months ago
Hi @jianzongwu, Thanks for your interest in our dataset, and sorry for the late reply. What is the resolution of the downloaded video? You can set the target resolution here. You can also check the yt-dlp download format here. I think the problem can be solved if you always download "the best video" (i.e., using the "bv" tag).
Hello, I tried setting the target resolution to 720p and to 360p. The downloaded videos have the requested resolution, but both show blocky compression patches.
How do I use the 'bv' tag in the download script?
https://github.com/snap-research/Panda-70M/assets/37370789/96d42462-32b8-434a-9fc1-259986d36e1f
I tried again to download a 720p video. The downloaded video is 720x1280, but it is still severely compressed.
It's not a resolution problem. The problem is that the downloaded videos do not look the same as the YouTube videos when you watch them online.
I'm wondering if anyone has the same problem as I have.
Hi @jianzongwu,
Could you please try to run this command:
yt-dlp -f "bv[ext=mp4]" "https://www.youtube.com/watch?v=gsnqXt7d1mU"
and see whether the downloaded video has severe compression?
Interestingly, it is not compressed. I followed your command and downloaded the sunrise video with bv.
It successfully gave me a high-quality video at the largest resolution. No compression artifacts were visible.
I also downloaded the 360p version of this video with this command:
yt-dlp -f "wv*[height>=360][ext=mp4]" "https://www.youtube.com/watch?v=gsnqXt7d1mU"
It downloaded the 360p video, and again no compression artifacts were visible.
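For anyone who wants to reproduce the two commands above from Python, here is a minimal sketch. It only assembles the yt-dlp command line; `build_ytdlp_cmd` and its `min_height` parameter are hypothetical names for illustration and are not part of the Panda-70M download scripts. The two format selectors are the ones used in this thread.

```python
from typing import List, Optional


def build_ytdlp_cmd(url: str, min_height: Optional[int] = None) -> List[str]:
    """Assemble a yt-dlp command line (hypothetical helper, for illustration only)."""
    if min_height is None:
        # "bv" selects the best video-only stream; restricted to mp4 as in the thread.
        selector = "bv[ext=mp4]"
    else:
        # "wv*" selects the worst stream that contains video, here limited to
        # streams that are at least `min_height` pixels tall.
        selector = f"wv*[height>={min_height}][ext=mp4]"
    return ["yt-dlp", "-f", selector, url]


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=gsnqXt7d1mU"
    print(" ".join(build_ytdlp_cmd(url)))        # best-video variant
    print(" ".join(build_ytdlp_cmd(url, 360)))   # smallest stream with height >= 360
```

The returned list can be passed to `subprocess.run(cmd, check=True)` if yt-dlp is installed and on the PATH.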
I noticed a difference between my previous downloads and this one. The compressed videos cannot be recognized or opened in VSCode; I have to copy them to my local machine and open them with an MP4 player. The videos downloaded this time, which show no compression, can be opened directly in VSCode.
So, the problem is in the codebase, where it calls yt-dlp and saves the videos?
subsampling: {}
reading:
    yt_args:
        download_size: 360
        download_audio: False
        yt_metadata_args:
            writesubtitles: False
            subtitleslangs: ['en']
            writeautomaticsub: False
            get_info: False
    timeout: 60
    sampler: null
storage:
    number_sample_per_shard: 100
    oom_shard_count: 5
    captions_are_subtitles: False
distribution:
    processes_count: 32
    thread_count: 32
    subjob_size: 10000
    distributor: "multiprocessing"
This is my config.
Hi @jianzongwu,
> So, the problem is in the codebase, where it calls yt-dlp and saves the videos?
Yes. Could you please print these lines, run the yt-dlp command with the same options, and see whether you get compressed videos? One more thing you can do is check whether the original video (before splitting) is also compressed. To do that, reduce the number of parallel processes to 1 and set a breakpoint after the first video is processed (before the original video is deleted).
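One way to compare the original video with a split clip at that breakpoint is to read the stream properties of both with ffprobe. The sketch below is only an illustration: the helper names are hypothetical, and it assumes ffprobe (shipped with ffmpeg) is on the PATH. A clip whose bitrate is far lower than the original's at the same resolution would point to the splitting step rather than the download.

```python
import json
import subprocess
from typing import Dict, List


def build_ffprobe_cmd(path: str) -> List[str]:
    """Command that prints codec, size and bitrate of the first video stream as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,bit_rate",
        "-of", "json",
        path,
    ]


def parse_video_stream(ffprobe_json: str) -> Dict[str, object]:
    """Extract the fields of interest from ffprobe's JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        raise ValueError("no video stream found")
    stream = streams[0]
    raw_rate = str(stream.get("bit_rate", ""))
    return {
        "codec": stream.get("codec_name"),
        "width": stream.get("width"),
        "height": stream.get("height"),
        # ffprobe reports bit_rate as a string; some containers omit it.
        "bit_rate": int(raw_rate) if raw_rate.isdigit() else None,
    }


def probe(path: str) -> Dict[str, object]:
    """Run ffprobe on `path` (requires ffprobe to be installed)."""
    result = subprocess.run(
        build_ffprobe_cmd(path), check=True, capture_output=True, text=True
    )
    return parse_video_stream(result.stdout)
```

Calling `probe()` on the undeleted original and on one of its clips gives two small dictionaries that can be compared side by side.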
Hi, I have solved the problem.
It is caused by ffmpeg: it compresses the videos when splitting them by timestamps.
I could not find where ffmpeg is called in the codebase, so I rewrote the download script myself based on hf_download.py
and manually set the "-q:v" parameter in ffmpeg to "0" (meaning no compression). The extracted frames now look as good as the original video.
Feel free to close this issue.
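For readers who hit the same problem, the splitting step described above might look roughly like the sketch below. This is not the code from the Panda-70M repository or from the rewritten script; `build_split_cmd` and the file names are hypothetical. The default uses the `-q:v 0` setting reported in this thread. Note that `-q:v` only affects encoders that support a fixed quality scale, so passing `-c copy` instead (stream copy, no re-encoding, cuts snap to keyframes) is a common alternative when exact cut points are not required.

```python
from typing import List, Sequence


def build_split_cmd(
    src: str,
    dst: str,
    start: str,
    end: str,
    video_args: Sequence[str] = ("-q:v", "0"),
) -> List[str]:
    """Assemble an ffmpeg command that cuts [start, end] out of `src` (hypothetical helper).

    `start` and `end` are timestamps in a form ffmpeg accepts, e.g. "0:00:13.280".
    `video_args` holds the quality-related options placed before the output file.
    """
    return [
        "ffmpeg", "-y",             # overwrite the output file if it exists
        "-ss", start, "-to", end,   # given before -i, so both refer to input timestamps
        "-i", src,
        *video_args,
        dst,
    ]


if __name__ == "__main__":
    # Setting reported in the thread: highest quality on the quality scale.
    print(" ".join(build_split_cmd("full.mp4", "clip.mp4", "0:00:13.280", "0:00:20.000")))
    # Alternative: copy the streams without re-encoding.
    print(" ".join(build_split_cmd("full.mp4", "clip.mp4", "0:00:13.280", "0:00:20.000",
                                   video_args=("-c", "copy"))))
```

The resulting list can be run with `subprocess.run(cmd, check=True)` when ffmpeg is installed.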
Hi @jianzongwu,
Great to hear you solved the problem! And thanks for providing the useful information.
Hello, I used the download script to download validation videos but found that they are compressed a lot and, as a result, of low quality. Do you have any idea about this? I guess this may be caused by YouTube compressing the videos when downloading.
The images above are extracted frames, and there seem to be many blocky pixel patches, especially in the background.
Does anyone have the same issue as I do?