VanderHua closed this issue 3 months ago
Hi @VanderHua, thanks for bringing this to my attention! I am working on support for audio downloading. The code will be published tomorrow.
Great update! Thanks!
Has this issue been resolved?
Hi @kuno989, yes, the current codebase downloads video with audio by default. If you want video without audio, you can change the config file here.
Thanks @tsaishien-chen, I'll check it out!
Hi @tsaishien-chen, I used the latest version, but I still found that none of the videos contain sound. Has anyone else faced the same problem?
Below is the config.yaml I used.
subsampling: {}
reading:
    yt_args:
        download_size: 360
        download_audio: True
        yt_metadata_args:
            writesubtitles: True
            subtitleslangs: ['en']
            writeautomaticsub: True
            get_info: True
    timeout: 60
    sampler: null
storage:
    number_sample_per_shard: 100
    oom_shard_count: 5
    captions_are_subtitles: False
distribution:
    processes_count: 32
    thread_count: 32
    subjob_size: 10000
    distributor: "pyspark"
Hi @kuno989,
Could you please check these lines? +ba is the format option that downloads video with audio. Could you also check whether this option is activated during downloading?
Thank you for your answer. Yes, I checked it, but there's no difference.
@kuno989, could you print video_format_string during downloading and see whether the string includes +ba?
Yes, I checked that it went in normally. That's weird.
print("video_format_string", video_format_string)
video_format_string wv*[height>=360][ext=mp4]+ba/w[height>=360][ext=mp4]+ba/bv/b[ext=mp4]+ba
@kuno989, could you please try also adding +ba after bv? It could be a bug.
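A quick way to check this suspicion is to split the printed string on "/" (which separates fallback alternatives in the format selector) and list the alternatives that do not contain +ba. The sketch below is only a textual check on the string printed above; the helper name alternatives_missing_audio is hypothetical and not part of the repository.

```python
def alternatives_missing_audio(format_string):
    """Return the '/'-separated fallbacks that do not ask for an audio stream.

    This is a purely textual check: it only looks for the '+ba' suffix and
    does not model how the downloader actually resolves each selector.
    """
    return [alt for alt in format_string.split("/") if "+ba" not in alt]


# The string printed earlier in this thread.
printed = "wv*[height>=360][ext=mp4]+ba/w[height>=360][ext=mp4]+ba/bv/b[ext=mp4]+ba"

# Lists every fallback that would be fetched without a merged audio track.
print(alternatives_missing_audio(printed))
```

For the printed string, the only alternative without +ba is the bare bv fallback, which selects a video-only stream.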
The IP of the instance I'm using was blocked, so it took some time. I modified it as below, but the video still doesn't include sound.
video_format_string = (
    f"wv*[height>={self.video_size}][ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba+bv' if self.download_audio else ''}/"
    f"w[height>={self.video_size}][ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba+bv' if self.download_audio else ''}/"
    f"bv/b[ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba+bv' if self.download_audio else ''}"
)
No, please try:
video_format_string = (
    f"wv*[height>={self.video_size}][ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba' if self.download_audio else ''}/"
    f"w[height>={self.video_size}][ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba' if self.download_audio else ''}/"
    f"bv[ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba' if self.download_audio else ''}/"
    f"b[ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}{'+ba' if self.download_audio else ''}"
)
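For anyone who wants to see what this suggestion expands to without running a download, here is a minimal standalone sketch of the same construction. The function build_format_string and its parameters are hypothetical stand-ins for self.video_size, self.specify_codec and self.download_audio; the selector text is copied from the snippet above and is not otherwise validated.

```python
def build_format_string(video_size, specify_codec=False, download_audio=True):
    """Build the format string so that every fallback gets the same suffixes."""
    codec = "[codec=avc1]" if specify_codec else ""
    audio = "+ba" if download_audio else ""
    # One entry per '/'-separated fallback, in order of preference.
    selectors = [
        f"wv*[height>={video_size}][ext=mp4]",
        f"w[height>={video_size}][ext=mp4]",
        "bv[ext=mp4]",
        "b[ext=mp4]",
    ]
    return "/".join(f"{selector}{codec}{audio}" for selector in selectors)


# With download_audio enabled, each fallback ends in '+ba'.
print(build_format_string(360))
```

Building the string from a list keeps the codec and audio suffixes in one place, so a fallback cannot be left without +ba by accident.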
I've confirmed that it's working now! Maybe my copy of the code hadn't been updated, or it was something else. Thanks!
Audio is now available for download.
Is the mp4 data downloaded through the script silent? I downloaded part of the training data through the provided script and found that the mp4 files were all silent. I confirmed that there was no problem with my device player.
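To rule out the player entirely, one option is to ask ffprobe (part of FFmpeg) whether a downloaded file contains an audio stream at all. This is a minimal sketch that assumes ffprobe is installed and on the PATH; the helper names and the example file name are hypothetical.

```python
import subprocess


def audio_probe_command(path):
    """Build an ffprobe command that prints the codec type of each audio stream."""
    return [
        "ffprobe",
        "-v", "error",                         # suppress everything except errors
        "-select_streams", "a",                # only look at audio streams
        "-show_entries", "stream=codec_type",  # print just the stream type
        "-of", "csv=p=0",                      # bare values, one per line
        path,
    ]


def has_audio_stream(path):
    """Return True if ffprobe reports at least one audio stream in the file."""
    result = subprocess.run(
        audio_probe_command(path), capture_output=True, text=True, check=True
    )
    return "audio" in result.stdout


# Example usage (hypothetical file name):
# print(has_audio_stream("sample.mp4"))
```

If this returns False for the downloaded files, the audio track was never merged into the mp4, which points back at the format string rather than at playback.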