Open SCZwangxiao opened 11 months ago
I've found the root cause of this problem. The decord library does not currently support the AV1 codec; see https://github.com/dmlc/decord/issues/221.
For a temporary fix:
video_format_string = (
f"wv*[height>={self.video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
f"w[height>={self.video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
f"bv/b[ext=mp4][vcodec!=av01.0.01M.08]"
)
Note that neither [codec=avc1] nor [vcodec=avc1] works, because yt-dlp says "Requested format is not available" even if the format does exist. The format string then falls back to "bv/b[ext=mp4][codec=avc1]", making the results quite large.
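One likely reason the exact match fails is that `=` compares the whole codec string, while the codecs in the listing carry profile suffixes (for example avc1.4D401E). For the same reason, `vcodec!=av01.0.01M.08` only excludes that one AV1 variant. yt-dlp's filter syntax also has a "starts with" operator (`^=`) and its negation (`!^=`), so a prefix-based variant of the workaround might look like the sketch below. `build_format_string` and its parameters are hypothetical names, and I have not run this against the video in question.

```python
def build_format_string(video_size: int, excluded_vcodec_prefix: str = "av01") -> str:
    """Build a yt-dlp format selector that skips video codecs by prefix.

    Hypothetical helper: only the selector syntax (`!^=` = "does not start
    with") comes from yt-dlp; the function and argument names are made up.
    """
    codec_filter = f"[vcodec!^={excluded_vcodec_prefix}]"
    size_filter = f"[height>={video_size}][ext=mp4]"
    return (
        # Smallest video (possibly with audio) that is large enough and not AV1.
        f"wv*{size_filter}{codec_filter}/"
        # Smallest combined format that is large enough and not AV1.
        f"w{size_filter}{codec_filter}/"
        # Last resort: best video-only or best combined mp4, still without AV1.
        # (Unlike the workaround above, the filter is applied to `bv` as well.)
        f"bv{codec_filter}/b[ext=mp4]{codec_filter}"
    )


print(build_format_string(360))
```

The same idea in the positive direction would be `[vcodec^=avc1]`, which should match avc1.4D4015, avc1.4D401E and so on.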
Finally, I think the fundamental solution is to add a new feature that lets users pass a user-defined format_string.
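A minimal sketch of what such an option could look like. This is not the current video2dataset API: `resolve_format_string` and the `format_string` argument are hypothetical, and the default simply reuses the temporary fix from this comment.

```python
from typing import Optional


def resolve_format_string(video_size: int, format_string: Optional[str] = None) -> str:
    """Return the user's yt-dlp format selector if given, else a default.

    Hypothetical helper, not part of video2dataset.
    """
    if format_string:
        # The user knows their decoder best; pass the selector through untouched.
        return format_string
    # Default: the temporary fix quoted earlier in this comment.
    return (
        f"wv*[height>={video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
        f"w[height>={video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
        f"bv/b[ext=mp4][vcodec!=av01.0.01M.08]"
    )


# The result would then be handed to yt-dlp through its "format" option, e.g.
#   yt_dlp.YoutubeDL({"format": resolve_format_string(360, user_value)})
print(resolve_format_string(360, "bv*[vcodec^=avc1]"))
```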
Take the YouTube video 6EAhKcpVtFA as an example. It has the following formats (simplified):
% yt-dlp -F 6EAhKcpVtFA
[youtube] Extracting URL: 6EAhKcpVtFA
[youtube] 6EAhKcpVtFA: Downloading webpage
[youtube] 6EAhKcpVtFA: Downloading ios player API JSON
[youtube] 6EAhKcpVtFA: Downloading android player API JSON
[youtube] 6EAhKcpVtFA: Downloading m3u8 information
[info] Available formats for 6EAhKcpVtFA:
ID EXT RESOLUTION FPS CH │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC ABR ASR MORE INFO
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
395 mp4 426x240 30 │ 5.98MiB 172k https │ av01.0.00M.08 172k video only 240p, mp4_dash
229 mp4 426x240 30 │ ~10.77MiB 303k m3u8 │ avc1.4D4015 303k video only
133 mp4 426x240 30 │ 5.02MiB 145k https │ avc1.4D4015 145k video only 240p, mp4_dash
604 mp4 426x240 30 │ ~10.18MiB 287k m3u8 │ vp09.00.20.08 287k video only
242 webm 426x240 30 │ 6.65MiB 192k https │ vp09.00.20.08 192k video only 240p, webm_dash
396 mp4 640x360 30 │ 11.80MiB 341k https │ av01.0.01M.08 341k video only 360p, mp4_dash
230 mp4 640x360 30 │ ~25.98MiB 731k m3u8 │ avc1.4D401E 731k video only
134 mp4 640x360 30 │ 12.32MiB 356k https │ avc1.4D401E 356k video only 360p, mp4_dash
18 mp4 640x360 30 2 │ ≈17.13MiB 482k https │ avc1.42001E mp4a.40.2 44k 360p
605 mp4 640x360 30 │ ~20.30MiB 572k m3u8 │ vp09.00.21.08 572k video only
243 webm 640x360 30 │ 12.43MiB 359k https │ vp09.00.21.08 359k video only 360p, webm_dash
397 mp4 854x480 30 │ 17.88MiB 516k https │ av01.0.04M.08 516k video only 480p, mp4_dash
231 mp4 854x480 30 │ ~44.13MiB 1242k m3u8 │ avc1.4D401F 1242k video only
135 mp4 854x480 30 │ 23.18MiB 669k https │ avc1.4D401F 669k video only 480p, mp4_dash
606 mp4 854x480 30 │ ~33.34MiB 939k m3u8 │ vp09.00.30.08 939k video only
244 webm 854x480 30 │ 22.08MiB 637k https │ vp09.00.30.08 637k video only 480p, webm_dash
Format 396 hits the first rule, "wv*[height>=360][ext=mp4]", in https://github.com/iejMac/video2dataset/blob/2a9071d5fef42ceaa25c0e029c27389e50b098c2/video2dataset/data_reader.py#L180
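To make it easier to see which of the listed formats are AV1, here is a small sketch that groups the VCODEC values by their prefix. The IDs and codec strings are copied from the https rows of the listing above; `codec_family` and the prefix table are my own illustration, not something yt-dlp or decord provides.

```python
# Format ID -> VCODEC, copied from the https (non-m3u8) video-only rows above.
FORMATS = {
    "395": "av01.0.00M.08",
    "133": "avc1.4D4015",
    "242": "vp09.00.20.08",
    "396": "av01.0.01M.08",
    "134": "avc1.4D401E",
    "243": "vp09.00.21.08",
    "397": "av01.0.04M.08",
    "135": "avc1.4D401F",
    "244": "vp09.00.30.08",
}

# Hypothetical prefix table used only for this illustration.
PREFIX_TO_FAMILY = (("av01", "AV1"), ("avc1", "H.264"), ("vp09", "VP9"))


def codec_family(vcodec: str) -> str:
    """Map a codec string such as 'av01.0.01M.08' to a coarse family name."""
    for prefix, family in PREFIX_TO_FAMILY:
        if vcodec.startswith(prefix):
            return family
    return "unknown"


av1_ids = [fid for fid, vcodec in FORMATS.items() if codec_family(vcodec) == "AV1"]
exact_match_ids = [fid for fid, vcodec in FORMATS.items() if vcodec == "av01.0.01M.08"]

print(av1_ids)          # ['395', '396', '397']
print(exact_match_ids)  # ['396']
```

So an exact comparison against av01.0.01M.08 only covers format 396, while 395 and 397 are AV1 as well.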
Ah thanks very much! Yes I do remember running into this a few times. And agreed, best solution is probably to parameterize the codec arg.
Some downloaded videos (around 1 in 30 in my crawled YouTube dataset) cannot be loaded by the Python decord package; loading them raises an error. These videos are also unplayable using the default video player in macOS.
However, these videos can be loaded using PyAV and played in Jupyter Notebook, which is so strange! I've noticed that there is an under-developed feature, self.specify_codec, in YtDlpDownloader. The comments say it was relevant with HD videos for loading with decord. Is it related to my issue? https://github.com/iejMac/video2dataset/blob/2a9071d5fef42ceaa25c0e029c27389e50b098c2/video2dataset/data_reader.py#L169-L176
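In the meantime, since PyAV can open these files, one stopgap on the loading side could be to try decord first and fall back to PyAV. The sketch below is only the generic fallback logic; `load_with_fallback` is a hypothetical helper, and it assumes a failed load surfaces as a normal Python exception.

```python
from typing import Any, Callable, Sequence, Tuple


def load_with_fallback(
    path: str, loaders: Sequence[Tuple[str, Callable[[str], Any]]]
) -> Tuple[str, Any]:
    """Try each (name, loader) pair in order and return the first success.

    Hypothetical helper: returns the name of the loader that worked together
    with whatever object it produced.
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # assumes loader failures raise Exception subclasses
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"no loader could open {path}: " + "; ".join(errors))


# Intended usage (not executed here, needs decord and PyAV installed):
#   import av, decord
#   name, reader = load_with_fallback(
#       "video.mp4", [("decord", decord.VideoReader), ("pyav", av.open)]
#   )
```

Note that the two readers return different object types, so the calling code would still need to branch on the returned name.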