Open himslm01 opened 2 years ago
Doesn't --write-auto-sub do this? Or you need to explain further.
Also see https://github.com/ytdl-org/youtube-dl/issues/30667#issuecomment-1048703724.
The "english" and "english (cc)" subs have the same language code, so either one may be downloaded depending on the order in which youtube-dl processes them, and the user cannot select between them.
❯ youtube-dl --list-subs 0Zp6IdV9fFo
[youtube] 0Zp6IdV9fFo: Downloading webpage
Available subtitles for 0Zp6IdV9fFo:
Language formats
yue-HK vtt, ttml, srv3, srv2, srv1
th vtt, ttml, srv3, srv2, srv1
my vtt, ttml, srv3, srv2, srv1
zh-TW vtt, ttml, srv3, srv2, srv1
ar vtt, ttml, srv3, srv2, srv1
en-GB vtt, ttml, srv3, srv2, srv1
fil vtt, ttml, srv3, srv2, srv1
zh-Hans vtt, ttml, srv3, srv2, srv1
tr vtt, ttml, srv3, srv2, srv1
ko vtt, ttml, srv3, srv2, srv1
vi vtt, ttml, srv3, srv2, srv1
ru vtt, ttml, srv3, srv2, srv1
fr vtt, ttml, srv3, srv2, srv1
hi vtt, ttml, srv3, srv2, srv1
pt-BR vtt, ttml, srv3, srv2, srv1
ro vtt, ttml, srv3, srv2, srv1
ja vtt, ttml, srv3, srv2, srv1
uk vtt, ttml, srv3, srv2, srv1
es-419 vtt, ttml, srv3, srv2, srv1
hr vtt, ttml, srv3, srv2, srv1
id vtt, ttml, srv3, srv2, srv1
❯ yt-dlp --list-subs 0Zp6IdV9fFo
[youtube] 0Zp6IdV9fFo: Downloading webpage
[youtube] 0Zp6IdV9fFo: Downloading android player API JSON
[info] Available automatic captions for 0Zp6IdV9fFo:
Language Name Formats
<long list of translated subtitles>
...
[info] Available subtitles for 0Zp6IdV9fFo:
Language Name Formats
ar Arabic vtt, ttml, srv3, srv2, srv1, json3
ar-PwUA2SMt9rM Arabic - closed captions vtt, ttml, srv3, srv2, srv1, json3
my Burmese vtt, ttml, srv3, srv2, srv1, json3
my-PwUA2SMt9rM Burmese - closed captions vtt, ttml, srv3, srv2, srv1, json3
yue-HK Cantonese (Hong Kong) vtt, ttml, srv3, srv2, srv1, json3
yue-HK-PwUA2SMt9rM Cantonese (Hong Kong) - closed captions vtt, ttml, srv3, srv2, srv1, json3
zh-Hans Chinese (Simplified) vtt, ttml, srv3, srv2, srv1, json3
zh-Hans-PwUA2SMt9rM Chinese (Simplified) - closed captions vtt, ttml, srv3, srv2, srv1, json3
zh-TW Chinese (Taiwan) vtt, ttml, srv3, srv2, srv1, json3
zh-TW-PwUA2SMt9rM Chinese (Taiwan) - closed captions vtt, ttml, srv3, srv2, srv1, json3
hr Croatian vtt, ttml, srv3, srv2, srv1, json3
hr-PwUA2SMt9rM Croatian - closed captions vtt, ttml, srv3, srv2, srv1, json3
en-GB English (United Kingdom) vtt, ttml, srv3, srv2, srv1, json3
en-GB-PwUA2SMt9rM English (United Kingdom) - closed captions vtt, ttml, srv3, srv2, srv1, json3
fil Filipino vtt, ttml, srv3, srv2, srv1, json3
fil-PwUA2SMt9rM Filipino - closed captions vtt, ttml, srv3, srv2, srv1, json3
fr French vtt, ttml, srv3, srv2, srv1, json3
fr-PwUA2SMt9rM French - closed captions vtt, ttml, srv3, srv2, srv1, json3
hi Hindi vtt, ttml, srv3, srv2, srv1, json3
hi-PwUA2SMt9rM Hindi - closed captions vtt, ttml, srv3, srv2, srv1, json3
id Indonesian vtt, ttml, srv3, srv2, srv1, json3
id-PwUA2SMt9rM Indonesian - closed captions vtt, ttml, srv3, srv2, srv1, json3
ja Japanese vtt, ttml, srv3, srv2, srv1, json3
ja-PwUA2SMt9rM Japanese - closed captions vtt, ttml, srv3, srv2, srv1, json3
ko Korean vtt, ttml, srv3, srv2, srv1, json3
ko-PwUA2SMt9rM Korean - closed captions vtt, ttml, srv3, srv2, srv1, json3
pt-BR Portuguese (Brazil) vtt, ttml, srv3, srv2, srv1, json3
pt-BR-PwUA2SMt9rM Portuguese (Brazil) - closed captions vtt, ttml, srv3, srv2, srv1, json3
ro Romanian vtt, ttml, srv3, srv2, srv1, json3
ro-PwUA2SMt9rM Romanian - closed captions vtt, ttml, srv3, srv2, srv1, json3
ru Russian vtt, ttml, srv3, srv2, srv1, json3
ru-PwUA2SMt9rM Russian - closed captions vtt, ttml, srv3, srv2, srv1, json3
es-419 Spanish (Latin America) vtt, ttml, srv3, srv2, srv1, json3
es-419-PwUA2SMt9rM Spanish (Latin America) - closed captions vtt, ttml, srv3, srv2, srv1, json3
th Thai vtt, ttml, srv3, srv2, srv1, json3
th-PwUA2SMt9rM Thai - closed captions vtt, ttml, srv3, srv2, srv1, json3
tr Turkish vtt, ttml, srv3, srv2, srv1, json3
uk Ukrainian vtt, ttml, srv3, srv2, srv1, json3
uk-PwUA2SMt9rM Ukrainian - closed captions vtt, ttml, srv3, srv2, srv1, json3
vi Vietnamese vtt, ttml, srv3, srv2, srv1, json3
vi-PwUA2SMt9rM Vietnamese - closed captions vtt, ttml, srv3, srv2, srv1, json3
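To illustrate why the two listings differ, here is a minimal sketch of the collision. It is not youtube-dl's or yt-dlp's actual code; the `tracks` records and both helper functions are hypothetical and only modelled on the output above. Keying tracks by language code alone keeps one track per code, while keying by code and name keeps both.

```python
from collections import defaultdict

# Hypothetical track records, loosely modelled on the listings above.
tracks = [
    {"code": "en-GB", "name": "English (United Kingdom)"},
    {"code": "en-GB", "name": "English (United Kingdom) - closed captions"},
    {"code": "tr", "name": "Turkish"},
]

def index_by_code(tracks):
    """Key tracks by language code only: a later track overwrites an earlier one."""
    return {t["code"]: t["name"] for t in tracks}

def index_by_code_and_name(tracks):
    """Keep every track by grouping the names under each language code."""
    grouped = defaultdict(list)
    for t in tracks:
        grouped[t["code"]].append(t["name"])
    return dict(grouped)

lossy = index_by_code(tracks)
full = index_by_code_and_name(tracks)

# The lossy index holds 2 entries; the grouped index still holds all 3 tracks.
print(len(lossy), sum(len(names) for names in full.values()))
```

Which of the two en-GB tracks survives in the lossy index depends only on processing order, which matches the behaviour described above.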
OK, and --all-subs doesn't help in yt-dl because it just gets one sub for each language, IIRC.
-PwUA2SMt9rM? Wouldn't -CC be better?
That's what YouTube gives in the languageCode field. This doesn't just happen with CC: I have seen videos with "descriptive subtitles", and there could be other cases as well. Mapping them all to human-readable IDs could be tricky.
If -CC- was interpolated (e.g. zh-TW-CC-PwUA2SMt9rM) for closed captions, a longest match would be able to select that over the default, or a regex match could select between them.
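A rough sketch of how such selection might work, assuming tags with the proposed -CC- infix existed. Neither tool emits these tags today, and `common_subtags`, `longest_match` and `regex_match` are made-up names for illustration. The longest match is computed over hyphen-separated subtags, with ties going to the shorter tag so that a plain request still picks the default track.

```python
import re

# Hypothetical tags: the "-CC-" infix is the proposal above, not current output.
tags = ["zh-TW", "zh-TW-CC-PwUA2SMt9rM", "en-GB", "en-GB-CC-PwUA2SMt9rM"]

def common_subtags(a, b):
    """Count the leading hyphen-separated subtags shared by two tags."""
    n = 0
    for x, y in zip(a.split("-"), b.split("-")):
        if x != y:
            break
        n += 1
    return n

def longest_match(requested, tags):
    """Pick the tag sharing the most leading subtags; ties go to the shorter tag."""
    if not tags:
        return None
    best = max(tags, key=lambda t: (common_subtags(requested, t), -len(t)))
    return best if common_subtags(requested, best) > 0 else None

def regex_match(pattern, tags):
    """Return every tag that the pattern matches in full."""
    return [t for t in tags if re.fullmatch(pattern, t)]

print(longest_match("zh-TW", tags))        # the default track
print(longest_match("zh-TW-CC", tags))     # the closed-captions track
print(regex_match(r".*-CC-.*", tags))      # all closed-captions tracks
```

With this scheme a request for zh-TW still resolves to the default track, while zh-TW-CC resolves to the closed captions without the user having to know the opaque suffix.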
Description
There are videos which include "closed captions" and translation captions with the same language - such as that above, as seen in this image.
There does not seem to be a command line switch to download the translation instead of the "closed captions".
As seen in the debug logs, the URLs for the timedtext VTT subtitles in this instance include the parameter and value name=closed+captions. When I curl the URL of the VTT caption which youtube-dl has shown me, I see the "closed captions". When I curl the same URL with the name=closed+captions parameter and value removed, I see the translation caption which I want.
The extension to this thought is that --list-subs must list not only the language but also the name of each subtitle track, and allow multiple track names per language. This might also alter the way --write-auto-sub works.
The closed PRs https://github.com/yt-dlp/yt-dlp/pull/310 and https://github.com/ytdl-org/youtube-dl/pull/26112 appear to be attempts to fix this issue.
To work around this issue I must look at the URL that youtube-dl is downloading for the subtitle file, edit the URL, curl the URL to a file, merge the subtitles with the video youtube-dl created, delete the files youtube-dl created, and rename my merged version.
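The URL-editing step of that workaround could be scripted roughly as below. This is only a sketch: the example URL is a placeholder (the host and every parameter other than name=closed+captions are assumptions, not the real timedtext URL), and `drop_query_param` is a hypothetical helper built on the standard library's urllib.parse.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def drop_query_param(url, param):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Placeholder URL: only "name=closed+captions" comes from the report above.
url = "https://example.com/api/timedtext?v=0Zp6IdV9fFo&lang=en-GB&name=closed+captions&fmt=vtt"
print(drop_query_param(url, "name"))
```

The resulting URL can then be fetched with curl as described; the merge and rename steps would still have to be done by hand.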