Open dpriskorn opened 2 years ago
I can reproduce this. It looks like the same issue would be seen with yt-dlp too, so no easy fix there.
There are pending fixes for DRTV, but I don't think subtitles are affected, so some debugging is needed.
I debugged a little. The response to this request for the master.m3u8 contains the link to the vtt playlist with all the segments:
curl 'https://drod09h-vh.akamaihd.net/i/all/clear/streaming/5d/61f40be9aa5a612b344e0c5d/Spionkrigen-i-Ringsted_b8e4eadf521344929e67691987d35f10_,500,1100,2000,3500,5500,.mp4.csmil/master.m3u8?cc1=name=Fremmedsprogstekster~default=yes~forced=no~lang=da~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8&cc2=name=Dansk~default=no~forced=no~lang=da~uri=https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Origin: https://www.dr.dk' -H 'Connection: keep-alive' -H 'Referer: https://www.dr.dk/' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: cross-site'
The link is in a comment: https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1/playlist.m3u8. Following that playlist, segment 6, for example, has subtitles.
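For anyone repeating this, the cc1/cc2 query parameters of the master.m3u8 URL can be pulled apart with the standard library. This is only a sketch: the helper name parse_cc_params and the shortened example.com URL are made up for illustration, and the '~'-separated key=value layout is assumed from the single request above.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_cc_params(master_url):
    """Return one dict per ccN query parameter (hypothetical helper)."""
    tracks = []
    for key, value in parse_qsl(urlsplit(master_url).query):
        if not key.startswith('cc'):
            continue
        # Each value is assumed to look like name=...~default=...~lang=...~uri=...
        track = dict(field.partition('=')[::2] for field in value.split('~'))
        tracks.append(track)
    return tracks


# Shortened, made-up URL with the same shape as the real request
example = (
    'https://example.com/master.m3u8'
    '?cc1=name=Fremmedsprogstekster~default=yes~forced=no~lang=da'
    '~uri=https://example.com/subtitles/Foreign/playlist.m3u8'
    '&cc2=name=Dansk~default=no~forced=no~lang=da'
    '~uri=https://example.com/subtitles/Foreign_HardOfHearing/playlist.m3u8'
)

for track in parse_cc_params(example):
    print(track['name'], track['uri'])
```

Each returned dict carries the track name, its default/forced flags, the language, and the subtitle playlist URI, which is the part the extractor would need.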
So --list-subs gives:
Available subtitles for 00242105010:
Language formats
da vtt, vtt, vtt, vtt, vtt
while --all-subs just downloads the single dud .vtt.
The extractor looks up the show in https://www.dr.dk/mu-online/api/1.4/programcard. There are three assets in the returned programme metadata, each with two subtitle URLs, except the third, which only has one.
[[
{
"MimeType": "text/vtt;charset=utf-8",
"Type": "Foreign",
"Uri": "https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
"Language": "Danish"
},
{
"MimeType": "text/vtt;charset=utf-8",
"Type": "Foreign_HardOfHearing",
"Uri": "https://drod09h-vh.akamaihd.net/p/allx/clear/download/5d/61f40be9aa5a612b344e0c5d/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
"Language": "Danish"
}
],
[{
"MimeType": "text/vtt;charset=utf-8",
"Type": "Foreign",
"Uri": "https://drod04f-vh.akamaihd.net/p/allx/clear/download/89/61f40c63af5a612af86c7e89/subtitles/Foreign-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
"Language": "Danish"
},
{
"MimeType": "text/vtt;charset=utf-8",
"Type": "Foreign_HardOfHearing",
"Uri": "https://drod04f-vh.akamaihd.net/p/allx/clear/download/89/61f40c63af5a612af86c7e89/subtitles/Foreign_HardOfHearing-19324737-6e40ee8a-1569-4676-bdc0-3396bd943dd1.vtt",
"Language": "Danish"
}
],
[{
"MimeType": "text/vtt;charset=utf-8",
"Type": "Foreign_HardOfHearing",
"Uri": "https://drod01e-vh.akamaihd.net/p/allx/clear/download/9e/61fbf808a95a612450c32a9e/subtitles/Foreign_HardOfHearing-19324737-09b133ba-24c7-4b05-a548-a12d9396d3f0.vtt",
"Language": "Danish"
}
]]
This becomes clearer when we look at each asset:
(Pdb) p assets[0]
{u'Kind': u'VideoResource', u'Target': u'Default', ...
(Pdb) p assets[1]
{u'Kind': u'VideoResource', u'Target': u'SpokenSubtitles', ...
(Pdb) p assets[2]
{u'Kind': u'VideoResource', u'Target': u'SignLanguage', ...
When no --sub-format option is specified, the fifth URL in the list above is selected as the best, because the subtitle list is supposed to be sorted from least to most preferred; that URL is the dud subtitle file for the signed version.
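To illustrate why the dud gets picked, here is a simplified model of the selection step, not the actual youtube-dl code. The function name pick_subtitle and the short labels standing in for the five URLs are invented; the only behaviour assumed is the one described above, that the last entry in the list wins.

```python
def pick_subtitle(formats, sub_format='best'):
    """Simplified model: formats are assumed sorted from least to most preferred."""
    for ext in sub_format.split('/'):
        if ext == 'best':
            return formats[-1]
        matches = [f for f in formats if f['ext'] == ext]
        if matches:
            return matches[-1]
    # Nothing matched the requested formats: fall back to the last entry
    return formats[-1]


# Labels stand in for the five subtitle URLs, in the order they were extracted
danish = [
    {'ext': 'vtt', 'label': 'Default/Foreign'},
    {'ext': 'vtt', 'label': 'Default/Foreign_HardOfHearing'},
    {'ext': 'vtt', 'label': 'SpokenSubtitles/Foreign'},
    {'ext': 'vtt', 'label': 'SpokenSubtitles/Foreign_HardOfHearing'},
    {'ext': 'vtt', 'label': 'SignLanguage/Foreign_HardOfHearing'},
]

print(pick_subtitle(danish)['label'])
```

Since all five entries share the language da and the extension vtt, nothing but list position separates them, so the SignLanguage entry always comes out on top.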
There are several problems here:
- For assets whose Targets are SpokenSubtitles, SignLanguage, or VisuallyInterpreted, the extractor doesn't distinguish the corresponding subtitles;
- The --all-subs option doesn't actually download every subtitle listed by --list-subs (even though the help says it should 'Download all the available subtitles of the video'), but just the preferred one for each language.
Looking at the available subtitles:
It's also not obvious why there shouldn't be Danish Foreign_HardOfHearing SignLanguage subtitles to allow DSL and non-DSL speakers to watch together.
The extractor could assign a different language code for subtitles extracted from SignLanguage and VisuallyInterpreted Targets, such as sgn-dsl.
There isn't an official language code that means language X translations from other languages plus original language X. Maybe we could invent dan-da for this.
--- old/youtube-dl/youtube_dl/extractor/drtv.py
+++ new/youtube-dl/youtube_dl/extractor/drtv.py
@@ -15,6 +15,7 @@
     int_or_none,
     intlist_to_bytes,
     float_or_none,
+    ISO639Utils,
     mimetype2ext,
     str_or_none,
     try_get,
@@ -268,7 +269,12 @@
                         if not sub_uri:
                             continue
                         lang = subs.get('Language') or 'da'
-                        subtitles.setdefault(LANGS.get(lang, lang), []).append({
+                        lang = LANGS.get(lang, lang)
+                        if asset_target in ('SignLanguage', 'VisuallyInterpreted'):
+                            lang = 'sgn' + ('-dsl' if lang == 'da' else '')
+                        elif 'HardOfHearing' in subs.get('Type', ''):
+                            lang = '-'.join((ISO639Utils.short2long(lang), lang))
+                        subtitles.setdefault(lang, []).append({
                             'url': sub_uri,
                             'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
                         })
Then with this list output, the Danish Foreign subtitle file can be selected with --sub-lang da:
Available subtitles for 00242105010:
Language formats
sgn-dsl vtt
dan-da vtt, vtt
da vtt, vtt
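A standalone sketch of the same mapping, run over the Target/Type pairs from the metadata above, reproduces that grouping. The function subtitle_lang and the SHORT2LONG table are stand-ins written for this example (the patch itself uses ISO639Utils.short2long), and the URLs are replaced by labels.

```python
SHORT2LONG = {'da': 'dan'}  # stand-in for ISO639Utils.short2long
LANGS = {'Danish': 'da'}


def subtitle_lang(asset_target, sub_type, language):
    """Hypothetical helper mirroring the language-code logic of the patch."""
    lang = LANGS.get(language, language)
    if asset_target in ('SignLanguage', 'VisuallyInterpreted'):
        return 'sgn' + ('-dsl' if lang == 'da' else '')
    if 'HardOfHearing' in (sub_type or ''):
        return '-'.join((SHORT2LONG.get(lang, lang), lang))
    return lang


# (asset Target, subtitle Type) pairs in the order they appear in the metadata
found = [
    ('Default', 'Foreign'),
    ('Default', 'Foreign_HardOfHearing'),
    ('SpokenSubtitles', 'Foreign'),
    ('SpokenSubtitles', 'Foreign_HardOfHearing'),
    ('SignLanguage', 'Foreign_HardOfHearing'),
]

subtitles = {}
for target, sub_type in found:
    lang = subtitle_lang(target, sub_type, 'Danish')
    subtitles.setdefault(lang, []).append(
        {'label': '%s/%s' % (target, sub_type), 'ext': 'vtt'})

for lang, subs in subtitles.items():
    print(lang, ', '.join(s['ext'] for s in subs))
```

With this grouping the plain da entries are only the Foreign files, so the last one in that list is no longer the signed version's dud.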
Although yt-dlp has a small change in subtitle processing, this issue would also apply there.
Description
The downloaded subtitle is practically empty.
The official player has subtitles that can be enabled (the video contains a lot of Arabic).