Closed lfer94 closed 2 years ago
yt-dl has the same code at l.2170 of extractor/common.py vs l.2736 in yt-dlp. We're just after the extract_multisegment_info() function inside the _parse_mpd_formats_and_subtitles() method (yt-dl: _parse_mpd_formats()):
period_duration = parse_duration(period.get('duration')) or mpd_duration
... [100/150 lines]
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
... [2 lines]
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info:
... [3 lines, then failing line below]
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
...
else:
# $Number*$ or $Time$ in media template with S list available
... [code in else block that doesn't use period_duration]
So either this MPD is invalid, or a test for period_duration is not None should be added to the first listed if condition, and possibly another branch added to the elif chain if none of the existing branches handle the syntax in this MPD.
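A minimal sketch of that guard, outside yt-dlp. The helper name total_segments is hypothetical; only the arithmetic is copied from the failing line, and returning None for an unknown duration is an assumption about what the caller would want.

```python
import math


def total_segments(period_duration, segment_duration):
    """Return the segment count, or None when the period duration is unknown.

    Hypothetical helper: same arithmetic as the failing line, but it
    refuses to call float() on None.
    """
    if period_duration is None or not segment_duration:
        return None
    return int(math.ceil(float(period_duration) / segment_duration))


# Without the guard, float(None) raises the TypeError seen in the report.
try:
    float(None)
except TypeError as exc:
    unguarded_error = str(exc)

known = total_segments(60, 6)      # duration known: count can be computed
unknown = total_segments(None, 6)  # duration unknown: None instead of a crash
```

With a guard like this the $Number$ branch would be skipped rather than crash, which is why another elif branch may still be needed to produce any fragments at all.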
I'd test this, but Akamai is giving me 403 on the MPD with a blocked IP notice in the browser that links to their non-functional client IP reputation tool.
Sorry, I forgot to say that this service is geoblocked and it's only available for Argentina users
In that case it might help if you could acquire the problem manifest (curl or similar) and attach it to the issue.
I added a hyperlink in the "manifest" word with this link https://mega.nz/file/uupQgSwT#Zih2rfBkG0wYPZK4P-tR4yjU2VbckPwomNfQDeA-dwo that contains the mpd file. Is that enough?
As for the curl thing, you need me to go to the browser and copy the manifest as curl, right?
I'm sorry, it's just that some of the things you said are advanced for me 🙈
I've put a copy here.
Unrelated, but the <ContentProtection> elements don't bode well for the usability of the stream.
But we do have this element and there isn't a @duration:
<Period id="1" start="PT0S">
OP may wish to look away now. Basically the code is doing something that appears impossible (until it's run in the debugger) and that leads to the crash.
The DASH reference in the code is obsolete, both the link and the document: this is the current standard Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats.
According to section 5.3.2.1,
- If the attribute @start is present in the Period, then the Period is a regular Period or an early
terminated Period and the PeriodStart is equal to the value of this attribute.
...
... The Period extends until the PeriodStart
of the next Period, or until the end of the Media Presentation in the case of the last Period. For regular
Periods, the difference between the PeriodStart time of a Period and either the PeriodStart time of the
following Period, if this is not the last Period, or the value of the MPD@mediaPresentationDuration
if this is the last one, is the presentation duration in Media Presentation time of the media content
represented by the Representations in this Period. For Early Terminated Periods, the value of the
Period@duration is the presentation duration in Media Presentation time of the media content
represented by the Representations in this Period.
So this must be an Early Terminated Period, as it has a @start but no @duration, and it's the only one, so the last, but no MPD@mediaPresentationDuration is present. In this case the duration is the sum of the @d values of the S elements in the representation, which are in units of 1/SegmentTemplate@timescale seconds, but it doesn't have to be calculated explicitly.
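For illustration, the sum described above could be sketched as follows. The SegmentTemplate here is made up (it is not the one from the problem manifest), the function name timeline_duration is hypothetical, and the handling of S@r (each S element stands for 1 + r consecutive segments) goes slightly beyond what is said above.

```python
import xml.etree.ElementTree as ET

# Made-up SegmentTemplate for the sketch. Real MPDs declare the DASH
# namespace, so element lookups would need the namespaced tag; the sample
# omits it for brevity.
SAMPLE = '''
<SegmentTemplate timescale="90000" media="video-$RepresentationID$-$Time$.dash">
  <SegmentTimeline>
    <S t="0" d="540000" r="2"/>
    <S d="270000"/>
  </SegmentTimeline>
</SegmentTemplate>
'''


def timeline_duration(segment_template):
    """Sum S@d (counting repeats) and convert to seconds via @timescale."""
    timescale = int(segment_template.get('timescale', '1'))
    total = 0
    for s in segment_template.iter('S'):
        repeat = int(s.get('r', '0'))
        if repeat < 0:
            # A negative repeat count means "until the end of the Period",
            # which cannot be resolved when the Period end is unknown.
            return None
        total += int(s.get('d')) * (1 + repeat)
    return total / timescale


duration = timeline_duration(ET.fromstring(SAMPLE))
```

Here three segments of 6 s plus one of 3 s give 21 s in total.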
The failure occurred because somehow the conditional branch for a $Number$ template was taken when the template was actually, at least if it hasn't changed, "1125_FOX__SPORTS_PREMIUM_ARG-$RepresentationID$-$Time$.dash". The branch # $Number*$ or $Time$ in media template with S list available should have been taken, which doesn't need a duration.
Apparently, a similar problem, NoneType now gets into progress:
2022-02-09 19:04:34.968 TYoutubeDl yt-dlp stack: Pyton APPDownload core loading C:\Program Files (x86)\APP\Client\mod_a4ea4b5a\yt-dlp\yt-dlp 2022-02-09 19:04:34.988 TYoutubeDl yt-dlp stack: Pyton APPDownload core loaded <module 'yt_dlp' from 'yt-dlp\yt_dlp\init.py'> 2022-02-09 19:04:35.001 TYoutubeDl yt-dlp stack: Pyton APPDownload object created 2022-02-09 19:04:35.495 TYoutubeDl yt-dlp stack: web_request REDIRECT: https://twitter.com/ProZD/status/1248431798016565249 2022-02-09 19:04:35.508 TYoutubeDl yt-dlp stack: INFO: get_url_domain (offline parser) -> "twitter.com" 2022-02-09 19:04:35.522 TYoutubeDl yt-dlp stack: INFO: fill_auth_data -> Url domain "twitter.com" 2022-02-09 19:04:35.536 TYoutubeDl yt-dlp stack: INFO: fill_auth_data -> Authorization data for "twitter.com" not found in cache "C:\Users\Ra10Bit-PC\AppData\Roaming\APP\Data\cache". SKIPPED 2022-02-09 19:04:35.549 TYoutubeDl yt-dlp stack: ..yt-dlp GetDownload OPTIONS: {'format': 'http-2176', 'outtmpl': 'C:\Users\Ra10Bit-PC\Downloads\APP Downloads\tmp\SungWon Cho - twitter gave me six characters to voice [480p].mp4', 'socket_timeout': '15', 'verbose': 'False', 'retries': 10, 'nocheckcer 2022-02-09 19:04:35.562 TYoutubeDl yt-dlp stack: tificate': True, 'keepvideo': True, 'ffmpeg_location': 'C:\Program Files (x86)\APP\Client\mod_a4ea4b5a\', 'ignoreerrors': True, 'logger': <main.APPDownload.getDownload.
.loggerBase object at 0x06368530>, 'progress_hooks': [<function APPDownl 2022-02-09 19:04:35.575 TYoutubeDl yt-dlp stack: oad.getDownload. .my_hook at 0x0A10F2B8>], 'no_color': 'True', 'encoding': 'utf-8'} 2022-02-09 19:04:35.627 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid Removing cache dir C:\Users\Ra10Bit-PC/.cache\yt-dlp . 2022-02-09 19:04:35.640 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid . 2022-02-09 19:04:35.656 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [debug] [twitter] Extracting URL: https://twitter.com/ProZD/status/1248431798016565249 2022-02-09 19:04:35.669 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [twitter] 1248431798016565249: Downloading guest token 2022-02-09 19:04:35.993 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [twitter] 1248431798016565249: Downloading JSON metadata 2022-02-09 19:04:36.334 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [twitter] 1248431798016565249: Downloading m3u8 information 2022-02-09 19:04:36.591 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [debug] Sort order given by extractor: res, br, size, proto 2022-02-09 19:04:36.604 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [debug] Formats sorted by: hasvid, ie_pref, res, tbr, vbr, abr, filesize, fs_approx, proto, lang, quality, fps, hdr:12(7), vcodec:vp9.2(10), acodec, asr, vext, aext, hasaud, source, id 2022-02-09 19:04:36.622 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [info] 1248431798016565249: Downloading 1 format(s): http-2176 2022-02-09 19:04:36.641 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [debug] Invoking downloader on "https://video.twimg.com/ext_tw_video/1248431448568127489/pu/vid/1280x720/SB1w0WRRmPTHJWiE.mp4?tag=10" 2022-02-09 19:04:37.244 TYoutubeDl yt-dlp stack: DEBUG: taskid=3475612=taskid [download] Destination: C:\Users\Ra10Bit-PC\Downloads\APP Downloads\tmp\SungWon Cho - twitter gave me six characters to voice [480p].mp4 2022-02-09 19:04:37.257 TYoutubeDl 
yt-dlp stack: DEBUG: taskid=3475612=taskid [download] 1.00KiB at 71.54KiB/s (00:00) 2022-02-09 19:04:37.271 TYoutubeDl yt-dlp stack: downloading 2022-02-09 19:04:37.290 TYoutubeDl yt-dlp stack: ERR: taskid=3475612=taskid Err: float() argument must be a string or a number, not 'nonetype' 2022-02-09 19:04:37.303 TYoutubeDl yt-dlp stack: ERR: taskid=3475612=taskid traceback (most recent call last): 2022-02-09 19:04:37.315 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 1381, in wrapper 2022-02-09 19:04:37.329 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 1465, in __extract_info 2022-02-09 19:04:37.342 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 1517, in process_ie_result 2022-02-09 19:04:37.355 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 2607, in process_video_result 2022-02-09 19:04:37.368 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 3086, in process_info 2022-02-09 19:04:37.381 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\youtubedl.py", line 2801, in dl 2022-02-09 19:04:37.394 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\downloader\common.py", line 440, in download 2022-02-09 19:04:37.408 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\downloader\http.py", line 372, in real_download 2022-02-09 19:04:37.422 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\downloader\http.py", line 317, in download 2022-02-09 19:04:37.434 TYoutubeDl yt-dlp stack: file "yt-dlp\yt_dlp\downloader\common.py", line 456, in _hook_progress 2022-02-09 19:04:37.447 TYoutubeDl yt-dlp stack: file " ", line 4310, in my_hook 2022-02-09 19:04:37.460 TYoutubeDl yt-dlp stack: typeErr: float() argument must be a string or a number, not 'nonetype' 2022-02-09 19:04:37.476 TYoutubeDl yt-dlp stack: 2022-02-09 19:04:37.944 TYoutubeDl yt-dlp stack: web_request REDIRECT: https://twitter.com/ProZD/status/1248431798016565249 2022-02-09 19:04:37.958 TYoutubeDl yt-dlp stack: INFO: get_url_domain (offline parser) -> 
"twitter.com" 2022-02-09 19:04:37.972 TYoutubeDl yt-dlp stack: INFO: auth_msg_register -> Url domain "twitter.com" 2022-02-09 19:04:37.986 TYoutubeDl yt-dlp stack: INFO: auth_msg_register -> Authorization data not required for "twitter.com" or already filled 2022-02-09 19:04:37.999 TYoutubeDl yt-dlp stack: ERROR: taskid=3475612=taskid ERROR: float() argument must be a string or a number, not 'NoneType' 2022-02-09 19:04:38.011 TYoutubeDl yt-dlp stack: Traceback (most recent call last): 2022-02-09 19:04:38.025 Updating task status in DB procedure started. 2022-02-09 19:04:38.038 Status will be changed to "Error". 2022-02-09 19:04:38.051 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 1381, in wrapper 2022-02-09 19:04:38.064 Updating task status in DB procedure finished. 2022-02-09 19:04:38.077 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 1465, in __extract_info 2022-02-09 19:04:38.090 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 1517, in process_ie_result 2022-02-09 19:04:38.103 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 2607, in process_video_result 2022-02-09 19:04:38.116 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 3086, in process_info 2022-02-09 19:04:38.129 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\YoutubeDL.py", line 2801, in dl 2022-02-09 19:04:38.141 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\downloader\common.py", line 440, in download 2022-02-09 19:04:38.154 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\downloader\http.py", line 372, in real_download 2022-02-09 19:04:38.167 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\downloader\http.py", line 317, in download 2022-02-09 19:04:38.180 TYoutubeDl yt-dlp stack: File "yt-dlp\yt_dlp\downloader\common.py", line 456, in _hook_progress 2022-02-09 19:04:38.193 TYoutubeDl yt-dlp stack: File " ", line 4310, in my_hook 2022-02-09 19:04:38.206 TYoutubeDl yt-dlp stack: TypeError: float() argument must be a string or a 
number, not 'NoneType'
Thanks for the explanation.
The site I got this link from has been making some changes lately (the links were different before, and there was no geo-blocking); maybe that has something to do with it.
Further, there are other links/manifests from that same site that can be downloaded (like this one, for example https://mega.nz/file/rv5QBICI#Mh7llA27QGCrNJsp0ujsnfvPeccykbweyWHGgpjnveA), but I don't understand why, because there's not much difference between one and the other.
Apparently, a similar problem, NoneType now gets into progress:
@Ra10Bit, your issue looks to be separate: you should probably open a new issue with a complete (i.e. including the command-line dump) verbose log.
Further, there are other links/manifests from that same site that can be downloaded (like this one, for example https://mega.nz/file/rv5QBICI#Mh7llA27QGCrNJsp0ujsnfvPeccykbweyWHGgpjnveA), but I don't understand why, because there's not much difference between one and the other.
Indeed, but the issue reproduces using the raw Gist link to the first file. yt-dl doesn't have the problem, because it's less ambitious.
The debugger shows that it's the thumbnail stream causing the problem, with media_template set to 'thumbnails/1125_FOX__SPORTS_PREMIUM_ARG-thumbnail-%(Number)d.jpeg' and no s key in representation_ms_info:
<AdaptationSet id="4" contentType="image" mimeType="image/jpeg">
<SegmentTemplate timescale="1" duration="6" startNumber="1" media="thumbnails/1125_FOX__SPORTS_PREMIUM_ARG-$RepresentationID$-$Number$.jpeg"/>
<Representation bandwidth="10000" id="thumbnail" width="204" height="120">
<EssentialProperty schemeIdUri="http://dashif.org/guidelines/thumbnail_tile" value="1x1"/>
</Representation>
</AdaptationSet>
Plainly yt-dlp can't work out the total number of thumbnails using the existing code.
The problem AdaptationSet seems to follow this example, even if ISO/IEC 23009-1 doesn't seem to specify @duration as an attribute of SegmentTemplate (it probably needs to be read more thoroughly).
So it defines one media segment for every 6 seconds, containing a 204x120 image in a 1x1 grid, and that image is displayed for those 6 seconds.
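As a rough sketch of what that means for the thumbnail count: the template and the timescale, duration and startNumber values below come from the AdaptationSet and the debugger output above, while the 4-hour period_duration is purely an assumption for illustration, since the manifest itself doesn't state one.

```python
import math

# Values from the AdaptationSet above, with the template already converted
# to the %-style form seen in the debugger.
media_template = 'thumbnails/1125_FOX__SPORTS_PREMIUM_ARG-thumbnail-%(Number)d.jpeg'
timescale, duration, start_number = 1, 6, 1

# Assumption for illustration only: a Period lasting 4 hours.
period_duration = 4 * 60 * 60

segment_duration = duration / timescale
total_number = int(math.ceil(period_duration / segment_duration))

# One URL per thumbnail, numbered from startNumber.
urls = [media_template % {'Number': n}
        for n in range(start_number, start_number + total_number)]
```

Without some value for period_duration the ceil() step has nothing to work with, which is where the existing code falls over.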
As the AdaptationSets are all inside the same Period, I suppose that a period_duration, once calculated for the Period, should carry over to other AdaptationSets.
This patch, using the above tactic, fixes the problem URL, though possibly not all cases where the duration is not known from the AdaptationSet itself:
--- old/yt-dlp/yt_dlp/extractor/common.py
+++ new/yt-dlp/yt_dlp/extractor/common.py
@@ -75,6 +75,7 @@
str_to_int,
strip_or_none,
traverse_obj,
+ try_get,
unescapeHTML,
UnsupportedError,
unified_strdate,
@@ -2966,6 +2967,10 @@
f['url'] = initialization_url
f['fragments'].append({location_key(initialization_url): initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
+ if period_duration is None:
+ period_duration = try_get(
+ representation_ms_info,
+ lambda r: sum(frag['duration'] for frag in r['fragments']), float)
else:
# Assuming direct URL to unfragmented media.
f['url'] = base_url
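To show how the added lines behave in isolation, here is a self-contained sketch. try_get_sketch is a simplified stand-in written for this example, approximating how yt-dlp's try_get is used in the patch; it is not the real helper, and fallback_period_duration is a hypothetical wrapper name.

```python
def try_get_sketch(src, getter, expected_type=None):
    """Simplified stand-in for try_get (an approximation, not the real helper)."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def fallback_period_duration(representation_ms_info):
    """Sum the fragment durations, or return None if they are not available."""
    return try_get_sketch(
        representation_ms_info,
        lambda r: sum(frag['duration'] for frag in r['fragments']),
        float)


# Fragments with durations: the sum becomes the period duration.
with_durations = fallback_period_duration(
    {'fragments': [{'duration': 6.0}, {'duration': 6.0}, {'duration': 3.5}]})
# Fragments without a 'duration' key: the KeyError is swallowed.
without_durations = fallback_period_duration({'fragments': [{'path': 'a'}]})
# No 'fragments' key at all: also None.
no_fragments = fallback_period_duration({})
```

So the fallback only helps when an earlier representation in the same Period produced fragments with durations, which matches the caveat that it may not cover every case.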
The video formats only appear with --allow-unplayable-formats.
WARNING: You have asked for UNPLAYABLE formats to be listed/downloaded. This is a developer option intended for debugging.
If you experience any issues while using this option, DO NOT open a bug report
Normally I would simply close this issue. But in this specific case, there is a storyboards format in the manifest that can be extracted without the use of --allow-unplayable-formats.
One last thing: according to the bitmovin player, those manifests are 4 hours long. I don't know how much help this is, but I "discovered" it and wanted to share it. Regards.
Description
I was trying to check the available video formats from a dash manifest but I got this error.