Open flashdagger opened 1 year ago
This MPD is using an Initialization element that does not include a sourceURL attribute. It only includes a range attribute that refers to a higher-level BaseURL. yt-dlp is assuming that sourceURL is always present.
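To illustrate the failure mode described above, here is a minimal sketch, not yt-dlp's actual code. The MPD fragment, the example.com URL and the helper name initialization_info are all made up for illustration, and XML namespaces are left out. It reads the optional sourceURL attribute with a fallback to the enclosing BaseURL instead of indexing it directly, which is what would raise KeyError('sourceURL').

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical fragment: Initialization carries only a byte range,
# so the bytes have to be fetched from the enclosing BaseURL.
MPD_FRAGMENT = """
<Representation id="0">
  <BaseURL>https://example.com/video/init-and-media.mp4</BaseURL>
  <SegmentList>
    <Initialization range="0-862"/>
  </SegmentList>
</Representation>
"""


def initialization_info(representation):
    """Return the init segment URL and byte range, tolerating a missing sourceURL."""
    base_url = (representation.findtext('BaseURL') or '').strip()
    init = representation.find('SegmentList/Initialization')
    if init is None:
        return None
    # init.attrib['sourceURL'] would raise KeyError for this fragment;
    # .get() returns None, and urljoin() with an empty URL returns the base.
    return {
        'url': urljoin(base_url, init.get('sourceURL') or ''),
        'range': init.get('range'),
    }


info = initialization_info(ET.fromstring(MPD_FRAGMENT))
print(info)
```

With this fragment the init segment resolves to the BaseURL itself plus the byte range, so a downloader would need to issue a ranged request against that URL.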
BTW, dash-mpd-cli downloads this content fine.
Same issue as the underlying issue of #5288, though that site has apparently changed and it may not be useful for continuing to track the MPD sourceURL problem, so keeping this open
In https://github.com/ytdl-org/youtube-dl/issues/32595#issuecomment-1761209532, I back-ported yt-dlp's _parse_mpd_formats_and_subtitles() and modified it to address this issue.
The old code instantiated a BaseURL at the representation level by merging BaseURLs up the XML hierarchy and finally adding default URL components from the mpd_base_url, but didn't use any default for media URL attributes.
My approach was to pull out the BaseURL processing so that, as the hierarchy is descended, whatever BaseURL has been constructed so far can be passed, if it isn't a partial path, with key base_url in the parent info, and then used as a default for any missing media URLs.
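The idea of carrying the BaseURL down the hierarchy can be sketched roughly as follows. This is an illustration only, not the back-ported code: the manifest, the example.com URL and the function names descend and collect_formats are hypothetical, XML namespaces are omitted, and unlike the approach described above it always resolves against the manifest URL rather than checking for partial paths.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical manifest with BaseURL elements at several levels and
# segment entries that carry byte ranges but no URL attributes.
MPD = """
<MPD>
  <BaseURL>media/</BaseURL>
  <Period>
    <AdaptationSet>
      <BaseURL>video/</BaseURL>
      <Representation id="v0">
        <BaseURL>v0.mp4</BaseURL>
        <SegmentList>
          <Initialization range="0-862"/>
          <SegmentURL mediaRange="863-7113"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""


def descend(element, base_url, found):
    """Walk down the tree, refining base_url with each level's BaseURL."""
    child_base = element.findtext('BaseURL')
    if child_base:
        base_url = urljoin(base_url, child_base.strip())
    if element.tag == 'Representation':
        init = element.find('SegmentList/Initialization')
        found[element.get('id')] = {
            # The accumulated base_url is the default for missing URL attributes.
            'init_url': urljoin(base_url, init.get('sourceURL') or ''),
            'init_range': init.get('range'),
            'segments': [
                (urljoin(base_url, seg.get('media') or ''), seg.get('mediaRange'))
                for seg in element.findall('SegmentList/SegmentURL')
            ],
        }
        return
    for child in element:
        descend(child, base_url, found)


def collect_formats(mpd_text, mpd_url):
    found = {}
    descend(ET.fromstring(mpd_text), mpd_url, found)
    return found


formats = collect_formats(MPD, 'https://example.com/dash/manifest.mpd')
print(formats['v0']['init_url'])
```

Passing the accumulated URL as an argument keeps each level's contribution explicit, so a Representation without any URL attributes still ends up with a usable default.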
There may be better ways. This sort of DASH format may even be invalid. But this is what happens with OP's link:
$ python -m youtube_dl -v -F 'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 66ab0814c
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w 11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Requesting header
WARNING: Falling back on generic information extractor.
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Downloading webpage
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Extracting information
[info] Available formats for b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a:
format code extension resolution note
1 m4a audio only [eng] DASH audio 0k , m4a_dash container, mp4a.40.2 (44100Hz)
0 mp4 480x270 [eng] DASH video 300k , mp4_dash container, avc1.640015, video only
2 mp4 960x540 [eng] DASH video 600k , mp4_dash container, avc1.64001f, video only (best)
$
has this been added to the code? if I build from source are your changes included?
Not even in a PR at yt-dl yet, let alone here.
damn, okay... please let me know if it does become a thing...
(or if you happen to have a linux version I can test with your changes?)
thank you!
li'l noobish here very 1st - is #8959 closed? You said submit a ticket; I did. It was revoked? If so, should I close account and stop sending these?
@miscellaneous01 it was a duplicate of this issue. There is no need for 2 reports to track 1 bug
I couldn't find a way to search for tickets. github doesn't supply a yt-dlp range search, right?
@miscellaneous01 github has a search function but it's not very good. Don't worry about it, happens all the time
In the duplicate issues #8655, #8959, #9012, the problem URLs from brighteon.com appear to generate a 2-item playlist, where the first item has A-V and matching video-only formats, plus an audio-only format, and the second is just mp3. Is that expected?
The brighteon plugin usually presents something like this:
ID EXT RESOLUTION FPS │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC ABR ASR MORE INFO
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
dash-audio m4a audio only │ ~ 39.77MiB 96k https │ audio only mp4a.40.5 96k 48k DASH audio, m4a_dash
audio mp3 audio only │ ≈ 79.55MiB 192k https │ audio only mp4a.40.2 192k 48k
dash-270p mp4 480x270 │ ~145.01MiB 350k https │ avc1.4d401f 350k video only DASH video, mp4_dash
hls-270p mp4 480x270 15 │ ~103.01MiB 249k m3u8 │ avc1.4d401f mp4a.40.5
dash-540p mp4 960x540 │ ~621.46MiB 1500k https │ avc1.640028 1500k video only DASH video, mp4_dash
hls-540p mp4 960x540 30 │ ~279.05MiB 674k m3u8 │ avc1.4d401f mp4a.40.5
The DASH streams (1 audio + 2 or 3 video) come from the MPD manifest. The HLS streams are from the m3u8 playlist, and the mp3 audio is a separate file.
I don't think that plugin did anything special about this issue, it just skips problematic MPDs.
As for the DASH formats it does return, maybe it just extracted them from other, non-problematic MPDs?
Edit: wait, that's your plugin! Then I have no idea what you meant.
Then I suppose that the extraction as a 2-item playlist is an artefact of the upstream generic extractor.
I just described which formats should be expected when the MPD is finally parsed. My plugin does not try to solve the issue, as MPD-parsing is a yt-dlp core functionality.
@dirkf: If you use the generic extractor then you also get all formats. Just that the mp3 is an additional playlist item, but that's how the extractor works, I suppose...
See https://github.com/ytdl-org/youtube-dl/pull/32710:
$ python -m youtube_dl -vF 'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-vF', u'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 630da9eb7
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w 11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Requesting header
WARNING: Falling back on generic information extractor.
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Downloading webpage
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Extracting information
[info] Available formats for b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a:
format code extension resolution note
1 m4a audio only [eng] DASH audio 0k , m4a_dash container, mp4a.40.2 (44100Hz)
0 mp4 480x270 [eng] DASH video 300k , mp4_dash container, avc1.640015, video only
2 mp4 960x540 [eng] DASH video 600k , mp4_dash container, avc1.64001f, video only (best)
$
[username@host downloads]$ yt-dlp https://www.brighteon.com/9f2be9d4-1600-4002-a836-f3605746d3cc
[generic] Extracting URL: https://www.brighteon.com/9f2be9d4-1600-4002-a836-f3605746d3cc
[generic] 9f2be9d4-1600-4002-a836-f3605746d3cc: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] 9f2be9d4-1600-4002-a836-f3605746d3cc: Extracting information
[html5] 9f2be9d4-1600-4002-a836-f3605746d3cc: Downloading m3u8 information
[html5] 9f2be9d4-1600-4002-a836-f3605746d3cc: Downloading MPD manifest
ERROR: An extractor error has occurred. (caused by KeyError('sourceURL')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
[username@host downloads]$ yt-dlp -U
Latest version: stable@2024.05.27 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2024.05.27 from yt-dlp/yt-dlp)
Provide a description that is worded well enough to be understood
Disclaimer: I checked all the boxes to advance in the process.
Dear developers and maintainers,
I have no idea whether the MPD file conforms to the standard. Downloading it with ffmpeg also fails, but maybe due to missing header attributes. Please decide for yourselves whether the MPD parsing needs to be changed, or maybe you can tell me if this particular format is too anomalous.
Best regards Marcel