ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
131.38k stars, 9.96k forks

Am I reading correctly that updated youtube-dl still can't download videos from bbc.com's web site? #27125

Closed (antdude closed this issue 3 years ago)

antdude commented 3 years ago

Question

Am I reading correctly that the updated youtube-dl still can't download videos from bbc.com's website? Searching open issues for bbc.com (https://github.com/ytdl-org/youtube-dl/issues?q=is%3Aissue+is%3Aopen+bbc.com) shows https://github.com/ytdl-org/youtube-dl/issues/23232, but my results seem to be different, as shown below:

$ youtube-dl https://www.bbc.com/reel/video/p08yxrlb/why-our-dreams-could-be-the-key-to-time-travel
[bbc] why-our-dreams-could-be-the-key-to-time-travel: Downloading webpage
ERROR: no suitable InfoExtractor for URL https://www.bbc.co.uk/programmes/None

Or is this a different issue that I should report as a new bug?

Thank you for reading and hopefully answering soon. :)

october262 commented 3 years ago

I just used the Firefox add-on The Stream Detector to grab the master m3u8 file and successfully downloaded this video: https://www.bbc.com/reel/video/p08yxrlb/why-our-dreams-could-be-the-key-to-time-travel

hairycactus commented 3 years ago

youtube-dl has been broken for bbc.com and bbc.co.uk videos since v2019.11.28, i.e. since November 2019. (Yes, I took notes on every version until Q1 2020, when I gave up hoping it would be fixed.)

It was also broken for some audio at bbc.co.uk/sounds, but the latest v2020.11.21.1 now seems to work okay for that domain, although I haven't tested every URL.

For BBC Reel videos (but not other BBC videos), one could previously work around the "no suitable InfoExtractor" error by specifying the Programme ID (PID) instead, at least until sometime in early 2020 (it still worked in Jan/Feb 2020).

E.g. for https://www.bbc.com/reel/video/p08yxrlb/why-our-dreams-could-be-the-key-to-time-travel the PID is p08yxrlb, and youtube-dl https://www.bbc.co.uk/programmes/p08yxrlb would have been able to fetch the video (back in Jan/Feb 2020 and earlier). However, with v2020.11.21.1, it now shows ERROR: No video formats found.

I also tried youtube-dl https://www.bbc.com/programmes/p08yxrlb, but it shows ERROR: no suitable InfoExtractor for URL https://www.bbc.co.uk/programmes/None.

As such, the latest youtube-dl is totally broken for all BBC videos, unless perhaps one resorts to using 3rd-party manual extraction methods.

Vangelis66 commented 3 years ago

@hairycactus :

https://github.com/ytdl-org/youtube-dl/blob/01c92973ddebe6429f5a03855e41c412889c96dc/youtube_dl/extractor/bbc.py#L52-L58

As you say, for

https://www.bbc.com/reel/video/p08yxrlb/why-our-dreams-could-be-the-key-to-time-travel

you'd have to manipulate it to

https://www.bbc.co.uk/programmes/p08yxrlb

for the bbc.co.uk InfoExtractor (IE) to recognise it...
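
The manual URL rewrite described above can be sketched as a small helper. This is a hypothetical function, not part of youtube-dl, and it assumes that a PID is one lowercase letter followed by seven lowercase letters or digits (as in p08yxrlb); that pattern is a guess from the examples in this thread, not a BBC specification.

```python
import re
from typing import Optional

# Hypothetical helper, not part of youtube-dl.
# Assumption: a PID looks like p08yxrlb (a letter plus 7 letters/digits);
# this pattern is inferred from this thread, not from a BBC spec.
_REEL_URL_RE = re.compile(
    r'^https?://(?:www\.)?bbc\.com/reel/video/'
    r'(?P<pid>[a-z][0-9a-z]{7})(?:[/?#]|$)')


def reel_to_programmes_url(url: str) -> Optional[str]:
    """Map a BBC Reel URL to the bbc.co.uk/programmes/<pid> form that the
    bbc.co.uk IE recognises; return None for anything else."""
    match = _REEL_URL_RE.match(url)
    if match is None:
        return None
    return 'https://www.bbc.co.uk/programmes/' + match.group('pid')
```

For the clip discussed here it yields https://www.bbc.co.uk/programmes/p08yxrlb. Note that this only gets the URL recognised by the IE; as the rest of this thread shows, format extraction itself still failed in v2020.11.24.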

For pid=p08yxrlb (included in the clip's URI), yt-dl correctly retrieves vpid=p08yxrld, as can be seen from

youtube-dl -F "https://www.bbc.co.uk/programmes/p08yxrlb" -v =>

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-F', 'https://www.bbc.co.uk/programmes/p08yxrlb', '-v']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2020.11.24
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg N-97309-g4e0cf81b49, ffprobe N-97309-g4e0cf81b49, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[bbc.co.uk] p08yxrlb: Downloading video page
[bbc.co.uk] p08yxrld: Downloading media selection XML
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
<redacted>

However, because of the way the code referenced above is written, that vpid is only tried with the first mediaselector URI, the one with mediaset=iptv-all:

https://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/p08yxrld

which doesn't yield any media stream info (only subtitle/caption info) 😭. However, and this is a yt-dl bug in this case, the vpid isn't tried with the second mediaselector URI (mediaset=pc), which is the one that actually returns media stream info:

https://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/p08yxrld
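
The behaviour the bug calls for (keep trying later mediasets while an earlier one returns no streams) can be sketched as below. This is a minimal illustration, not the actual bbc.py code and not the eventual fix; `fetch_formats` is a hypothetical callback standing in for "download this mediaselector URL and parse its formats", and subtitles from the captions-only response are simply ignored here.

```python
from typing import Callable, Dict, List, Sequence

# mediaselector/5 URL templates, in the order quoted in this thread.
MEDIASELECTOR_URLS = (
    'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
    'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
)


def select_formats(vpid: str,
                   fetch_formats: Callable[[str], List[Dict]],
                   templates: Sequence[str] = MEDIASELECTOR_URLS) -> List[Dict]:
    """Try each mediaselector URL in order, moving on whenever a response
    yields no media formats (e.g. captions only), rather than stopping at
    the first response that merely did not raise an error."""
    for template in templates:
        formats = fetch_formats(template % vpid)
        if formats:  # first mediaset that actually lists streams wins
            return formats
    return []  # no mediaset returned any streams
```

With a fake fetcher that returns nothing for iptv-all and one DASH format for pc, `select_formats('p08yxrld', fake)` falls through to the pc URL instead of giving up with "No video formats found".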

But BBC Reel video clips are edge cases for the bbcIE: they are (usually) globally available (not geofenced) and served from the bbc.com domain, which the bbcIE does not officially support. The bbcIE focuses mainly on video content from BBC iPlayer (geofenced) and audio content from BBC Sounds (partly geofenced; overseas listeners are served lower bitrates), not on random bbc.co* clips...

Workaround: Unfortunately, I don't "speak" Python, so I cannot offer a PR to fix this... Should you wish to fetch the above BBC Reel video, you could comment out line 56 of the code snippet linked above, inside bbc.py:

#        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',

then rebuild yt-dl (or run it directly from source) and run: youtube-dl "https://www.bbc.co.uk/programmes/p08yxrlb" =>

[bbc.co.uk] p08yxrlb: Downloading video page
[bbc.co.uk] p08yxrld: Downloading media selection XML
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[dashsegments] Total fragments: 103
[download] Destination: BBC - Could your dreams predict the future-p08yxrld.fstream-nonuk-pc_streaming_concrete_combined_sd_mf_limelight_world_dash_https-video=5070000.mp4
[download]  11.1% of ~198.10MiB at 816.44KiB/s ETA 04:08

Vangelis66 commented 3 years ago

Another workaround would be to move away from the deprecated mediaselector/5 API entirely and switch to the current mediaselector/6 API. However, v6 returns JSON by default, while the existing parser in bbc.py expects XML; you can still request a compatible XML response by appending /format/xml:

-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s', 
-        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
+        'http://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/iptv-all/vpid/%s/format/xml',
+        'http://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/pc/vpid/%s/format/xml',

[bbc.co.uk] p08yxrlb: Downloading video page
[bbc.co.uk] p08yxrld: Downloading media selection XML
[bbc.co.uk] p08yxrld: Downloading m3u8 information
[bbc.co.uk] p08yxrld: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[bbc.co.uk] p08yxrld: Downloading m3u8 information
[bbc.co.uk] p08yxrld: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading m3u8 information
[bbc.co.uk] p08yxrld: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[bbc.co.uk] p08yxrld: Downloading m3u8 information
[bbc.co.uk] p08yxrld: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[bbc.co.uk] p08yxrld: Downloading MPD manifest
[dashsegments] Total fragments: 103
[download] Destination: BBC - Could your dreams predict the future-p08yxrld.f_deprecated__mf_limelight-video=5070000-1.mp4
[download]   2.2% of ~171.52MiB at 960.82KiB/s ETA 04:04

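
Both workarounds come down to which mediaselector URL gets requested. As a rough sketch (a hypothetical helper, not code from bbc.py), the v5 and v6 URLs quoted in this comment can be built as follows, appending /format/xml for v6 so that the existing XML parser still receives XML rather than the default JSON. It does nothing about the 403 errors on the m3u8 requests seen in the log.

```python
from typing import Optional

# Endpoint base exactly as quoted in this thread.
_MEDIASELECTOR_BASE = 'http://open.live.bbc.co.uk/mediaselector'


def mediaselector_url(vpid: str, mediaset: str, version: int = 6,
                      fmt: Optional[str] = 'xml') -> str:
    """Build a mediaselector URL for a vpid and mediaset. For v6, the
    '/format/xml' suffix requests XML instead of the default JSON; pass
    fmt=None to omit the suffix (as in the original v5 URLs)."""
    url = '%s/%d/select/version/2.0/mediaset/%s/vpid/%s' % (
        _MEDIASELECTOR_BASE, version, mediaset, vpid)
    if fmt:
        url += '/format/%s' % fmt
    return url
```

For example, `mediaselector_url('p08yxrld', 'pc')` produces the v6 pc URL from the diff above, and `mediaselector_url('p08yxrld', 'iptv-all', version=5, fmt=None)` reproduces the original v5 iptv-all URL.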
ajj8 commented 3 years ago

This has been fixed for AGES by my pull request (open for almost a year now), which the youtube-dl maintenance team is refusing to merge: https://github.com/ytdl-org/youtube-dl/pull/23415

dirkf commented 3 years ago

Fixed in https://github.com/ytdl-org/youtube-dl/commit/e465b25c1fb0e72b97a032220399d4a959662095