yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

CBC News Videos Extractor Not Working: "Unable to download XML: HTTP Error 404: Not Found" #10170

Open LifesGottaBeFun opened 2 weeks ago

LifesGottaBeFun commented 2 weeks ago

Region

Non-Geoblocked

Provide a description that is worded well enough to be understood

I tried to download this video: https://www.cbc.ca/player/play/video/9.6420651

However, it failed and gave me the "Unable to download XML: HTTP Error 404: Not Found" error.

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['https://www.cbc.ca/player/play/video/9.6420651', '-o', 'D:\\Downloaded Audio-Video Tracks\\ViaYouTubeDL\\cbc.ca\\Custom\\%(title)s-%(id)s.%(ext)s', '-o', 'D:/EdmontonAirMonitoring.mp4', '-vU']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error cp1252 (No VT), screen cp1252 (No VT)
[debug] yt-dlp version stable@2024.05.27 from yt-dlp/yt-dlp [12b248ce6] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.22621-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg 6.0-essentials_build-www.gyan.dev (setts), ffprobe 6.0-essentials_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.32.2, sqlite3-3.35.5, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: stable@2024.05.27 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2024.05.27 from yt-dlp/yt-dlp)
[cbc.ca:player] Extracting URL: https://www.cbc.ca/player/play/video/9.6420651
[cbc.ca:player] 9.6420651: Downloading webpage
[ThePlatform] Extracting URL: http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/None?mbr=true&formats=MPEG4,FLV,MP3#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] None: Downloading SMIL data
[ThePlatform] None: Unable to download XML: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)
  File "yt_dlp\extractor\common.py", line 734, in extract
  File "yt_dlp\extractor\theplatform.py", line 313, in _real_extract
  File "yt_dlp\extractor\theplatform.py", line 34, in _extract_theplatform_smil
  File "yt_dlp\extractor\common.py", line 1133, in download_content
  File "yt_dlp\extractor\common.py", line 1093, in download_handle
  File "yt_dlp\extractor\adobepass.py", line 1366, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 954, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 903, in _request_webpage
  File "yt_dlp\extractor\common.py", line 890, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 4142, in urlopen
  File "yt_dlp\networking\common.py", line 117, in send
  File "yt_dlp\networking\_helper.py", line 208, in wrapper
  File "yt_dlp\networking\common.py", line 337, in send
  File "yt_dlp\networking\_requests.py", line 366, in _send
yt_dlp.networking.exceptions.HTTPError: HTTP Error 404: Not Found
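For readers following the log: the ThePlatform URL in the output above can be taken apart with the standard library to see what was actually requested. This is only an illustrative sketch, not part of yt-dlp. The URL is copied verbatim from the log, and the variable names are made up for the example. The last path segment comes out as the literal string `None`, which suggests the extractor did not find a media ID on the page.

```python
import json
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the "[ThePlatform] Extracting URL" log line.
url = (
    "http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/None"
    "?mbr=true&formats=MPEG4,FLV,MP3"
    "#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D"
)

parts = urlsplit(url)

# The last path segment is where the media ID would normally appear.
media_id = parts.path.rsplit("/", 1)[-1]

# Regular query parameters sent to ThePlatform.
query = parse_qs(parts.query)

# Extra data carried in the fragment as URL-encoded JSON.
smuggled = json.loads(parse_qs(parts.fragment)["__youtubedl_smuggle"][0])

print("media id:", media_id)
print("query:", query)
print("smuggled data:", smuggled)
```

A media ID of `None` would be consistent with the 404 returned for the SMIL request, although the log alone does not prove that this is the only cause.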
trainman261 commented 2 weeks ago

I've noticed the same problem. It seems like #9534 was a precursor to this. As far as I can tell, there is no MediaID key anymore, which was what was being used to get the video files from ThePlatform. Looking through how the site works now, I can't find any reference to ThePlatform anymore (although I am a bit of a noob at this, so feel free to tell me I'm wrong). What definitely works (tried manually successfully) is:

I've also found that the complete TS, MP4, and VTT files are directly accessible; I got to them by analyzing the traffic and (for MP4s) experimenting with the URLs I pulled. In the meantime I've found that the direct link to the VTT file can be extracted from the first block of JSON, but I'm still looking for a reliable pattern for the TS and MP4 files.

The first option is the most straightforward, but it works via HLS and tends to download ~30 files per minute of video (~45 if you add subtitles), meaning ~2000 files for a 45-minute video with subtitles. The second option would be a nice addition, but it is somewhat more complex.
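As a quick sanity check on those numbers, here is a back-of-the-envelope calculation. The per-minute rates are the rough estimates quoted above, not measured values, and the function name is made up for this sketch.

```python
def estimate_fragment_count(minutes: float, files_per_minute: float) -> int:
    """Rough number of files an HLS download would fetch."""
    return round(minutes * files_per_minute)


# Approximate rates from the comment above: ~30 files/min for video alone,
# ~45 files/min once subtitles are included.
video_only = estimate_fragment_count(45, 30)
with_subtitles = estimate_fragment_count(45, 45)

print("video only:", video_only)
print("with subtitles:", with_subtitles)
```

For a 45-minute video this gives 1350 files without subtitles and 2025 with them, which matches the "~2000 files" figure.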

I'll try to convert the first option into code within the coming week - but if someone else gets around to it sooner, feel free to go ahead.

trainman261 commented 1 week ago

Update: I've gotten around to implementing a rudimentary solution and pushed it to a branch on my dev fork. It works on my end, and if someone needs a stopgap, feel free to use it until I polish it up and submit a PR.