ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

cbsnews.com broken for weeks #32011

Open Tetracerus opened 1 year ago

Tetracerus commented 1 year ago

Verbose log

$ youtube-dl -v -F https://www.cbsnews.com/video/sunday-morning-full-episode-4-2-2023/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://www.cbsnews.com/video/sunday-morning-full-episode-4-2-2023/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.6.8 (CPython) - Linux-4.18.0-425.13.1.el8_7.x86_64-x86_64-with-centos-8.7-Green_Obsidian
[debug] exe versions: ffmpeg 4.2.8, ffprobe 4.2.8
[debug] Proxy map: {}
[cbsnews] sunday-morning-full-episode-4-2-2023: Downloading webpage
[cbsnews] A0u8tIz70VHrE_UwZAjPeMOAa54TH4T5: Downloading XML
ERROR: A0u8tIz70VHrE_UwZAjPeMOAa54TH4T5: Failed to parse XML  (caused by ParseError('no element found: line 1, column 0',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 857, in _parse_xml
    return compat_etree_fromstring(xml_string.encode('utf-8'))
  File "/usr/local/bin/youtube_dl/compat.py", line 2611, in compat_etree_fromstring
    return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))
  File "/usr/lib64/python3.6/xml/etree/ElementTree.py", line 1315, in XML
    return parser.close()
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
Traceback (most recent call last):
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 857, in _parse_xml
    return compat_etree_fromstring(xml_string.encode('utf-8'))
  File "/usr/local/bin/youtube_dl/compat.py", line 2611, in compat_etree_fromstring
    return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))
  File "/usr/lib64/python3.6/xml/etree/ElementTree.py", line 1315, in XML
    return parser.close()
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/youtube_dl/YoutubeDL.py", line 818, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube_dl/YoutubeDL.py", line 839, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 535, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube_dl/extractor/cbsnews.py", line 108, in _real_extract
    return self._extract_video_info(item['mpxRefId'], 'cbsnews')
  File "/usr/local/bin/youtube_dl/extractor/cbs.py", line 63, in _extract_video_info
    content_id, query={'partner': site, 'contentId': content_id})
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 850, in _download_xml
    expected_status=expected_status)
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 834, in _download_xml_handle
    fatal=fatal), urlh
  File "/usr/local/bin/youtube_dl/extractor/common.py", line 861, in _parse_xml
    raise ExtractorError(errmsg, cause=ve)
youtube_dl.utils.ExtractorError: A0u8tIz70VHrE_UwZAjPeMOAa54TH4T5: Failed to parse XML  (caused by ParseError('no element found: line 1, column 0',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

cbsnews.com videos stopped working a few weeks ago.

I just grabbed the latest snapshot of master minutes ago and it's still not working.

For this video: https://www.cbsnews.com/video/sunday-morning-full-episode-4-2-2023/

We just need to parse index.html to find the URL of the master.m3u8 file. Snippet:

Sunday Morning Full Episode 4/2","timestamp":1680440400000,"duration":3760,"durationLabel":"01:02:40","label":null,"images":{"sd":"https://assets1.cbsnewsstatic.com/hub/i/r/2023/04/02/a457a56d-498d-420b-9ba1-d91fab2ee07b/thumbnail/640x360/6bd395cc9a6be06149ad04014c1d9296/smjanepauley040223-1848688-640x360.jpg","hd":"https://assets3.cbsnewsstatic.com/hub/i/r/2023/04/02/a457a56d-498d-420b-9ba1-d91fab2ee07b/thumbnail/1280x720/58ec9371cb82766448dfed42a78b2182/smjanepauley040223-1848688-640x360.jpg"},"previewUrl":"https://splice.amlg.io/api/v2/video/A0u8tIz70VHrE_UwZAjPeMOAa54TH4T5/preview/","video":"https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/master.m3u8","video2":"https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/master.m3u8","format":"application/x-mpegURL","url":"https://www.cbsnews.com/video/sunday-morning-full-episode-4-2-2023/
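The lookup described above could be sketched roughly as follows. This is only an illustration, not the extractor code from youtube-dl or yt-dlp: the function name `find_master_playlist_url` is hypothetical, and it assumes every video page embeds a JSON-like blob with a `"video"` key pointing at master.m3u8, which has only been checked against the one snippet shown here.

```python
import re

# Assumption: the page embeds a "video" key whose value is the master.m3u8
# URL, as in the snippet from this one page. Other pages may differ.
_VIDEO_RE = re.compile(r'"video"\s*:\s*"(?P<url>https?:[^"]+?\.m3u8[^"]*)"')


def find_master_playlist_url(webpage):
    """Return the first "video" .m3u8 URL found in the page text, or None."""
    match = _VIDEO_RE.search(webpage)
    if not match:
        return None
    # Embedded JSON sometimes escapes slashes as "\/"; undo that if present.
    return match.group('url').replace('\\/', '/')
```

A real extractor would more likely decode the whole JSON object rather than match one key with a regex, but the regex keeps the sketch short.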

If we fetch https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/master.m3u8, it contains metadata on all the video and audio formats. Snippet:

#EXT-X-STREAM-INF:BANDWIDTH=634827,AVERAGE-BANDWIDTH=516066,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=640x360,FRAME-RATE=29.970,VIDEO-RANGE=SDR,AUDIO="audio_aac",CLOSED-CAPTIONS=NONE
0402_SUNMO_FULL_1_1848685_375/stream.m3u8

That SUNMO_FULL.../stream.m3u8 is the video playlist:

Video/audio links (VLC can play) are:

AAC Audio: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_en-US_1848685_aac_128/stream.m3u8
640x360: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_1848685_375/stream.m3u8
768x432: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_1848685_750/stream.m3u8
960x540: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_1848685_1500/stream.m3u8
1280x720: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_1848685_2100/stream.m3u8
HLS 1920x1080: https://prod.vodvideo.cbsnews.com/cbsnews/vr/hls/2023/03/22/2185212995641/1848685_hls/0402_SUNMO_FULL_1_1848685_3000/stream.m3u8
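For illustration, the video links above follow from joining each relative playlist path in master.m3u8 with the master playlist's own URL. A minimal sketch of that step, assuming the playlist text has already been downloaded; `parse_master_playlist` is a hypothetical helper, not a youtube-dl function, and it only handles `#EXT-X-STREAM-INF` entries (not the separate audio rendition):

```python
import re
from urllib.parse import urljoin

# KEY=value pairs; quoted values may themselves contain commas (e.g. CODECS).
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text, base_url):
    """Return a list of (attributes, absolute_url), one per EXT-X-STREAM-INF."""
    variants = []
    pending = None  # attributes of a STREAM-INF tag still waiting for its URI
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('#EXT-X-STREAM-INF:'):
            attr_text = line.split(':', 1)[1]
            pending = {k: v.strip('"') for k, v in _ATTR_RE.findall(attr_text)}
        elif line and not line.startswith('#') and pending is not None:
            # The first non-comment line after the tag is the variant's URI.
            variants.append((pending, urljoin(base_url, line)))
            pending = None
    return variants
```

Called with the master.m3u8 URL as `base_url`, each returned URL should have the same shape as the links listed above, and the `RESOLUTION` attribute gives the label for each one.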

dirkf commented 1 year ago

This bug is already addressed in https://github.com/yt-dlp/yt-dlp/issues/6565 and can be fixed by back-porting the extractor changes from https://github.com/yt-dlp/yt-dlp/pull/6681.