ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

cbsnews.com site changed / video extraction no longer working... defaulting to generic extractor instead. #15397

Closed wolferikg closed 5 years ago

wolferikg commented 6 years ago


bash-3.2$ youtube-dl --version
2018.01.21


Download of CBS Evening News falls back to generic extractor and ends up grabbing the CBSN live stream instead:

$ youtube-dl https://www.cbsnews.com/video/122-cbs-evening-news/ -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.cbsnews.com/video/122-cbs-evening-news/', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.21
[debug] Python version 2.7.9 (CPython) - Linux-3.16.0-4-amd64-x86_64-with-debian-8.9
[debug] exe versions: ffmpeg 2.6.9, ffprobe 2.6.9, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 122-cbs-evening-news: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 122-cbs-evening-news: Downloading webpage
[generic] 122-cbs-evening-news: Extracting information
[generic] 122-cbs-evening-news: Downloading m3u8 information
[download] Downloading playlist: 1/22: CBS Evening News
[generic] playlist 1/22: CBS Evening News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on u'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8'
[download] Destination: 1_22 - CBS Evening News-122-cbs-evening-news.mp4
[debug] ffmpeg command line: ffmpeg -y -loglevel verbose -headers 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)
' -i 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8' -c copy -f mp4 '-bsf:a' aac_adtstoasc 'file:1_22 - CBS Evening News-122-cbs-evening-news.mp4.part'
ffmpeg version 2.6.9 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.9.2 (Debian 4.9.2-10)
configuration: --prefix=/usr --extra-cflags='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security ' --extra-ldflags='-Wl,-z,relro' --cc='ccache cc' --enable-shared --enable-libmp3lame --enable-gpl --enable-nonfree --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --disable-stripping --enable-libvpx --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-librtmp --enable-avfilter --enable-libfreetype --enable-libvo-aacenc --disable-decoder=amrnb --enable-libvo-amrwbenc --enable-libaacplus --libdir=/usr/lib/x86_64-linux-gnu --disable-vda --enable-libbluray --enable-libcdio --enable-gnutls --enable-frei0r --enable-openssl --enable-libass --enable-libopus --enable-fontconfig --enable-libpulse --disable-mips32r2 --disable-mipsdspr1 --disable-mipsdspr2 --enable-libvidstab --enable-libzvbi --enable-avresample --disable-htmlpages --disable-podpages --enable-libutvideo --enable-libfdk-aac --enable-libx265 --enable-libiec61883 --enable-vaapi --enable-libdc1394 --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 54. 20.100 / 54. 20.100
libavcodec 56. 26.100 / 56. 26.100
libavformat 56. 25.101 / 56. 25.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 11.102 / 5. 11.102
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01525.ts', offset 0, playlist 0
[mpegts @ 0x1f2e160] parser not found for codec none, packets or times may be invalid.
[mpegts @ 0x1f2e160] parser not found for codec timed_id3, packets or times may be invalid.
[h264 @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
Last message repeated 2 times
[mpegts @ 0x1f2e160] max_analyze_duration 5000000 reached at 5005000 microseconds
[mpegts @ 0x1f2e160] Could not find codec parameters for stream 2 (Unknown: none ([134][0][0][0] / 0x0086)): unknown codec
Consider increasing the value for the 'analyzeduration' and 'probesize' options
[hls,applehttp @ 0x1f277e0] max_analyze_duration 5000000 reached at 5005000 microseconds
[hls,applehttp @ 0x1f277e0] Could not find codec parameters for stream 2 (Unknown: none ([134][0][0][0] / 0x0086)): unknown codec
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, hls,applehttp, from 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8':
Duration: N/A, start: 56320.312000, bitrate: N/A
Program 0
Metadata:
variant_bitrate : 0
Stream #0:0: Video: h264 (Constrained Baseline) ([27][0][0][0] / 0x001B), yuv420p, 640x360 (640x368) [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
Stream #0:1: Audio: aac (LC) ([15][0][0][0] / 0x000F), 32000 Hz, stereo, fltp, 57 kb/s
Stream #0:2: Unknown: none ([134][0][0][0] / 0x0086)
Stream #0:3: Data: timed_id3 (ID3 / 0x20334449)
Output #0, mp4, to 'file:1_22 - CBS Evening News-122-cbs-evening-news.mp4.part':
Metadata:
encoder : Lavf56.25.101
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 640x360 (0x0) [SAR 1:1 DAR 16:9], q=2-31, 29.97 fps, 29.97 tbr, 90k tbn, 90k tbc
Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 32000 Hz, stereo, 57 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01526.ts', offset 0, playlist 0
[NULL @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
Last message repeated 2 times
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01527.ts', offset 0, playlist 0
[NULL @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
^C Last message repeated 2 times
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01528.ts', offset 0, playlist 0
^C
ERROR: Interrupted by user


Description of your issue, suggested solution and other information

Download of CBS Evening News falls back to generic extractor and ends up grabbing the CBSN live stream instead. Tested from multiple boxes.

Rick7C2 commented 6 years ago

CBS News changed their video URLs from

https://www.cbsnews.com/videos/126-cbs-evening-news-2/

to

https://www.cbsnews.com/video/126-cbs-evening-news-2/

Line 14 in https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/cbsnews.py needs to be changed from

_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z-]+)'

to

_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|video)/(?P<id>[\da-z-]+)'
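A quick way to sanity-check the proposed change is to run both patterns against the old and new URL shapes. This is only a sketch: the group name `id` and the escaped dots are assumptions following youtube-dl's usual convention, `match_id` is a hypothetical helper, and `BOTH_VALID_URL` is a hypothetical variant (not from the extractor) that would accept either path form.

```python
import re

# Patterns as proposed in the comment above; the <id> group name is assumed.
OLD_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z-]+)'
NEW_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|video)/(?P<id>[\da-z-]+)'
# Hypothetical variant accepting both /video/ and /videos/.
BOTH_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos?)/(?P<id>[\da-z-]+)'


def match_id(pattern, url):
    """Return the captured id if the pattern matches the URL, else None."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


if __name__ == '__main__':
    old_url = 'https://www.cbsnews.com/videos/126-cbs-evening-news-2/'
    new_url = 'https://www.cbsnews.com/video/126-cbs-evening-news-2/'
    for name, pattern in [('old', OLD_VALID_URL),
                          ('new', NEW_VALID_URL),
                          ('both', BOTH_VALID_URL)]:
        print(name, match_id(pattern, old_url), match_id(pattern, new_url))
```

A URL that matches only tells youtube-dl which extractor to pick; it does not guarantee that the extractor can still parse the page.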

slash-proc commented 6 years ago

I think the change on their end goes a little deeper than that. I tried exactly what you suggested, and unfortunately after I make that change youtube-dl is unable to extract the playlist JSON info.

Before change:

youtube-dl $ ./youtube-dl http://www.cbsnews.com/video/131-cbs-evening-news/ -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://www.cbsnews.com/video/131-cbs-evening-news/', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.27
[debug] Python version 3.4.5 (CPython) - Linux-4.9.16-gentoo-x86_64-Intel-R-_Pentium-R-_CPUJ2900@_2.41GHz-with-gentoo-2.3
[debug] exe versions: ffmpeg N-86258-g5782e0b, ffprobe N-86258-g5782e0b, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 131-cbs-evening-news: Requesting header
[redirect] Following redirect to https://www.cbsnews.com/video/131-cbs-evening-news/
[generic] 131-cbs-evening-news: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 131-cbs-evening-news: Downloading webpage
[generic] 131-cbs-evening-news: Extracting information
[generic] 131-cbs-evening-news: Downloading m3u8 information
[download] Downloading playlist: 1/31: CBS Evening News
[generic] playlist 1/31: CBS Evening News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for 131-cbs-evening-news:
format code extension resolution note
hls-202-0 mp4 320x180 202k , avc1.4d400d, mp4a.40.2
hls-202-1 mp4 320x180 202k , avc1.4d400d, mp4a.40.2
hls-466-0 mp4 640x360 466k , avc1.66.30, mp4a.40.2
hls-466-1 mp4 640x360 466k , avc1.66.30, mp4a.40.2 (best)
[download] Finished downloading playlist: 1/31: CBS Evening News

After:

youtube-dl $ ./youtube-dl http://www.cbsnews.com/video/131-cbs-evening-news/ -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://www.cbsnews.com/video/131-cbs-evening-news/', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.27
[debug] Python version 3.4.5 (CPython) - Linux-4.9.16-gentoo-x86_64-Intel-R-_Pentium-R-_CPUJ2900@_2.41GHz-with-gentoo-2.3
[debug] exe versions: ffmpeg N-86258-g5782e0b, ffprobe N-86258-g5782e0b, rtmpdump 2.4
[debug] Proxy map: {}
[cbsnews] 131-cbs-evening-news: Downloading webpage
ERROR: Unable to extract playlist JSON info; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 784, in extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 438, in extract
    ie_result = self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/cbsnews.py", line 91, in _real_extract
    'playlist JSON info', group='json'), video_id)['state']
  File "./youtube-dl/youtube_dl/extractor/common.py", line 794, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract playlist JSON info; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

cfxd commented 6 years ago

I'm running into this too, even with the 2018.03.10 version :-/

I actually found that if you visit the video's page, grab the API URL from CBSNEWS.defaultPayload.items.video, and use that URL on your command line, then it works and grabs the vid 👍
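The manual step described above could be automated roughly as follows. This is a sketch under assumptions drawn from this thread rather than from any documented API: that the page source contains an assignment of the form `CBSNEWS.defaultPayload = {...}`, and that `items` is a list whose first entry carries the `video` field. The function name `extract_video_url` is hypothetical.

```python
import json
import re

# Assumed marker: the page assigns a JSON object to CBSNEWS.defaultPayload.
_PAYLOAD_MARKER = re.compile(r'CBSNEWS\.defaultPayload\s*=\s*')


def extract_video_url(html):
    """Return items[0]['video'] from the embedded payload, or None if absent."""
    m = _PAYLOAD_MARKER.search(html)
    if not m:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (for example a trailing semicolon).
        payload, _end = json.JSONDecoder().raw_decode(html, m.end())
    except ValueError:
        return None
    items = payload.get('items') if isinstance(payload, dict) else None
    if isinstance(items, list) and items and isinstance(items[0], dict):
        return items[0].get('video')
    return None
```

The returned value can then be passed to youtube-dl on the command line, as described in the comment above. If the site changes the variable name or the payload layout, the function returns None instead of raising.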

wolferikg commented 6 years ago

Thanks for the hint @cfxd ... I finally went ahead and whipped up a quick script that seems to work pretty well. One needs to install 'jq' for this to work (https://stedolan.github.io/jq/).

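wolf$ cat cbsnews.sh
#!/bin/bash

# Print usage and exit.
usage () {
    echo "$(basename "$0") Usage:"
    echo "$(basename "$0") <URL> [-d]"
    echo "    -d // dry run: print video-url and exit."
    echo ""
    exit 2
}

if [ $# -lt 1 ]; then usage; fi

episode=$1
baseurl='https://www.cbsnews.com'
# Name the output file after the fifth "/"-separated field of the URL (the episode slug).
output=$(echo "$episode" | awk -F/ '{print $5".mp4"}')

# Take the first line mentioning CBSNEWS.defaultPayload and keep the JSON after " = ".
json=$(curl -s "$episode" | grep CBSNEWS.defaultPayload | head -1 | awk -F' = ' '{print $2}')
# Read the first item's video path, swap the phone playlist for the tablet one, strip quotes.
video=$(echo "$json" | jq '.items|.[0].video' | sed 's/_phone.m3u8/_tablet.m3u8/g' | sed 's/"//g')

videourl=$baseurl$video

# Quote $2 so the test does not error out when -d is omitted.
if [ "$2" = "-d" ]
  then
    echo "Video-URL: $videourl"
  else
    echo "Attempting to download $videourl to $output ..."
    youtube-dl -o "$output" "$videourl"
fi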

The substitution of _phone.m3u8 with _tablet.m3u8 was a "wild" guess 😆 and will pull the high res version 😉

Maybe someone with more programming skills can use this as a base to submit a patch to fix this issue directly in the yt-dl cbsnews extractor?

Cheers.

Wowfunhappy commented 6 years ago

Six months later, this is still broken as of version 2018.07.21.

vxbinaca commented 6 years ago

I can confirm @wolferikg's clever hack works well. Good job, you just helped me with something. Much appreciated.

ddurdle commented 5 years ago

Great workaround. My previous workaround of opening the page source and looking for the 740.mp4 link doesn't work anymore, but this seems to.

ddurdle commented 5 years ago

dead again

ddurdle commented 5 years ago

Looks like the script is now prepending cbsnews.com to the URL when the URL already contains it, so taking the URL from the error message and passing it to youtube-dl manually works.
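One way to avoid the doubled host is to resolve the extracted value against the base URL instead of concatenating strings. The sketch below uses `urllib.parse.urljoin` from the Python standard library; `BASE_URL` and `absolute_video_url` are hypothetical names, and the assumption (based on the comments in this thread) is that the payload sometimes holds a site-relative path and sometimes a full URL.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.cbsnews.com'


def absolute_video_url(video):
    """Resolve a video reference against BASE_URL.

    urljoin returns an already-absolute URL unchanged and only prefixes
    the base when the value is a relative path.
    """
    return urljoin(BASE_URL, video)


if __name__ == '__main__':
    # Example values are made up for illustration.
    print(absolute_video_url('/media/example_tablet.m3u8'))
    print(absolute_video_url('https://www.cbsnews.com/media/example_tablet.m3u8'))
```

Both calls print the same absolute URL, so the result can be handed to youtube-dl whichever form the page provides.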

sheerluck commented 5 years ago

Hi all, I tried to download

https://www.cbsnews.com/news/how-the-danske-bank-money-laundering-scheme-involving-230-billion-unraveled-60-minutes-2019-05-19

and failed. I opened DevTools and spotted a sequence of "akamaihd" URLs like

https://devicecbsnews-a.akamaihd.net/media/mpx/2019/05/19/1524617283782/0519_60Minutes_Segment1_1853572_1200/0519_60Minutes_Segment1_1853572_1200_14.ts

I can see that 7 or 8 extractors already know about "akamaihd" (francetv, lego, brightcove, senateisvp, livestream, nba, nhk, tvnow), so maybe we can fix cbsnews the same way.