ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

CWTV show does not download #30662

Closed octavioj closed 2 years ago

octavioj commented 2 years ago

Verbose log

//www.cwtv.com/shows/4400/present-is-prologue/?play=6cc4708a-9b9e-45e2-ada4-468355f6cb38 ; do downepisode --verbose $f ; done
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-o', '%(series)s - S%(season_number)sE%(episode_number)s - %(episode)s.%(ext)s', '--verbose', 'https://www.cwtv.com/shows/4400/group-efforts/?play=deec61a8-e0a1-4c01-8906-4e0b363350d5']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.7 (CPython) - Linux-5.13.0-28-generic-x86_64-with-glibc2.34
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4, rtmpdump 2.4
[debug] Proxy map: {}
[CWTV] deec61a8-e0a1-4c01-8906-4e0b363350d5: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/lib/python3.9/dist-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/lib/python3.9/dist-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

I am trying to download the latest two episodes of 4400 from The CW, but I get HTTP Error 403: Forbidden. Playback in the browser works fine, and fetching the page with wget or curl from the command line also works, so the site does respond.

Vangelis66 commented 2 years ago

The show's URI,

https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f

as well as related APIs/stream CDNs are geoblocked for non-US IPs; flags like --geo-bypass & --geo-bypass-country US also DON'T work to grant any access to offshore visitors... 😞 I'll assume you're inside the US, with a whitelisted IP address...

Trying here initially with a US SSH tunnel, I'm getting the output below:

youtube-dl --proxy 127.0.0.1:1080 -F "https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f" => 

[CWTV] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading SMIL data
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[info] Available formats for dead9843-33b1-4201-adf8-310692fe147f:
format code  extension  resolution note
hls-261-0    mp4        416x234     261k , avc1.66.30, 23.976fps, mp4a.40.5
hls-261-1    mp4        416x234     261k , avc1.66.30, 23.976fps, mp4a.40.5
hls-261-2    mp4        416x234     261k , avc1.66.30, 23.976fps, mp4a.40.5
hls-498-0    mp4        480x270     498k , avc1.66.30, 23.976fps, mp4a.40.5
hls-498-1    mp4        480x270     498k , avc1.66.30, 23.976fps, mp4a.40.5
hls-498-2    mp4        480x270     498k , avc1.66.30, 23.976fps, mp4a.40.5
hls-704-0    mp4        640x360     704k , avc1.66.30, 23.976fps, mp4a.40.2
hls-704-1    mp4        640x360     704k , avc1.66.30, 23.976fps, mp4a.40.2
hls-704-2    mp4        640x360     704k , avc1.66.30, 23.976fps, mp4a.40.2
hls-1350-0   mp4        960x540    1350k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-1350-1   mp4        960x540    1350k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-1350-2   mp4        960x540    1350k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-1963-0   mp4        1280x720   1963k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-1963-1   mp4        1280x720   1963k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-1963-2   mp4        1280x720   1963k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-2609-0   mp4        1280x720   2609k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-2609-1   mp4        1280x720   2609k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-2609-2   mp4        1280x720   2609k , avc1.4d401f, 23.976fps, mp4a.40.2
hls-5264-0   mp4        1920x1080  5264k , avc1.640028, 23.976fps, mp4a.40.2
hls-5264-1   mp4        1920x1080  5264k , avc1.640028, 23.976fps, mp4a.40.2
hls-5264-2   mp4        1920x1080  5264k , avc1.640028, 23.976fps, mp4a.40.2
hls-8329-0   mp4        1920x1080  8329k , avc1.640028, 23.976fps, mp4a.40.2
hls-8329-1   mp4        1920x1080  8329k , avc1.640028, 23.976fps, mp4a.40.2
hls-8329-2   mp4        1920x1080  8329k , avc1.640028, 23.976fps, mp4a.40.2 (best)

Actually trying to fetch a sample HLS format, hls-704-0, I get:

youtube-dl --proxy 127.0.0.1:1080 -f hls-704-0 --hls-prefer-native "https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f" => 

[CWTV] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading SMIL data
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[hlsnative] Downloading m3u8 manifest
ERROR: unable to download video data: HTTP Error 403: Forbidden

The 403 is being generated by a stream CDN with hostname cwtv-amd-akamai.akamaized.net 😠 ; upon repeated attempts, I may get connected to stream CDN with hostname stream-hls.cwtv.com, and that one does NOT 403:

...
[debug] Invoking downloader on 'https://stream-hls.cwtv.com/nosec/The_CW/255/4/126967877560/4400-101-PastIsPrologue-P101-CW-V2_126967365762_m3u8_video_640x360_568000_primary_audio_eng_4.m3u8'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 423
[download] Destination: Past is Prologue-dead9843-33b1-4201-adf8-310692fe147f.mp4
[download]   1.2% of ~212.60MiB at 29.48KiB/s ETA 02:34:08

... so it's probably a case of trial-and-error 😜 currently... The CWTVIE appears partially broken as far as the Akamai CDN goes, that one may require some extra request header(s) to allow access to the HLS variants...
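To tell the two CDNs apart when reading a verbose log, the hostname of the manifest URL on the "Invoking downloader on" line is enough. A minimal Python sketch, assuming only the two hostnames seen in this thread; the helper names `cdn_host` and `is_akamai` are hypothetical and not part of youtube-dl, and whether a given host answers with 403 still depends on the client IP:

```python
from urllib.parse import urlparse


def cdn_host(manifest_url):
    """Return the hostname that serves the given manifest URL."""
    return urlparse(manifest_url).hostname


def is_akamai(manifest_url):
    """True if the manifest is served from an akamaized.net hostname."""
    return cdn_host(manifest_url).endswith(".akamaized.net")


# Manifest URLs copied from the verbose logs in this thread.
PATH = ("/nosec/The_CW/255/4/126967877560/"
        "4400-101-PastIsPrologue-P101-CW-V2_126967365762_m3u8_video_"
        "640x360_568000_primary_audio_eng_4.m3u8")
akamai_url = "https://cwtv-amd-akamai.akamaized.net" + PATH
direct_url = "https://stream-hls.cwtv.com" + PATH

print(cdn_host(akamai_url), is_akamai(akamai_url))
print(cdn_host(direct_url), is_akamai(direct_url))
```

Both URLs share the same path, so only the hostname differs between attempts.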

dirkf commented 2 years ago

From the UK I can get the M3U8 from ThePlatform but the Akamai media give 403. However Akamai is prone to throw 403 for yt-dl's UAs regardless of geo-restriction, so a user with a non-blacklisted US IP may need to try some other UAs. Just Mozilla/5.0 may work, or an iPhone UA.
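One way to try several UAs against a manifest URL before involving yt-dl at all is a small standard-library probe. This is a rough sketch, not anything yt-dl does itself: the function names are hypothetical, and the iPhone UA string is only an illustrative example of that kind of UA.

```python
import urllib.error
import urllib.request

# Candidate UAs; the iPhone string is an illustrative example only.
CANDIDATE_UAS = [
    "Mozilla/5.0",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_3 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 "
    "Mobile/15E148 Safari/604.1",
]


def http_status(url, user_agent, timeout=10):
    """Return the HTTP status code of a GET sent with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 403 and similar responses arrive here as exceptions.
        return err.code


def probe_user_agents(url, user_agents, fetch=http_status):
    """Map each User-Agent to the status code it receives for url."""
    return {ua: fetch(url, ua) for ua in user_agents}
```

Calling `probe_user_agents(manifest_url, CANDIDATE_UAS)` with a manifest URL from a verbose log returns one status code per UA. A UA that gets a 200 can then be passed to yt-dl with its `--user-agent` option.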

octavioj commented 2 years ago

Excellent suggestion. Let me try Mozilla and will report back. I will also try a VPN to another country.

octavioj commented 2 years ago

I used the link from theplatform and it worked. It seems I still cannot get a non-Akamai link in different attempts. It is also something that is most likely related to VPN IP addresses. When I turned off the VPN on the virtual machine where I run yt-dl it worked. So I guess yt-dl cannot do anything to help with this. Thank you all for the quick replies.

Vangelis66 commented 2 years ago

Using this URL (should work in browser):

http://link.theplatform.com/s/cwtv/media/guid/2703454149/deec61a8-e0a1-4c01-8906-4e0b363350d5?formats=M3U

From the UK I can get the M3U8 from ThePlatform

From a non-US/non-UK IP, all one gets is:

CWTV

but should work either way. Also it should work with HTTP or HTTPS.
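For reference, the ThePlatform link above follows a visible pattern: the GUID is the `play` parameter of the CWTV page URL. A small Python sketch of that mapping, assuming the account path seen in this thread (`/s/cwtv/media/guid/2703454149`) stays fixed; the function name is hypothetical, and the actual extractor goes through CWTV's JSON metadata rather than building the link this way:

```python
from urllib.parse import parse_qs, urlparse

# Base taken from the ThePlatform URL posted in this thread.
THEPLATFORM_BASE = "http://link.theplatform.com/s/cwtv/media/guid/2703454149"


def theplatform_m3u_url(cwtv_page_url):
    """Build the ThePlatform M3U link from a CWTV page URL's play GUID."""
    query = parse_qs(urlparse(cwtv_page_url).query)
    guid = query["play"][0]
    return "{}/{}?formats=M3U".format(THEPLATFORM_BASE, guid)


page = ("https://www.cwtv.com/shows/4400/group-efforts/"
        "?play=deec61a8-e0a1-4c01-8906-4e0b363350d5")
print(theplatform_m3u_url(page))
```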

From your posted Request Headers (edit: later deleted), I have a gut feeling the most vital one is the Cookie one (named aka_debug, "Akamai Debug" ?); it references 3 distinct IPs: 1) your client (browser) IP (72.xxx.xx.xx), 2) a "ghost" IP (23.205.110.27), 3) a "ghostforward" IP (23.38.189.161) and a sanctioned geo-location (US-TX); I presume both "ghost*" IPs belong to Akamai CDN nodes...

However, Akamai is prone to throw 403 for yt-dl's UAs regardless of geo-restriction, so a user with a non-blacklisted US IP may need to try some other UAs. Just Mozilla/5.0 may work, or an iPhone UA

I fired up my US VPN, which provides several US IP nodes... All I can tell you is that CWTV are being VERY aggressive at blocking commercial VPN IPs... 😠 But, at last, I found a whitelisted node... I used yt-dlp to test, because in a recent, ITV-related, issue it was pointed out that yt-dlp sends out a more recent/"palatable" UA string; via that whitelisted US IP, I would mostly get the non-Akamai CDN, which always works 👍 , but after some retries I did also get the Akamai CDN, which WORKED with yt-dlp:

yt-dlp -f hls-704-0 "https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f" -v => 

[debug] Encodings: locale cp1253, fs utf-8, out utf-8 (No ANSI), err utf-8 (No ANSI), pref cp1253
[debug] yt-dlp version 2022.02.22 [e1bdf91] (win_exe)
[debug] ** This build is unofficial daily builds, provided for ease of use.
[debug] ** Please do not ask for any support.
[debug] Python version 3.7.9 (CPython 32bit) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg 4.4.1 (setts), ffprobe 4.4.1
[debug] Optional libraries: Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
[debug] [CWTV] Extracting URL: https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f
[CWTV] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[debug] [ThePlatform] Extracting URL: http://link.theplatform.com/s/cwtv/media/guid/2703454149/dead9843-33b1-4201-adf8-310692fe147f?format=SMIL&formats=M3U&tracking=true&mbr=false#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading SMIL data
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information

WARNING: [ThePlatform] Ignoring subtitle tracks found in the HLS manifest; if any subtitle tracks are missing, please report this issue on  https://github.com/yt-dlp/yt-dlp , filling out the "Broken site" issue template properly. Confirm you are on the latest version using -U
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[info] dead9843-33b1-4201-adf8-310692fe147f: Downloading 1 format(s): hls-704-0
[debug] Invoking downloader on "https://cwtv-amd-akamai.akamaized.net/nosec/The_CW/255/4/126967877560/4400-101-PastIsPrologue-P101-CW-V2_126967365762_m3u8_video_640x360_568000_primary_audio_eng_4.m3u8"
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 423
[download] Destination: Past is Prologue [dead9843-33b1-4201-adf8-310692fe147f].mp4
[download]   2.2% of ~213.01MiB at 102.57KiB/s ETA 26:54 (frag 9/423)
ERROR: Interrupted by user
Terminate batch job (Y/N)? y

So, as hinted already in this thread, yt-dl's failure (403) on the Akamai CDN might be UA (or other RequestHeader) related... Best Regards
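As a side note on reading the log above: the [ThePlatform] "Extracting URL" line ends in a `#__youtubedl_smuggle=...` fragment, which is URL-encoded JSON passed between extractors and is not sent to the server. A minimal Python sketch for decoding it, assuming the fragment is a query-string-style key/value pair as it appears in the log; the function name `decode_smuggled` is hypothetical and this only approximates what the tool does internally:

```python
import json
from urllib.parse import parse_qs, urldefrag


def decode_smuggled(url):
    """Split a URL into its base and the JSON data carried in its fragment."""
    base, fragment = urldefrag(url)
    payload = parse_qs(fragment).get("__youtubedl_smuggle")
    data = json.loads(payload[0]) if payload else None
    return base, data


# URL copied from the yt-dlp verbose log in this thread.
logged_url = (
    "http://link.theplatform.com/s/cwtv/media/guid/2703454149/"
    "dead9843-33b1-4201-adf8-310692fe147f"
    "?format=SMIL&formats=M3U&tracking=true&mbr=false"
    "#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D"
)
base, data = decode_smuggled(logged_url)
print(base)
print(data)
```

The decoded data is `{'force_smil_url': True}`, so the fragment has nothing to do with the 403 responses discussed here.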

LATER EDIT: Neither the Cookie nor yt-dl's default UA seems to actually play a role here, as demonstrated by the definitive verbose log further down below 😉 ...

Vangelis66 commented 2 years ago

... Well, probably my hypothesis/"theory" on the Cookie RequestHeader is wrong, I have no problem retracting that... But a desktop browser or cURL isn't youtube-dl, showing that the first two work doesn't necessarily imply that the latter should, too (see here and reply below) ... What you have yet to produce is a proof inside a yt-dl context that

this is a geo-blocking issue, and only a geo-blocking issue.

You appear to have a whitelisted IP as far as CWTV are concerned, where's your verbose yt-dl log working with the cwtv-amd-akamai.akamaized.net CDN with yt-dl's default UA?

In fact, that's what I'm trying to achieve here for the last 30min, but CWTV are giving me a hard time (all commercial VPN US nodes I have access to have been now blacklisted by them 😡 ) ...

At long last, an "urban" US IP (non-datacenter) and many retries have me connected to their Akamai CDN, with a successful fetch:

youtube-dl --proxy "127.0.0.1:8080" -f hls-704-0 --hls-prefer-native "https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f" -v => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--proxy', '127.0.0.1:8080', '-f', 'hls-704-0', '--hls-prefer-native', 'https://www.cwtv.com/shows/4400/past-is-prologue/?play=dead9843-33b1-4201-adf8-310692fe147f', '-v']
[debug] Encodings: locale cp1253, fs utf-8, out utf-8, pref cp1253
[debug] youtube-dl version 2021.06.06+18-git-20210701-ga803582+PRs#30184,#30266
[debug] Lazy loading extractors enabled
[debug] Python version 3.7.12 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg 4.4.1, ffprobe 4.4.1, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {'http': '127.0.0.1:8080', 'https': '127.0.0.1:8080'}
[CWTV] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading SMIL data
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading m3u8 information
[ThePlatform] dead9843-33b1-4201-adf8-310692fe147f: Downloading JSON metadata
[debug] Invoking downloader on 'https://cwtv-amd-akamai.akamaized.net/nosec/The_CW/255/4/126967877560/4400-101-PastIsPrologue-P101-CW-V2_126967365762_m3u8_video_640x360_568000_primary_audio_eng_4.m3u8'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 423
[download] Destination: Past is Prologue-dead9843-33b1-4201-adf8-310692fe147f.mp4
[download]   1.4% of ~212.60MiB at 130.52KiB/s ETA 48:31
ERROR: Interrupted by user

so my second "theory" also dismissed...

The OP got what he wanted,

When I turned off the VPN on the virtual machine where I run yt-dl it worked.

(without specifying which CDN he was successful with when he turned the VPN off), so this issue may be closed... Possible cause: Very stringent ACLs on CWTV's Akamai CDN, especially with VPN IPs, less so on their alternate CDN (stream-hls.cwtv.com) .

OT words ... I'm not here to "confuse" anyone, for that matter... I'm just a non-coder volunteer with a desire to help others, when my free time allows me to do so... 😉 I also have a background in Science (Physics/Chemistry/Molecular Biology) and to "theorise" is essential for the progression of Science... Granted some "theories" have been proven wrong over the course of time (several having been deemed "valid" for quite long periods), but should they have not been made at all to begin with? Making theories and posting about them is good by my book; dismissal of the "invalid" ones by fellow peers narrows down the source of the problem and contributes towards a speedier resolution; it's your prerogative to call mine "pointless", I guess, but I hope not all share your opinion ... Later addition: Blocking me as a user, as the result of my comments/opinions in this thread, speaks volumes by itself...

Regards.