ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Crunchyroll] ERROR: Unable to download webpage: HTTP Error 403: Forbidden #28398

Open tasdanduil opened 3 years ago

tasdanduil commented 3 years ago

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-f', 'best[format_id*=enUS]', '--playlist-start', '68', 'https://www.crunchyroll.com/attack-on-titan']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2021.03.03
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.18362
[debug] exe versions: ffmpeg N-93542-gecdaa4b4fa, ffprobe N-93542-gecdaa4b4fa
[debug] Proxy map: {}
[crunchyroll:playlist] attack-on-titan: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpvjygsik_\build\youtube_dl\extractor\common.py", line 632, in _request_webpage
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpvjygsik_\build\youtube_dl\YoutubeDL.py", line 2275, in urlopen
  File "C:\Python\Python34\lib\urllib\request.py", line 470, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 580, in http_response
  File "C:\Python\Python34\lib\urllib\request.py", line 508, in error
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Python\Python34\lib\urllib\request.py", line 588, in http_error_default

Description

I came across this issue not too long ago. I haven't been able to fix it on my own, so here I am. I've tried adding cookies.txt to the command line as well as the user agent; neither solved the issue.

CHJ85 commented 3 years ago

The Crunchyroll extractor cannot download entire seasons, only separate episodes. What you can do is use VRV.co instead, because that extractor does work. Alternatively, you can install a browser extension called Link Gopher, use it to grab all the Attack on Titan episode links on Crunchyroll, save them into a txt file, and then run youtube-dl -a links.txt
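
For anyone scripting the links.txt route, here is a minimal sketch of the idea: write the collected episode links one per line, then hand the file to youtube-dl with -a. The helper names write_batch_file and batch_command are made up for illustration, and the sketch only builds the command, it does not run it.

```python
from pathlib import Path


def write_batch_file(urls, path="links.txt"):
    """Write one URL per line, skipping blanks and duplicates; return the path."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def batch_command(path="links.txt"):
    """Build the argv for `youtube-dl -a <file>` (nothing is executed here)."""
    return ["youtube-dl", "-a", str(path)]
```

Passing the returned list to subprocess.run(..., check=True) would start the download, assuming youtube-dl is on the PATH.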

tasdanduil commented 3 years ago

I have used this command line (youtube-dl -f best[format_id*=enUS] --playlist-start [number] https://www.crunchyroll.com/[show]) to download a lot of series from Crunchyroll. My issue is that I can't seem to get anything to download from Crunchyroll anymore, whether it's a playlist or an individual episode. It's not even returning a format list these days, only the 403 error. This wasn't a problem until a few weeks ago. After repeated attempts a couple of days ago, I did get one episode to download. Of course, that's kind of a pain to do every time I'd like to download something.

CHJ85 commented 3 years ago

@tasdanduil I guess they've made some changes to their API. They've been talking about this new beta version for the longest time now, which might have something to do with it. But VRV.co still works. You can download Crunchyroll content from there: youtube-dl https://vrv.co/series/GR751KNZY/Attack-on-Titan

tasdanduil commented 3 years ago

Alright. Thanks for the recommendation. I'll probably be using that from now on.

CHJ85 commented 3 years ago

@tasdanduil No problem. I believe all the same shows are available there too, because it is essentially the same Crunchyroll but with Hidive and a few other anime and cartoon streaming services bundled together. And if I'm not mistaken, I believe you can log in with your Crunchyroll username/password there too.

Columbo199X commented 3 years ago

@CHJ85

The Crunchyroll extractor cannot download entire seasons. Only separate episodes.

Do you know if this also applies to --batch-file "links.txt"? I've been trying to batch download episodes but it tends to error after a few episodes downloading fine.

CHJ85 commented 3 years ago

@Columbo199X I have no idea. Sorry.

Canz2 commented 3 years ago

Hello, I am getting this error too, but it's just for one file, not a playlist. Do you have any idea?

Goten87 commented 3 years ago

I'm getting the same when trying to download https://www.crunchyroll.com/sword-art-online/episode-1-the-world-of-swords-606739

I'm trying to use it on Debian Sid, fully updated as of today. This is my error:

$youtube-dl --verbose "https://www.crunchyroll.com/sword-art-online/episode-1-the-world-of-swords-606739"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--verbose', u'https://www.crunchyroll.com/sword-art-online/episode-1-the-world-of-swords-606739']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.03.25
[debug] Python version 2.7.18 (CPython) - Linux-5.10.0-4-amd64-x86_64-with-debian-bullseye-sid
[debug] exe versions: none
[debug] Proxy map: {}
[crunchyroll] 606739: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2279, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

Atemu commented 3 years ago

The problem seems to be that, when ytdl tries to GET a show page, e.g. /darling-in-the-franxx, CR (or CF) returns 403. This does not happen in the browser for the same URL.
Same is true when using a cookie exported from my browser session.

After a lot of trial and error, I've found a workaround though:
Set the --user-agent to your browser's and pass your browser's CR cookies to ytdl.

It's fairly fragile (one slight difference in the UA string makes it 403 again and also invalidates the cookie file?) but looks to be working well enough.
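
A rough sketch of that workaround as a command builder, in case it helps anyone script it. It relies on youtube-dl's --user-agent and --cookies options; the function name workaround_command and the cookies.txt file name are placeholders for illustration, and the UA string has to be copied verbatim from the browser that exported the cookies.

```python
def workaround_command(url, user_agent, cookies_path="cookies.txt"):
    """Build a youtube-dl argv that sends the browser's UA and cookie file.

    Nothing is executed here; cookies_path is a hypothetical file name.
    """
    if not user_agent.strip():
        raise ValueError("user_agent must not be empty")
    return [
        "youtube-dl",
        "--user-agent", user_agent,       # must match the exporting browser
        "--cookies", str(cookies_path),   # Netscape-format cookie file
        url,
    ]
```

Keeping the arguments as a list avoids shell quoting problems with the spaces and parentheses that real UA strings contain; the list can be passed to subprocess.run(cmd, check=True) as is.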

Seems like CR is getting more sophisticated with blocking automated tools and they've had a DRM rollout looming for a while too...

6cUbi57z commented 3 years ago

It seems to be the __cf_bm cookie which is required. I generated a cookie file and removed them one by one until youtube-dl stopped working. I then removed all of the others and left just this cookie and it worked fine.

Seems this is a cloudflare cookie: https://support.cloudflare.com/hc/en-us/articles/200170156-Understanding-the-Cloudflare-Cookies#12345681
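
To repeat that experiment without editing the file by hand, something like the sketch below could strip an exported cookies.txt down to a single cookie. It assumes the usual Netscape layout of seven tab-separated fields with the cookie name in the sixth; keep_cookie is a made-up helper name, not part of youtube-dl.

```python
def keep_cookie(cookies_txt, name="__cf_bm"):
    """Return Netscape cookies.txt content reduced to cookies called `name`."""
    kept = []
    for line in cookies_txt.splitlines():
        # Some exporters mark HttpOnly cookies with a "#HttpOnly_" prefix;
        # those lines are real cookies, not comments.
        is_httponly = line.startswith("#HttpOnly_")
        if not line.strip() or (line.startswith("#") and not is_httponly):
            kept.append(line)  # keep header comments and blank lines
            continue
        # Fields: domain, subdomain flag, path, secure, expiry, name, value
        fields = line.split("\t")
        if len(fields) == 7 and fields[5] == name:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Read the exported file, pass its text through keep_cookie, and write the result to a new file to use as the cookie file.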

omegahack0 commented 3 years ago

@6cUbi57z I haven't messed with using cookies and youtube-dl. How would I get the specific cookie needed to resolve the issue?

6cUbi57z commented 3 years ago

@omegahack0 the info about the cookie was more for someone trying to fix the bug.

For the workaround, you will need to go to Crunchyroll in your browser and then use something like https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/ to export the cookies for the site. Then use the cookie file parameter to point to the file, and probably the user agent parameter to match your browser as closely as possible.

The workaround doesn't work in all cases though as the purpose of the cookie is to prevent automated programs like this from accessing the site.

Atemu commented 3 years ago

If it breaks, you usually only need to re-export the cookies file from your browser.

dirkf commented 3 years ago

According to Cloudflare, this is the fingerprinting technique used to compute the cookie and/or deny access.

6cUbi57z commented 2 years ago

Issue should be closed, poster ignored template and did not post single video

The ticket looks to have been raised using the broken site template, which doesn't specifically prompt for URLs. The poster has included a log complete with command line arguments, which includes the URL and start index of the playlist among plenty of other information. If they've done something wrong, I don't know what it is and would probably make the same mistake.

rcrx commented 2 years ago

same problem:

[crunchyroll] 789294: Downloading webpage
ERROR: [crunchyroll] 789294: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

dirkf commented 2 years ago

Although this may be relevant, yt-dlp's extractor for CR currently has significant changes relative to yt-dl's (login and beta site support, in general). If you'd like your issue to be progressed you should raise it on the yt-dlp tracker.

dirkf commented 2 years ago

As with some other CF-blocked sites, using --user-agent 'Mozilla/5.0' allows yt-dl to extract the video. For instance with @Goten87's URL:

$ python -m youtube_dl --ignore-config -f worst -v --test 'https://www.crunchyroll.com/sword-art-online/episode-1-the-world-of-swords-606739' --user-agent 'Mozilla/5.0'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--ignore-config', u'-f', u'worst', u'-v', u'--test', u'https://www.crunchyroll.com/sword-art-online/episode-1-the-world-of-swords-606739', u'--user-agent', u'Mozilla/5.0']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: cc179df34
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[crunchyroll] 606739: Downloading webpage
[crunchyroll] 606739: Downloading adaptive_hls-audio-jaJP information
[crunchyroll] 606739: Downloading adaptive_hls-audio-jaJP-hardsub-ptBR information
[crunchyroll] 606739: Downloading adaptive_hls-audio-jaJP-hardsub-esLA information
[crunchyroll] 606739: Downloading adaptive_hls-audio-jaJP-hardsub-enUS information
[crunchyroll] 606739: Downloading adaptive_hls-audio-jaJP-hardsub-arME information
[crunchyroll] 606739: Downloading media info
WARNING: Unable to download XML: HTTP Error 404: Not Found
[debug] Invoking downloader on u'https://pl.crunchyroll.com/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/index-v1-a1.m3u8?Expires=1655248090&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wbC5jcnVuY2h5cm9sbC5jb20vZXZzMy9kNGU5MzAwNzE3ZmFlNmM2NmE3ZTgwOTA5ZTczNGM1YS9hc3NldHMvdWhjcHpqMHI5N2oxaTkyXyoubTN1OD8qIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNjU1MjQ4MDkwfX19XX0_&Signature=sOq2~nYrvIXkzpzP2tfjwCPDlX5pUqInkuGjR0PdYz~xjuDZaDffgH5dcq3o4jUdj7pxpf3160ShmkZuBnP-B66pxx1hZpnWLXKUYFteSc-c-fdgyMEL6mVGtCSwWI0R~42D-wiHKgjzi1mDezmfDRXFujn44aOrJsVyY4AfZIoksL6DYY5xKzGCo~~ytYgL5k02zito-BF7nSIx7WUGnfM0LgaxF4VbGEyFlW1k9~T-FGI~dwGBOAlynZQbykOWUUgPxK6qo-oZLWdp72HpNhA5FO0mZa3UzFhdyCB3qmxEvoCGr7xk1rgVS6MKLg9BpjXjt5jWvbV~JvPbbYmNzw__&Key-Pair-Id=APKAJMWSQ5S7ZB3MF5VA&cdn=ll-prod'
[download] Destination: Sword Art Online Episode 1 – The World of Swords-606739.mp4
[debug] ffmpeg command line: ffmpeg -y -loglevel verbose -headers 'Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Cookie: __cf_bm=84USPPLOxkzBIJ2EcxeOaIArQjedWU4Oumv78DopESw-1655075273-0-ATGxWTE/gbLaCZfox7ck/ByLG9fdnzj4PQngxILeB05d8OpH2f4L88THznuXQ15z6euQnVKvVtPIowOL8DN1BR1hUZoohCRF6L5Zhu3uCYKJ; c_visitor=555b042c-5425-4fde-b889-9f6b4ddf51d8; session_id=3e07a094fa62a34c580c670afc258ab9
' -i 'https://pl.crunchyroll.com/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/index-v1-a1.m3u8?Expires=1655248090&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wbC5jcnVuY2h5cm9sbC5jb20vZXZzMy9kNGU5MzAwNzE3ZmFlNmM2NmE3ZTgwOTA5ZTczNGM1YS9hc3NldHMvdWhjcHpqMHI5N2oxaTkyXyoubTN1OD8qIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNjU1MjQ4MDkwfX19XX0_&Signature=sOq2~nYrvIXkzpzP2tfjwCPDlX5pUqInkuGjR0PdYz~xjuDZaDffgH5dcq3o4jUdj7pxpf3160ShmkZuBnP-B66pxx1hZpnWLXKUYFteSc-c-fdgyMEL6mVGtCSwWI0R~42D-wiHKgjzi1mDezmfDRXFujn44aOrJsVyY4AfZIoksL6DYY5xKzGCo~~ytYgL5k02zito-BF7nSIx7WUGnfM0LgaxF4VbGEyFlW1k9~T-FGI~dwGBOAlynZQbykOWUUgPxK6qo-oZLWdp72HpNhA5FO0mZa3UzFhdyCB3qmxEvoCGr7xk1rgVS6MKLg9BpjXjt5jWvbV~JvPbbYmNzw__&Key-Pair-Id=APKAJMWSQ5S7ZB3MF5VA&cdn=ll-prod' -c copy -fs 10241 -f mp4 'file:Sword Art Online Episode 1 – The World of Swords-606739.mp4.part'
ffmpeg version 4.3-2ubuntu0~ppa16.04+8 Copyright (c) 2000-2020 the FFmpeg developers
...
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
[tcp @ 0x1605000] Starting connection attempt to 108.156.28.89 port 443
[tcp @ 0x1605000] Successfully connected to 108.156.28.89 port 443
[hls @ 0x1601040] Skip ('#EXT-X-ALLOW-CACHE:YES')
[hls @ 0x1601040] Skip ('#EXT-X-VERSION:5')
[hls @ 0x1601040] HLS request for url 'https://ll.v.vrv.co/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/seg-1-v1-a1.ts?t=exp=1655248094~acl=/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/*~hmac=6b50d6617f56fe5978aee26483b5e58d15cc5e00ae7fb22f44077c8b0ac3293b', offset 0, playlist 0
[hls @ 0x1601040] Opening 'https://ll.v.vrv.co/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/encryption.key?t=exp=1655248094~acl=/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/*~hmac=6b50d6617f56fe5978aee26483b5e58d15cc5e00ae7fb22f44077c8b0ac3293b' for reading
[tcp @ 0x18ae700] Starting connection attempt to 87.248.214.8 port 443
[tcp @ 0x18ae700] Successfully connected to 87.248.214.8 port 443
[AVIOContext @ 0x1b1ed00] Statistics: 16 bytes read, 0 seeks
[hls @ 0x1601040] Opening 'crypto+https://ll.v.vrv.co/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/seg-1-v1-a1.ts?t=exp=1655248094~acl=/evs3/d4e9300717fae6c66a7e80909e734c5a/assets/uhcpzj0r97j1i92_1672805.mp4/*~hmac=6b50d6617f56fe5978aee26483b5e58d15cc5e00ae7fb22f44077c8b0ac3293b' for reading
[tcp @ 0x1b1f980] Starting connection attempt to 87.248.214.8 port 443
[tcp @ 0x1b1f980] Successfully connected to 87.248.214.8 port 443
[h264 @ 0x18c9c00] Reinit context to 432x240, pix_fmt: yuv420p
Input #0, hls, from ...
  Duration: 00:23:40.00, start: 0.226000, bitrate: 0 kb/s
  Program 0 
    Metadata:
      variant_bitrate : 0
    Stream #0:0: Video: h264 (Constrained Baseline), 1 reference frame ([27][0][0][0] / 0x001B), yuv420p(left), 428x240 (432x240) [SAR 320:321 DAR 16:9], 23.98 fps, 23.98 tbr, 90k tbn, 47.95 tbc
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Audio: aac (LC) ([15][0][0][0] / 0x000F), 22050 Hz, stereo, fltp
    Metadata:
      variant_bitrate : 0
    Stream #0:2: Data: timed_id3 (ID3  / 0x20334449)
    Metadata:
      variant_bitrate : 0
Output #0, mp4, to 'file:Sword Art Online Episode 1 – The World of Swords-606739.mp4.part':
  Metadata:
    encoder         : Lavf58.45.100
    Stream #0:0: Video: h264 (Constrained Baseline), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 428x240 (0x0) [SAR 320:321 DAR 16:9], q=2-31, 23.98 fps, 23.98 tbr, 90k tbn, 90k tbc
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, stereo, fltp
    Metadata:
      variant_bitrate : 0
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Automatically inserted bitstream filter 'aac_adtstoasc'; args=''
No more output streams to write to, finishing.
Not writing 'clli' atom. No content light level info.
Not writing 'mdcv' atom. Missing mastering metadata.
frame=   22 fps=0.0 q=-1.0 Lsize=      13kB time=00:00:00.87 bitrate= 125.4kbits/s speed=1.87e+03x    
video:5kB audio:6kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 14.250042%
Input file #0 (...):
  Input stream #0:0 (video): 22 packets read (5520 bytes); 
  Input stream #0:1 (audio): 16 packets read (6494 bytes); 
  Input stream #0:2 (data): 0 packets read (0 bytes); 
  Total: 38 packets (12014 bytes) demuxed
Output file #0 (file:Sword Art Online Episode 1 – The World of Swords-606739.mp4.part):
  Output stream #0:0 (video): 22 packets muxed (5520 bytes); 
  Output stream #0:1 (audio): 16 packets muxed (6494 bytes); 
  Total: 38 packets (12014 bytes) muxed
[AVIOContext @ 0x1c4b900] Statistics: 2 seeks, 4 writeouts
[AVIOContext @ 0x1b25980] Statistics: 35392 bytes read, 0 seeks
[AVIOContext @ 0x184a700] Statistics: 103994 bytes read, 0 seeks
[ffmpeg] Downloaded 13726 bytes
[download] 100% of 13.40KiB in 00:03
$

Elsewhere it has been shown that sending the "correct" Connection header also satisfies CF, but urllib2 doesn't make it easy to do that.

For pages that need a login, it may, as above, be necessary to use the UA sent by the browser that gathered the login cookies instead.
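
For anyone testing the header theory outside yt-dl, here is a minimal Python 3 sketch that attaches an explicit User-Agent to a request using urllib.request (the Python 3 counterpart of the urllib2 seen in the tracebacks above). It is only an illustration, not yt-dl's internal code; build_request is a made-up name, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying an explicit User-Agent header (not sent here)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Whether CF accepts such a request can change at any time, so treat a 403 from urlopen as an expected outcome to handle rather than a bug in the sketch.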