ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

HTTP Error 403: Forbidden (caused by HTTPError()); #24008

Closed AliBedaer closed 2 years ago

AliBedaer commented 4 years ago

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-u', u'PRIVATE', u'-p', u'PRIVATE', u'-o', u'%(playlist_title)s/%(chapter_number)s - %(chapter)s/%(playlist_index)s-%(title)s.%(ext)s', u'--verbose', u'https://app.pluralsight.com/library/courses/aws-developer-getting-started']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.01.24
[debug] Python version 2.7.17 (CPython) - Linux-5.3.0-28-generic-x86_64-with-LinuxMint-19.3-tricia
[debug] exe versions: ffmpeg 3.4.6, ffprobe 3.4.6
[debug] Proxy map: {}
[pluralsight:course] aws-developer-getting-started: Downloading JSON metadata
[download] Downloading playlist: AWS Developer: Getting Started
[pluralsight:course] playlist AWS Developer: Getting Started: Collected 68 video ids (downloading 68 of them)
[download] Downloading video 1 of 68
[pluralsight] Downloading login page
[pluralsight] Logging in
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2237, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

Description

I'm trying to download a course from Pluralsight, but I get this exception every time, even though a username and password are provided.

SysadminJeroen commented 4 years ago

I'm experiencing the same issue. A valid username and password are provided, and the account has access to the course.

[debug] System config: []
[debug] User config: [u'-o', u'%(title)s.%(ext)s', u'--external-downloader=aria2c']
[debug] Custom config: []
[debug] Command-line args: [u'--username', u'PRIVATE', u'--password', u'PRIVATE', u'--sleep-interval', u'35', u'--max-sleep-interval', u'120', u'--sub-lang', u'en', u'--sub-format', u'srt', u'--write-sub', u'--verbose', u'https://app.pluralsight.com/library/courses/data-analysis-shiny-r-playbook/', u'--playlist-start', u'1']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.01.24
[debug] Python version 2.7.16 (CPython) - Linux-4.4.0-17763-Microsoft-x86_64-with-debian-10.3
[debug] exe versions: ffmpeg 4.1.4-1, ffprobe 4.1.4-1
[debug] Proxy map: {}
[pluralsight:course] data-analysis-shiny-r-playbook: Downloading JSON metadata
[download] Downloading playlist: Data Analysis with Shiny: R Playbook
[pluralsight:course] playlist Data Analysis with Shiny: R Playbook: Collected 23 video ids (downloading 23 of them)
[download] Downloading video 1 of 23
[pluralsight] Downloading login page
[pluralsight] Logging in
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2237, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
SysadminJeroen commented 4 years ago

One workaround that works is to use a cookies.txt file. A working command that I use is this:

youtube-dl --cookies "../cookies.txt" --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" --sleep-interval 35 --max-sleep-interval 120 --all-subs --sub-format srt --write-sub --verbose https://app.pluralsight.com/library/courses/data-analysis-shiny-r-playbook

rojter-tech commented 4 years ago

One workaround that works is to use a cookies.txt file. A working command that I use is this:

youtube-dl --cookies "../cookies.txt" --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" --sleep-interval 35 --max-sleep-interval 120 --all-subs --sub-format srt --write-sub --verbose https://app.pluralsight.com/library/courses/data-analysis-shiny-r-playbook

As soon as credentials are specified, including the cookie file has no effect; the error shows up immediately after trying to log in. However, if I do not specify credentials, the first video can be downloaded in some (most) cases. I spent the whole night trying to figure out a proper way to establish the client-server relation; now it's time to get some rest. :)

I used both youtube-dl 2020.03.01 on multiple platforms and my forked version, which has not changed since 2020.01.24, and this problem showed up today for the first time.

Update:

Okay, somehow I managed to get this to work on my Debian 10 server. I am using the latest version of my forked youtube-dl, but it works with youtube-dl 2020.03.01 as well.

[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] plura-dl version 1.0.0b2
[debug] Git HEAD: d2e479d
[debug] Python version 3.7.3 (CPython) - Linux-4.19.0-8-amd64-x86_64-with-debian-10.3
[debug] exe versions: none
[debug] Proxy map: {}
[pluralsight:course] 12-principles-animation-toon-boom-harmony-1475: Downloading JSON metadata
[download] Downloading playlist: 12 Principles of Animation in Toon Boom Harmony
[pluralsight:course] playlist 12 Principles of Animation in Toon Boom Harmony: Collected 13 video ids (downloading 13 of them)
[download] Downloading video 1 of 13
[pluralsight] Downloading login page
[pluralsight] Logging in
[pluralsight] 12-principles-animation-toon-boom-harmony-1475-m1-0: Downloading JSON metadata
[pluralsight] 12-principles-animation-toon-boom-harmony-1475-m1-0: Downloading mp4-high-widescreen viewclip graphql
[pluralsight] 12-principles-animation-toon-boom-harmony-1475-m1-0: Waiting for 4 seconds to avoid throttling
[debug] Invoking downloader on 'https://vid.pluralsight.com/clips/resolution/5e35fbc1-8eaa-4806-8f11-d2aeffaf3bf9/current/mp4/1280x720.mp4?I6qzjDN9dS6y-__and-2N6OrQbX5JYsshbeQmy9rw9WbMS_R7Z5HlAZALs2-AxA9fMPd9gglkEjjO3TNQsxF6hEJREQaQwDWS5tbssXjAa1k99dq9Gmp_j0qED9_GTqP5K2NAcJ9YUvJMgxEIFVXueaFVCAxu9lCtQjfjDGLGW26HgX6QtYOrDRHMjgU'
[download] Sleeping 150.08 seconds...
dRaiNe commented 4 years ago

I believe you receive a 403 when you are flagged for scraping. If you're flagged, you will receive the 403 with some fancy text even in the browser. Also, if you download via youtube-dl, break some of their limits, and then open the browser, there is a window that checks for 5 seconds whether you're actually a web browser. Additionally, sometimes if you download a lot via youtube-dl and then open the browser, you're not banned yet, but if you click on any video it gets stuck on the 'loading video' spinner. They have employed some checks, I guess.

geffchang commented 4 years ago

there is a window checking for 5 seconds if you're actually a web browser

@dRaiNe Is there a captcha going on?

aximili commented 4 years ago

I believe you receive 403 when you are flagged for scraping. If you're flagged, you will receive 403 with some fancy text even in the browser.

That may be true. I was downloading Pluralsight videos successfully, and suddenly (an hour later) I started getting 403. But in the browser (incognito) it's all fine. I logged in as fast as I could (I had autocomplete, so it took less than 5 seconds); it was successful and I didn't see any error. Viewing videos is also smooth.

Any workaround? :(

jeehay commented 4 years ago

I'm getting the same error. I can log in and view the videos without any issue. For some courses I am able to download, but not all. For example, I was downloading a course, stopped, and then tried to download the same course again, and I got the 403 error.

[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--username', 'PRIVATE', '--password', 'PRIVATE', '-o', '%(chapter_number)s - %(chapter)s/%(playlist_index)s - %(title)s.%(ext)s', '--playlist-start', '1', '--verbose', '--min-sleep-interval', '90', '--max-sleep-interval', '130', 'https://app.pluralsight.com/library/courses/commvault-virtualization-concepts-configuration/table-of-contents']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252
[debug] youtube-dl version 2020.03.24
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.18362
[debug] exe versions: none
[debug] Proxy map: {}
[pluralsight:course] commvault-virtualization-concepts-configuration: Downloading JSON metadata
[download] Downloading playlist: Commvault® Virtualization
[pluralsight:course] playlist Commvault® Virtualization: Collected 7 video ids (downloading 7 of them)
[download] Downloading video 1 of 7
[pluralsight] Downloading login page
[pluralsight] Logging in
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\extractor\common.py", line 627, in _request_webpage
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpjwbwqymm\build\youtube_dl\YoutubeDL.py", line 2238, in urlopen
  File "C:\Python\Python34\lib\urllib\request.py", line 470, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 580, in http_response
  File "C:\Python\Python34\lib\urllib\request.py", line 508, in error
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Python\Python34\lib\urllib\request.py", line 588, in http_error_default

jeehay commented 4 years ago

Please do not post your email and password.

One thing that does work for me on some courses is to log in, view a few courses in the browser, and then try to download.

garchaaman19 commented 4 years ago

@jeehay Thanks, I didn't notice it. Okay, I will check if it works. I am using Ubuntu. Should I try on Windows?

jeehay commented 4 years ago

I'm on Windows.

Try this: view the course in the browser, then save the cookies.txt file, and also try to break the download into small chunks, for example by passing --playlist-start 1 and --playlist-end 10.

The above has worked for me, even for courses which previously didn't download.
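As a rough illustration of the chunking idea above, here is a minimal Python sketch that only builds (and prints) one youtube-dl command line per slice of the playlist; it does not run anything. The helper name chunked_commands, the chunk size of 10, the cookies.txt path, and the sleep values are assumptions for illustration; the flags themselves (--cookies, --playlist-start, --playlist-end, --sleep-interval, --max-sleep-interval) are the ones already used in this thread. Whether this avoids the 403 is not guaranteed.

```python
import shlex


def chunked_commands(url, total, chunk=10, cookies="cookies.txt"):
    """Return one youtube-dl argv list per slice of `chunk` playlist items.

    `total` is the number of videos in the course (youtube-dl prints it as
    "Collected N video ids"). Nothing is executed here.
    """
    commands = []
    for start in range(1, total + 1, chunk):
        end = min(start + chunk - 1, total)  # last slice may be shorter
        commands.append([
            "youtube-dl",
            "--cookies", cookies,             # assumed path to exported cookies
            "--playlist-start", str(start),
            "--playlist-end", str(end),
            "--sleep-interval", "35",         # values borrowed from earlier comments
            "--max-sleep-interval", "120",
            url,
        ])
    return commands


if __name__ == "__main__":
    course = "https://app.pluralsight.com/library/courses/data-analysis-shiny-r-playbook"
    for argv in chunked_commands(course, total=23):
        # shlex.quote keeps the printed line safe to paste into a POSIX shell
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each printed line can then be run by hand, one slice at a time, with a pause in between.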

AbhieSpeaks commented 4 years ago

@jeehay Where do you find the cookies.txt file and where do you save it?

Hrxn commented 4 years ago

@AbhieSpeaks You have to create the cookies.txt file yourself. You can do that by using one of the browser extensions [1] to export the cookies of the sites you are signed in to from your browser. Then pass your cookies.txt file to youtube-dl with the --cookies "/path/to/your/cookies.txt" command-line option.

[1] https://addons.mozilla.org/en-US/firefox/addon/export-cookies-txt/
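Before passing an exported file to --cookies, it can help to confirm that it really is in the Netscape cookies.txt format and contains cookies for the site. The following is a minimal sketch using Python 3's standard http.cookiejar module (not youtube-dl itself); the helper name cookie_domains and the file name are assumptions for illustration.

```python
from http.cookiejar import MozillaCookieJar


def cookie_domains(path):
    """Load a Netscape-format cookies.txt and return the sorted cookie domains.

    MozillaCookieJar.load raises http.cookiejar.LoadError if the file does
    not look like a Netscape cookie file.
    """
    jar = MozillaCookieJar(path)
    # Keep session and expired cookies too, so we see everything in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted({cookie.domain for cookie in jar})


if __name__ == "__main__":
    import sys

    # Usage (assumed file name): python check_cookies.py cookies.txt
    path = sys.argv[1] if len(sys.argv) > 1 else "cookies.txt"
    try:
        for domain in cookie_domains(path):
            print(domain)
    except OSError as exc:  # LoadError and FileNotFoundError are OSError subclasses
        print("Could not read cookie file:", exc)
```

If no pluralsight.com domain shows up in the output, the export was probably made while signed out or from a different browser profile.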

fullmerjf commented 4 years ago

I've tried the cookies and user-agent. It's still unsuccessful. Any other advice?

jaimebl commented 4 years ago

I've tried the cookies and user-agent. It's still unsuccessful. Any other advice?

There are just two ways to bypass this error:

Assuming that your 403 error comes after [pluralsight] Logging in when pluralsight wants you to complete the captcha

dirkf commented 2 years ago

Duplicate of #29776, now.