ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Teachable] HTTP Error 400: Bad Request #30861

Open ykacenelen opened 2 years ago

ykacenelen commented 2 years ago

Hi everyone,

Need to bulk download a bunch of videos from a paid course on Teachable. Set up cookies and credentials parameters so yt-dl retrieves all information but ends up on a HTTP 400 error. Last version of yt-dl used. Thanks a lot for your help.

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--cookies', 'teachable.com_cookies.txt', '--username', 'PRIVATE', '--password', 'PRIVATE', '--verbose', '-o', './%(chapter_number)s-%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s', 'https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.4.4 (CPython) - Windows-8.1-6.3.9600
[debug] exe versions: none
[debug] Proxy map: {}
[generic] 1420051: Requesting header
[redirect] Following redirect to https://derek-gripper-guitar.teachable.com/p/guitarlessonstream
[generic] guitarlessonstream: Requesting header
WARNING: Falling back on generic information extractor.
[generic] guitarlessonstream: Downloading webpage
[generic] guitarlessonstream: Extracting information
[TeachableCourse] Downloading derek-gripper-guitar.teachable.com login page
[TeachableCourse] Logging in to derek-gripper-guitar.teachable.com
ERROR: Unable to download webpage: HTTP Error 400: Bad Request (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 634, in _request_webpage
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 2288, in urlopen
  File "C:\Python\Python34\lib\urllib\request.py", line 470, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 580, in http_response
  File "C:\Python\Python34\lib\urllib\request.py", line 508, in error
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Python\Python34\lib\urllib\request.py", line 588, in http_error_default

dirkf commented 2 years ago

Don't use --username ... etc. with --cookies ... : --username/-u is taken as an instruction to log in, which --cookies is supposed to bypass.

ykacenelen commented 2 years ago

Thx for your reply dirkf,

> Don't use --username ... etc. with --cookies ... : --username/-u is taken as an instruction to log in, which --cookies is supposed to bypass.

I had guessed that the credentials options were redundant given the cookies, but my initial command line, run without them, asked for credentials: "Lecture contents locked. Use --username and --password or --netrc to provide account credentials." See below:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--cookies', 'teachable.com_cookies.txt', '--verbose', '-o', './%(chapter_number)s-%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s','https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.4.4 (CPython) - Windows-8.1-6.3.9600
[debug] exe versions: none
[debug] Proxy map: {}
[generic] 1420051: Requesting header
[redirect] Following redirect to https://derek-gripper-guitar.teachable.com/p/guitarlessonstream
[generic] guitarlessonstream: Requesting header
WARNING: Falling back on generic information extractor.
[generic] guitarlessonstream: Downloading webpage
[generic] guitarlessonstream: Extracting information
[TeachableCourse] guitarlessonstream: Downloading webpage
[download] Downloading playlist: Lesson Stream: Echoes From the Zooms
[TeachableCourse] playlist Lesson Stream: Echoes From the Zooms: Collected 392 video ids (downloading 392 of them)
[download] Downloading video 1 of 392
[Teachable] 39290390: Downloading webpage
ERROR: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 815, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 836, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 534, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\teachable.py", line 175, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 943, in raise_login_required
youtube_dl.utils.ExtractorError: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.

Btw, are installed on my machine:

And I have no local "dst" user either ("File "C:\Users\dst\AppData\Roaming\..." as seen in the error msg)... What does it refer to?

dirkf commented 2 years ago

I hadn't noticed that: dst may refer to the maintainer who built the Windows self-extracting executable. That version uses its own built-in Python 3.4.4, as logged, regardless of your platform Python setup.

Apparently your cookies aren't being accepted. Possibly the site wants some other headers to match those sent by your browser (in particular, try sending your browser's user agent header with --user-agent ...), or the cookies' expiry time is very short.
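One way to rule out the expiry theory is to inspect the exported cookies file directly. The following is a minimal sketch using only the Python standard library; the function name `summarize_cookies` and the choice of domain suffix are illustrative assumptions, not part of youtube-dl. It assumes the file is in the Netscape format that cookies.txt-style browser extensions export.

```python
import time
from http.cookiejar import MozillaCookieJar


def summarize_cookies(cookie_path, domain_suffix, now=None):
    """Return (live, expired) lists of cookie names for matching domains.

    Hypothetical helper for illustration: it only reads the exported file
    and never contacts the site.
    """
    now = time.time() if now is None else now
    jar = MozillaCookieJar()
    # Load everything, including session and already-expired cookies,
    # so that expired entries can be reported instead of silently dropped.
    jar.load(cookie_path, ignore_discard=True, ignore_expires=True)
    live, expired = [], []
    for cookie in jar:
        if not cookie.domain.endswith(domain_suffix):
            continue  # cookie belongs to some other site
        (expired if cookie.is_expired(now) else live).append(cookie.name)
    return live, expired
```

Calling `summarize_cookies('teachable.com_cookies.txt', 'teachable.com')` right before a download attempt would show whether any cookie has already lapsed. An empty `live` list would suggest the export itself is the problem rather than the extractor.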

ykacenelen commented 2 years ago

Gave a try under Linux Mint 20.1... I imported Teachable site cookies (once logged in) with both Firefox ("cookies.txt") and Chromium ("Get Cookies") extensions... Added adequate --user-agent option in both cases... Don't know much about cookies' expiry time but they seem to last enough, and I launched yt-dl right after their export... Ended with the same errors as in Windows 8, and even got worse: Unsupported URL: https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051

Gonna take hours to download the approx 400 videos by right-clicking one after another... but it already took a bunch to unsuccessfully find out how to bulk download them. If you - or anyone! - have a final hint, please let me know. Thank you for your help, have a nice Easter day,

dirkf commented 2 years ago

All I can add is that this is what happens with the git master when not logged in -- it may look familiar:

$ python -m youtube_dl -v -F 'https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 675e4ca6d
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] 1420051: Requesting header
[redirect] Following redirect to https://derek-gripper-guitar.teachable.com/p/guitarlessonstream
[generic] guitarlessonstream: Requesting header
WARNING: Falling back on generic information extractor.
[generic] guitarlessonstream: Downloading webpage
[generic] guitarlessonstream: Extracting information
[TeachableCourse] guitarlessonstream: Downloading webpage
[download] Downloading playlist: Lesson Stream: Echoes From the Zooms
[TeachableCourse] playlist Lesson Stream: Echoes From the Zooms: Collected 392 video ids (downloading 392 of them)
[download] Downloading video 1 of 392
[Teachable] 39290390: Downloading webpage
ERROR: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.
Traceback (most recent call last):
  File "youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "youtube_dl/extractor/teachable.py", line 175, in _real_extract
    self.raise_login_required('Lecture contents locked')
  File "youtube_dl/extractor/common.py", line 943, in raise_login_required
    expected=True)
ExtractorError: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.

$

And the same for the actual first video page https://derek-gripper-guitar.teachable.com/courses/guitarlessonstream/lectures/32719437.

ykacenelen commented 2 years ago

It is..! No need to bother with cookies then?! :) Would that mean that $ python -m youtube_dl -v -F 'https://derek-gripper-guitar.teachable.com/courses/enrolled/1420051' may work without cookies but with credentials? What is youtube_dl here (script, library?) and where to find it? Thx dirkf.

dirkf commented 2 years ago

Unfortunately the log is just showing that not including the cookies option led to the same result as including it. Presumably using credentials would fail in the same way as it did for you before.

If yt-dl is installed via pip, a youtube_dl module is installed that you can invoke with python -m .... If you have a development branch checked out and run the same command from the top-level youtube-dl directory of the checked-out source, the module in the youtube-dl/youtube_dl directory is run: that's what's in the log that I posted. These are just equivalent ways of running the youtube-dl command.
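To see which copy of the module a given interpreter would pick up, a small standard-library check can help. This is a sketch only; `locate_module` is a hypothetical helper name, and the result depends on the interpreter and the directory it is run from, since both affect the import path.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None.

    For a package this is the path of its __init__.py; running
    `python -m <package>` executes the __main__.py in the same directory.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

Running `locate_module('youtube_dl')` from the top-level directory of a checked-out source tree should point into that tree, while running it elsewhere should point at the pip-installed copy, or return `None` if no copy is importable.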

zageyiff commented 2 years ago

I just tried this on a paid Teachable course and it worked on a fresh install of Ubuntu MATE, using the latest version of youtube-dl and cookies obtained with the Firefox cookies.txt extension.

> youtube-dl --version
2021.12.17
> youtube-dl -v  --cookies cookies.txt -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' -o '~/Downloads/%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s' https://<course>.teachable.com/courses/enrolled/<course-id>

installation done by downloading the binary directly, and installing python with apt

sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl

dirkf commented 2 years ago

The login procedure now invokes recaptcha.net which yt-dl can't handle at present (contributions welcome). Therefore any attempt to get Teachable pages working should use --cookies ... as above.
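For anyone driving yt-dl from Python rather than the command line, the same cookies-only approach might look like the sketch below. The helper names `build_cookie_only_opts` and `download_course` are illustrative assumptions; the option keys mirror the --cookies, -o, -f and -v flags used above. No username or password is set, so no login attempt should be triggered.

```python
def build_cookie_only_opts(cookie_file, out_dir):
    """Build YoutubeDL options that rely on exported cookies only."""
    return {
        "cookiefile": cookie_file,  # same role as --cookies
        "outtmpl": out_dir + "/%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s",
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        "verbose": True,  # same role as -v
        # Deliberately no "username"/"password": those request a login,
        # which currently runs into the captcha.
    }


def download_course(url, opts):
    """Run the download; needs the youtube_dl module (e.g. from pip)."""
    import youtube_dl  # imported here so the options helper works without it

    with youtube_dl.YoutubeDL(opts) as ydl:
        return ydl.download([url])
```

Whether this succeeds still depends on the site accepting the exported cookies, exactly as with the command-line form.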

pedrosimao commented 1 year ago

The solution with a cookies file doesn't work for me:

youtube-dl -v --cookies cookies.txt -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best' -o '~/Downloads/%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s'  https://formation-fiscalite.teachable.com/courses/992239/lectures/22497951
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--cookies', 'cookies.txt', '-f', 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best', '-o', '~/Downloads/%(chapter)s/%(autonumber)03d-%(title)s.%(ext)s', 'https://formation-fiscalite.teachable.com/courses/992239/lectures/22497951']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.7 (CPython) - macOS-10.16-x86_64-i386-64bit
[debug] exe versions: ffmpeg 5.1.2, ffprobe 5.1.2
[debug] Proxy map: {}
[generic] 22497951: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 22497951: Downloading webpage
[generic] 22497951: Extracting information
[Teachable] 22497951: Downloading webpage
ERROR: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/teachable.py", line 175, in _real_extract
    self.raise_login_required('Lecture contents locked')
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 941, in raise_login_required
    raise ExtractorError(
youtube_dl.utils.ExtractorError: Lecture contents locked. Use --username and --password or --netrc to provide account credentials.

pedrosimao commented 1 year ago

I have seen this project and it seems the new Captcha is really well done: https://github.com/merberich/teachable_dl

dirkf commented 1 year ago

?? Aren't they just using cookies like yt-dl? Apart from specific parsing differences (I haven't checked), the difference seems to be BeautifulSoup/Requests vs regex/urllib2.

It is possible that the simple-minded check for a locked lecture is failing:

            if any(re.search(p, webpage) for p in (
                    r'class=["\']lecture-contents-locked',
                    r'>\s*Lecture contents locked',
                    r'id=["\']lecture-locked',
                    # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
                    r'class=["\'](?:inner-)?lesson-locked',
                    r'>LESSON LOCKED<')):
                self.raise_login_required('Lecture contents locked')

For instance, maybe an element with id lecture-locked is always in a protected page but only displayed when the login check fails.
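That hypothesis can be tried offline against a saved copy of a lecture page. The sketch below reuses the patterns quoted above; the helper name `matching_locked_patterns` and the sample markup are made up for illustration and are not taken from a real Teachable page.

```python
import re

# Patterns copied from the locked-lecture check quoted above.
LOCKED_PATTERNS = (
    r'class=["\']lecture-contents-locked',
    r'>\s*Lecture contents locked',
    r'id=["\']lecture-locked',
    r'class=["\'](?:inner-)?lesson-locked',
    r'>LESSON LOCKED<',
)


def matching_locked_patterns(webpage):
    """Return the patterns that would make the check report a locked lecture."""
    return [p for p in LOCKED_PATTERNS if re.search(p, webpage)]


# Hypothetical page: the "locked" element exists but is hidden, and the
# video is present, as it might be for a logged-in user.
hidden_banner_page = (
    '<div id="lecture-locked" style="display: none"></div>'
    '<video src="lesson.mp4"></video>'
)
print(matching_locked_patterns(hidden_banner_page))
```

If a page saved from a logged-in browser session still produces a non-empty list, the check is probably matching markup that is present but not displayed, and the pattern that matched is the one to look at.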