yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

[Broken] Oreilly download support not working anymore and the playlist items give 404 #586

Closed MKSherbini closed 3 years ago

MKSherbini commented 3 years ago

Verbose log

D:\Utils\YoutubeDL\DownloadOreilly>python "D:\Utils/YoutubeDL/ytdlsrc/yt-dlp/yt_dlp/__main__.py" --config-location "D:\Utils/YoutubeDL/OreillyConf/youtube-dl.conf" --playlist-start 1 --download-archive archive.txt --cookies cookies.txt
[debug] Custom config file: D:\Utils/YoutubeDL/OreillyConf/youtube-dl.conf
[debug] Custom config: ['-u', 'PRIVATE', '-p', 'PRIVATE', '-2', '-i', '-c', '-v', '--no-warnings', '--console-title', '--batch-file=batch-file.txt', '--write-annotations', '--write-description', '--write-info-json', '--write-thumbnail', '--sub-lang', 'en', '--write-auto-sub', '--write-sub', '--add-metadata', '--embed-thumbnail', '-o', '%(playlist_title)s/%(playlist_index)s. %(title)s.%(ext)s', '-f', 'bestvideo[height<=720]+bestaudio/best[height<=720]/worst', '--merge-output-format', 'mp4', '--mark-watched', '--geo-bypass']
[debug] Command-line config: ['--config-location', 'D:\\Utils/YoutubeDL/OreillyConf/youtube-dl.conf', '--playlist-start', '1', '--download-archive', 'archive.txt', '--cookies', 'cookies.txt']
[debug] Batch file urls: ['https://learning.oreilly.com/videos/learning-path-delivering/9781491989012/']
[debug] Loading archive file 'archive.txt'

[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] yt-dlp version 2021.07.24 (source)
[debug] Plugin Extractors: ['SamplePlugin']
[debug] Git HEAD: 11cc45718
[debug] Python version 3.9.1 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg n4.4-4-gacb339bb88, ffprobe n4.4-4-gacb339bb88
[debug] Proxy map: {}
[safari:course] Downloading login page
[debug] [safari:course] Extracting URL: https://learning.oreilly.com/videos/learning-path-delivering/9781491989012/
[safari:course] 9781491989012: Downloading course JSON
[download] Downloading playlist: Learning Path: Delivering Applications with Docker
[info] Writing playlist metadata as JSON to: Learning Path - Delivering Applications with Docker\0. Learning Path - Delivering Applications with Docker.info.json
[safari:course] playlist Learning Path: Delivering Applications with Docker: Collected 74 videos; downloading 74 of them
[download] Downloading video 1 of 74
[safari:api] Downloading login page
[debug] [safari:api] Extracting URL: https://learning.oreilly.com/api/v1/book/9781491989012/chapter/video301824.html
[safari:api] 9781491989012/video301824: Downloading part JSON
[safari] Downloading login page
[debug] [safari] Extracting URL: https://learning.oreilly.com/library/view/learning-path-delivering/9781491989012/video301824.html
[safari] 9781491989012-video301824: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
  File "D:\Utils\YoutubeDL\ytdlsrc\yt-dlp\yt_dlp\extractor\common.py", line 679, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "D:\Utils\YoutubeDL\ytdlsrc\yt-dlp\yt_dlp\YoutubeDL.py", line 3151, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "C:\Users\mh-sh\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Users\mh-sh\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Users\mh-sh\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Users\mh-sh\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\mh-sh\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

These same configs worked just 2 days ago, but now I can't download any O'Reilly content. I checked the links, and they give a 404 in a browser too.

Ashish0804 commented 3 years ago

> I did try to check the links, they do give 404 from a browser.

That means the links are dead. Test with a known working link.

MinePlayersPE commented 3 years ago

@Ashish0804 I think they meant that the video links generated by yt-dlp give a 404. The playlist link seems to be fine.

Ashish0804 commented 3 years ago

@MinePlayersPE If you have a subscription, could you test the command given by OP so we can be sure the site is broken?

MKSherbini commented 3 years ago

The link works fine. I meant that the items fetched from it, the individual videos, give a 404. I tested multiple courses that I downloaded just 2 days ago, and none of them work anymore.

MKSherbini commented 3 years ago

Also, before I supplied credentials it could fetch 1 min of each video, so even when logged out it could download 1 min. Now it can't even find the video.

Ashish0804 commented 3 years ago

It seems to be working for me:

yt-dlp -F https://learning.oreilly.com/videos/learning-path-delivering/9781491989012/
[safari:course] 9781491989012: Downloading course JSON
[download] Downloading playlist: Learning Path: Delivering Applications with Docker
[safari:course] playlist Learning Path: Delivering Applications with Docker: Collected 74 videos; downloading 74 of them
[download] Downloading video 1 of 74
[safari:api] 9781491989012/video301824: Downloading part JSON
[safari] 9781491989012-video301824: Downloading webpage
[Kaltura] 9781491989012-video301824: Downloading webpage
[Kaltura] 0_wuyu5ime: Downloading video info JSON
[Kaltura] 0_wuyu5ime: Checking mp4-409 URL
[Kaltura] 0_wuyu5ime: Downloading m3u8 information
WARNING: [Kaltura] Ignoring subtitle tracks found in the HLS manifest; if any subtitle tracks are missing, please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
[info] Available formats for 0_wuyu5ime:
ID      EXT RESOLUTION FPS |  FILESIZE    TBR PROTO  | VCODEC    VBR ACODEC     ABR MORE INFO
------- --- ---------- --- - ---------- ----- ------ - ------- ----- --------- ---- ---------
mp4-56  mp4 audio only 0   | ~689.00KiB   56k http   |               unknown    56k isom
hls-58  mp4 audio only     |              58k m3u8_n |               mp4a.40.2  58k 
hls-230 mp4 640x360        |             230k m3u8_n | unknown  230k unknown     0k 
hls-252 mp4 640x360        |             252k m3u8_n | unknown  252k unknown     0k 
mp4-199 mp4 640x360    29  | ~2.37MiB    199k http   | avc1     199k unknown     0k isom
mp4-218 mp4 640x360    29  | ~2.60MiB    218k http   | avc1     218k unknown     0k isom
mp4-220 mp4 640x360    29  | ~2.63MiB    220k http   | avc1     220k unknown     0k isom
hls-433 mp4 1280x720       |             433k m3u8_n | unknown  433k unknown     0k 
mp4-386 mp4 1280x720   29  | ~4.60MiB    386k http   | avc1     386k unknown     0k isom
mp4-393 mp4 1280x720   29  | ~4.68MiB    393k http   | avc1     393k unknown     0k isom
mp4-409 mp4 1280x720   29  | ~4.87MiB    409k http   | avc1     409k unknown     0k mp42
[download] Downloading video 2 of 74
[safari:api] 9781491989012/video301825: Downloading part JSON
[safari] 9781491989012-video301825: Downloading webpage

MKSherbini commented 3 years ago

@Ashish0804 That's true. I tested a bit, and it seems that adding the cookies file causes this 404 error, but I don't see how that is possible when the content is publicly available anyway.

Ashish0804 commented 3 years ago

Try getting fresh cookies from the browser (they could have expired). Also make sure you are able to view/play the files when logged in. As you mentioned, you have used this before, so depending on how much you were downloading, they could have blocked your account.

If the problem still isn't solved, you will have to provide an account, since I can't reproduce the issue.
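
One way to check the "cookies could have expired" theory before re-exporting them is sketched below. It assumes `cookies.txt` is in the Netscape format that `--cookies` expects, and it uses only Python's standard `http.cookiejar`. The helper name `expired_cookie_names` and the `oreilly.com` domain filter are illustrative assumptions, not part of yt-dlp:

```python
from http.cookiejar import MozillaCookieJar


def expired_cookie_names(cookie_file, domain_suffix='oreilly.com'):
    """Return the names of expired cookies for the given domain suffix.

    Hypothetical helper for illustration; yt-dlp does not ship this.
    """
    jar = MozillaCookieJar(cookie_file)
    # ignore_expires=True keeps expired cookies in the jar so we can list them;
    # ignore_discard=True also keeps session cookies.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(
        cookie.name
        for cookie in jar
        if cookie.domain.endswith(domain_suffix) and cookie.is_expired()
    )


if __name__ == '__main__':
    print(expired_cookie_names('cookies.txt'))
```

An empty list only means that no cookie has passed its expiry time; the server can still have invalidated the session on its side.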

MKSherbini commented 3 years ago

You are right, my browser seems to have cached something that lets me access the content still, but when trying to log in from another browser this wasn't the case, I'll solve my account's issues then, thanks.

MKSherbini commented 3 years ago

@Ashish0804 After re-checking with multiple others, I can confirm the issue is from yt-dlp, I can access the site normally and open all videos (The same for my friends), but can't use cookies anymore. As for the Oreilly account, you can create any temp account and use the free trial.

MKSherbini commented 3 years ago

@pukkandan Did creating a new temp account fail somehow? I can help debug too, just let me know the relevant parts of the code.

hasantayyar commented 3 years ago

The extractor builds the wrong link for individual videos.

This is what the extractor builds:

https://learning.oreilly.com/library/view/getting-started-with/9781787285491/video1_1.html

This is what it should be:

https://learning.oreilly.com/videos/getting-started-with/9781787285491/9781787285491-video1_1/
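
A minimal sketch of the rewrite implied by the two URLs above, assuming the new pattern is always `/videos/<slug>/<book id>/<book id>-<part>/`. The helper name `rewrite_legacy_video_url` and the regular expression are hypothetical illustrations, not the extractor's actual code or the change made in the PR:

```python
import re

# Matches the legacy page URL that the extractor was building.
# Assumption: slug, book id and part name never contain a slash.
_LEGACY_RE = re.compile(
    r'^https://learning\.oreilly\.com/library/view/'
    r'(?P<slug>[^/]+)/(?P<book_id>[^/]+)/(?P<part>[^/]+)\.html$'
)


def rewrite_legacy_video_url(url):
    """Map a legacy /library/view/ video URL to the /videos/ form.

    URLs that do not match the legacy pattern are returned unchanged.
    """
    match = _LEGACY_RE.match(url)
    if match is None:
        return url
    return (
        'https://learning.oreilly.com/videos/'
        '{slug}/{book_id}/{book_id}-{part}/'.format(**match.groupdict())
    )


if __name__ == '__main__':
    print(rewrite_legacy_video_url(
        'https://learning.oreilly.com/library/view/'
        'getting-started-with/9781787285491/video1_1.html'
    ))
```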

MKSherbini commented 3 years ago

@hasantayyar Thanks, I submitted a PR to handle this change here. As I didn't read the rest of the code I can't confirm if it creates other issues, but now I can download again without issues.

MKSherbini commented 3 years ago

After some testing: this only bypasses the 404 issue. Even with the right URL, it only downloads 3 min of each video, as if not authenticated.

MKSherbini commented 3 years ago

[debug] Custom config: ['--cookies-from-browser', 'firefox', '--download-archive', 'archive.txt', '-i', '-c', '-v', '--console-title', '--batch-file=batch-file.txt', '--write-annotations', '--write-description', '--write-info-json', '--write-thumbnail', '--sub-lang', 'en', '--write-auto-sub', '--write-sub', '--add-metadata', '--embed-subs', '--embed-thumbnail', '-o', '%(playlist_title)s/%(playlist_index)s. %(title)s.%(ext)s', '-f', 'bestvideo[height<=720]+bestaudio/best[height<=720]/worst', '--merge-output-format', 'mp4']
[debug] Command-line config: ['--config-location', 'D:\\Utils/YoutubeDL/Configs/ytdlp_oreilly.conf', '--playlist-items', '2']
[debug] Batch file urls: ['https://learning.oreilly.com/videos/the-principles-of/9781491935811/']
[Cookies] Extracting cookies from firefox
[debug] Extracting cookies from: "C:\Users\mh-sh\AppData\Roaming\Mozilla\Firefox\Profiles\8tceuger.default-release\cookies.sqlite"
[Cookies] Extracted 1548 cookies from firefox
[debug] Loading archive file 'archive.txt'

[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] yt-dlp version 2021.09.02 (source)
[debug] Plugin Extractors: ['SamplePlugin']
[debug] Git HEAD: 982323fe1
[debug] Python version 3.9.1 (CPython 64bit) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg n4.4-4-gacb339bb88, ffprobe n4.4-4-gacb339bb88
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [safari:course] Extracting URL: https://learning.oreilly.com/videos/the-principles-of/9781491935811/
[safari:course] 9781491935811: Downloading course JSON
[download] Downloading playlist: The Principles of Microservices
[info] Writing playlist metadata as JSON to: The Principles of Microservices\0. The Principles of Microservices.info.json
WARNING: There's no playlist description to write.
[safari:course] playlist The Principles of Microservices: Collected 15 videos; downloading 1 of them
[download] Downloading video 1 of 1
[debug] [safari:api] Extracting URL: https://learning.oreilly.com/api/v1/book/9781491935811/chapter/video221406.html
[safari:api] 9781491935811/video221406: Downloading part JSON
[debug] [safari] Extracting URL: https://learning.oreilly.com/videos/the-principles-of/9781491935811/9781491935811-video221406/
[debug] [Kaltura] Extracting URL: https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php?wid=_1926081&uiconf_id=29375172&flashvars%5BreferenceId%5D=9781491935811-video221406
[Kaltura] 9781491935811-video221406: Downloading webpage
[Kaltura] 0_1i6jb4o4: Downloading video info JSON
[Kaltura] 0_1i6jb4o4: Checking mp4-4468 URL
[Kaltura] 0_1i6jb4o4: Downloading m3u8 information
WARNING: [Kaltura] Ignoring subtitle tracks found in the HLS manifest; if any subtitle tracks are missing, please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[debug] Downloading subtitles: en
[info] 0_1i6jb4o4: Downloading 1 format(s): mp4-1966
WARNING: There's no description to write.
WARNING: There are no annotations to write.
[info] Writing video subtitles to: The Principles of Microservices\2. What are Microservices.en.ttml
[debug] Invoking downloader on "http://cdnapi.kaltura.com/api_v3/service/caption_captionasset/action/serve/captionAssetId/0_diyhxwtn"
[download] The Principles of Microservices\2. What are Microservices.en.ttml has already been downloaded
[download] 100% of 15.08KiB
[info] Writing video metadata as JSON to: The Principles of Microservices\2. What are Microservices.info.json
[Kaltura] 0_1i6jb4o4: Downloading thumbnail  ...
[Kaltura] 0_1i6jb4o4: Writing thumbnail to: The Principles of Microservices\2. What are Microservices.jpg
[debug] Invoking downloader on "http://cdnapi.kaltura.com/p/1926081/sp/192608100/playManifest/entryId/0_1i6jb4o4/format/url/protocol/http/flavorId/0_coc8hy0d"
[download] Resuming download at byte 2717283
[download] Destination: The Principles of Microservices\2. What are Microservices.mp4
[download]   7.6% of 35.87MiB at 489.40KiB/s ETA 01:09

The issue now is that the downloader is invoked on "http://cdnapi.kaltura.com/p/1926081/sp/192608100/playManifest/entryId/0_1i6jb4o4/format/url/protocol/http/flavorId/0_coc8hy0d" which is only 3min, but it already had access to the full video at "https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php?wid=_1926081&uiconf_id=29375172&flashvars%5BreferenceId%5D=9781491935811-video221406"

hasantayyar commented 3 years ago

> @hasantayyar Thanks, I submitted a PR to handle this change here. As I didn't read the rest of the code I can't confirm if it creates other issues, but now I can download again without issues.

Thanks @MKSherbini I will test with a subscription.

The issue is that the O'Reilly API returns the wrong web URL, and I think your change is the only way to fix it for now, until they change it.

grepmeister commented 3 years ago

I just installed with python3 -m pip install --upgrade git+https://github.com/MKSherbini/yt-dlp and was able to download again with a free trial account.

~~BUT all videos are truncated after 60 seconds. When using the same account in their web player, it's possible to watch the videos beyond the 60 s.~~

Update: I think the truncated videos happened because I was still using an (old) cookie.txt and a user-agent option left over from earlier attempts to get yt-dlp and safari working. I am sorry for the confusion.

I can now confirm that MKSherbini's fork is working for me, even with a trial account! Thanks MKSherbini!

hasantayyar commented 3 years ago

@MKSherbini I tested this with my credentials, both with an individual video page and with a course page. It's downloading the videos without issues.

pukkandan commented 3 years ago

I am confused by this conversation. Does #990 download the truncated video, or the full video?

grepmeister commented 3 years ago

pukkandan, I am sorry for the confusion, as far as I can tell now the fix works just fine! I updated my original comment https://github.com/yt-dlp/yt-dlp/issues/586#issuecomment-921203603-permalink

santhosh-v commented 3 years ago

With the latest update I am getting this issue:

[safari:course] 9780136787709: Downloading course JSON
[download] Downloading playlist: Getting Started with Kubernetes LiveLessons, 2nd Edition
[safari:course] playlist Getting Started with Kubernetes LiveLessons, 2nd Edition: Collected 87 videos; downloading 87 of them
[download] Downloading video 1 of 87
[safari:api] 9780136787709/GSK2_00_00_00: Downloading part JSON
ERROR: no suitable InfoExtractor for URL https:/9780136787709-GSK2_00_00_00
[download] Downloading video 2 of 87
[safari:api] 9780136787709/GSK2_01_01_00: Downloading part JSON
ERROR: no suitable InfoExtractor for URL https:/9780136787709-GSK2_01_01_00
[download] Downloading video 3 of 87
[safari:api] 9780136787709/GSK2_01_01_01: Downloading part JSON
ERROR: no suitable InfoExtractor for URL https:/9780136787709-GSK2_01_01_01
[download] Downloading video 4 of 87
[safari:api] 9780136787709/GSK2_01_01_02: Downloading part JSON
ERROR: no suitable InfoExtractor for URL https:/9780136787709-GSK2_01_01_02
[download] Downloading video 5 of 87
[safari:api] 9780136787709/GSK2_01_01_03: Downloading part JSON

pukkandan commented 3 years ago

I see what the issue is. Thanks for the catch. Will fix it when I get on my PC.
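
The thread does not say what produced the malformed URL in the log above. As a minimal sketch, using only Python's standard `urllib.parse`, the check below shows that the reported string has a scheme but no host, so it cannot be a valid video page address. The helper name `is_absolute_http_url` is hypothetical and not part of yt-dlp:

```python
from urllib.parse import urlsplit


def is_absolute_http_url(url):
    """Return True if url has an http(s) scheme and a non-empty host part."""
    parts = urlsplit(url)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


if __name__ == '__main__':
    # URL taken from the error message in the log above.
    print(is_absolute_http_url('https:/9780136787709-GSK2_00_00_00'))
    # URL taken from the earlier working log in this thread.
    print(is_absolute_http_url(
        'https://learning.oreilly.com/videos/the-principles-of/'
        '9781491935811/9781491935811-video221406/'
    ))
```

A guard of this kind could flag a badly built URL before it is handed to extractor lookup, where it only surfaces as "no suitable InfoExtractor".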