ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

oreilly.com using library card #30405

Closed by blueray453 2 years ago

blueray453 commented 2 years ago

Description

You can access these resources if you have a library card from https://www.hcpl.net/.

Once you have the library card, go to https://www.hcpl.net/database/oreilly-public-libraries-formerly-safari-books-online and click "Go To Resource".

I am mainly interested in downloading the videos using a cookie file. Please let me know how to pass the cookie file.

dirkf commented 2 years ago

Read the Fine Manual.

blueray453 commented 2 years ago

The link is https://learning-oreilly-com.hcpl.idm.oclc.org, not https://learning.oreilly.com/.

I tried:

youtube-dl --ignore-config -i -c --no-warnings --console-title --batch-file='batch-file.txt' --cookies cookies.txt -o '%(playlist_title)s/%(playlist_index)s-%(title)s.%(ext)s' -f 'best[height<=540]'

It says:

[generic] 9780135560501: Requesting header
[generic] 9780135560501: Downloading webpage
[generic] 9780135560501: Extracting information
ERROR: Unsupported URL: https://learning-oreilly-com.hcpl.idm.oclc.org/videos/ansible-certification-red/9780135560501/

dirkf commented 2 years ago

This patch should give access to the media in the pages with the quoted URLs (it also includes a back-port from yt-dlp):

--- old/youtube-dl/youtube_dl/extractor/safari.py
+++ new/youtube-dl/youtube_dl/extractor/safari.py
@@ -86,8 +86,11 @@
     IE_NAME = 'safari'
     IE_DESC = 'safaribooksonline.com online video'
     _VALID_URL = r'''(?x)
-                        https?://
-                            (?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
+                        https?://(?:www\.)?
+                            (?:
+                                (?:safaribooksonline|(?:learning\.)?oreilly)\.com|
+                                learning-oreilly-com\.hcpl\.idm\.oclc\.org
+                            )/
                             (?:
                                 library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>[^/?\#&]+)\.html|
                                 videos/[^/]+/[^/]+/(?P<reference_id>[^-]+-[^/?\#&]+)
@@ -193,17 +196,24 @@
         part = self._download_json(
             url, '%s/%s' % (mobj.group('course_id'), mobj.group('part')),
             'Downloading part JSON')
-        return self.url_result(part['web_url'], SafariIE.ie_key())
+        web_url = part['web_url']
+        if 'library/view' in web_url:
+            web_url = web_url.replace('library/view', 'videos')
+            natural_keys = part['natural_key']
+            web_url = '{0}/{1}-{2}'.format(web_url.rsplit("/", 1)[0], natural_keys[0], natural_keys[1][:-5])
+        return self.url_result(web_url, SafariIE.ie_key())

 class SafariCourseIE(SafariBaseIE):
     IE_NAME = 'safari:course'
     IE_DESC = 'safaribooksonline.com online courses'
-
     _VALID_URL = r'''(?x)
-                    https?://
+                    https?://(?:www\.)?
                         (?:
-                            (?:www\.)?(?:safaribooksonline|(?:learning\.)?oreilly)\.com/
+                            (?:
+                                (?:safaribooksonline|(?:learning\.)?oreilly)\.com|
+                                learning-oreilly-com\.hcpl\.idm\.oclc\.org
+                            )/
                             (?:
                                 library/view/[^/]+|
                                 api/v1/book|
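
The two changes in the patch can be tried in isolation with the sketch below. It is not the extractor itself: the regex is only the host part copied from the diff (the path alternatives are left out because the diff shows them only in part), the function names are hypothetical, and the sample API response is made up for illustration.

```python
import re

# Host part of the patched _VALID_URL, copied from the diff above.
PATCHED_HOST_RE = re.compile(r'''(?x)
    https?://(?:www\.)?
    (?:
        (?:safaribooksonline|(?:learning\.)?oreilly)\.com|
        learning-oreilly-com\.hcpl\.idm\.oclc\.org
    )/
''')


def host_is_covered(url):
    """Return True if the URL's scheme and host match the patched pattern."""
    return PATCHED_HOST_RE.match(url) is not None


def rewrite_web_url(part):
    """Mirror the web_url rewrite from the second hunk of the patch."""
    web_url = part['web_url']
    if 'library/view' in web_url:
        web_url = web_url.replace('library/view', 'videos')
        natural_keys = part['natural_key']
        # natural_keys[1][:-5] drops the last five characters ('.html' here).
        web_url = '{0}/{1}-{2}'.format(
            web_url.rsplit('/', 1)[0], natural_keys[0], natural_keys[1][:-5])
    return web_url


# Hypothetical API response; the field values are assumptions, not real data.
sample_part = {
    'web_url': 'https://learning.oreilly.com/library/view/some-title/0000000000000/part00.html',
    'natural_key': ['0000000000000', 'part00.html'],
}

print(host_is_covered(
    'https://learning-oreilly-com.hcpl.idm.oclc.org/videos/ansible-certification-red/9780135560501/'))
print(rewrite_web_url(sample_part))
```

The host check only shows that the proxied hostname is now accepted; whether a full URL is claimed by an extractor still depends on the path alternatives in the real pattern.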

blueray453 commented 2 years ago

Thank you very much.

blueray453 commented 2 years ago

@dirkf From "Adding support for a new site", I figured out that I can use python test/test_download.py TestDownload.test_YourExtractor to run tests. What exact command do I need to test the safari extractor?

dirkf commented 2 years ago

With the top-level development directory (the one containing test) as the current directory, python test/test_download.py TestDownload.test_YourExtractor runs the first (zeroth) test for YourExtractorIE; append _n to run the nth test. So:

python test/test_download.py TestDownload.test_Safari
python test/test_download.py TestDownload.test_Safari_1
...
python test/test_download.py TestDownload.test_SafariApi
python test/test_download.py TestDownload.test_SafariApi_1
...
python test/test_download.py TestDownload.test_SafariCourse
python test/test_download.py TestDownload.test_SafariCourse_1
...
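
The naming rule above can be sketched as a small helper. This is an illustration only: download_test_ids is a hypothetical function, not part of youtube-dl, and it assumes the test name is the extractor class name with the trailing IE removed.

```python
def download_test_ids(extractor_class_name, test_count):
    """Build the unittest IDs for the first test_count tests of an extractor."""
    base = extractor_class_name
    if base.endswith('IE'):
        base = base[:-2]  # YourExtractorIE -> YourExtractor
    ids = []
    for n in range(test_count):
        # The zeroth test has no suffix; later ones get _1, _2, ...
        suffix = '' if n == 0 else '_%d' % n
        ids.append('TestDownload.test_%s%s' % (base, suffix))
    return ids


for test_id in download_test_ids('SafariCourseIE', 3):
    print('python test/test_download.py ' + test_id)
```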

To test your problem URLs, create entries in the _TESTS list for the appropriate extractor classes, or just supply the options and URL in:

python -m youtube_dl options url
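
For the first option, a test entry for the URL from this thread might look like the sketch below. The variable name is hypothetical, and which extractor class the entry belongs in is an assumption (a course URL like this one presumably goes with SafariCourseIE). The only_matching key marks an entry that only checks that the URL is claimed by the extractor, which suits content that needs a library card login; as far as I know such entries are checked by the URL-matching tests rather than by test_download.py.

```python
# Hypothetical entry to append to the _TESTS list of the matching extractor class.
PROXY_URL_TEST = {
    'url': 'https://learning-oreilly-com.hcpl.idm.oclc.org/videos/ansible-certification-red/9780135560501/',
    # Only verify that the URL is matched; do not attempt a download.
    'only_matching': True,
}
```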

blueray453 commented 2 years ago

@dirkf Thank you very much.