ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Unable to download certain courses from egghead.io #30400

Open 0x7145 opened 2 years ago

0x7145 commented 2 years ago

Verbose log

$ youtube-dl -v 'https://egghead.io/courses/refactor-react-components-to-typescript-c70bffa0'

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://egghead.io/courses/refactor-react-components-to-typescript-c70bffa0']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.9.9 (CPython) - macOS-10.14.6-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.4.1, ffprobe 4.4.1
[debug] Proxy map: {}
[egghead:course] refactor-react-components-to-typescript-c70bffa0: Downloading course lessons JSON
ERROR: Unable to download JSON metadata: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

I came across some Egghead.io courses that can't be downloaded. The courses in question are free during this month (December 2021), after which they will require a paid account.

ghost commented 2 years ago

How about this URL, linked on the right side of the page: https://egghead.io/lessons/react-refactor-a-react-application-to-typescript ? Edit: this was only an introduction, sorry for the noise.

$ youtube-dl https://egghead.io/lessons/react-refactor-a-react-application-to-typescript -F
[egghead:lesson] react-refactor-a-react-application-to-typescript: Downloading lesson JSON
[egghead:lesson] 9113: Downloading m3u8 information
[egghead:lesson] 9113: Downloading MPD manifest
[info] Available formats for 9113:
format code  extension  resolution note
dash-3       m4a        audio only [eng] DASH audio   32k , m4a_dash container, mp4a.40.2 (44100Hz)
dash-4       m4a        audio only [eng] DASH audio   64k , m4a_dash container, mp4a.40.2 (44100Hz)
dash-2       mp4        1280x720   DASH video 3228k , mp4_dash container, avc1.64001F, 30fps, video only
dash-1       mp4        1920x1080  DASH video 7486k , mp4_dash container, avc1.640032, 30fps, video only
hls-5068     mp4        1280x720   5068k , avc1.640028, mp4a.40.5
hls-9924     mp4        1920x1080  9924k , avc1.640028, mp4a.40.5 (best)

dirkf commented 2 years ago

Currently the Egghead extractor can't download this course, because it expects to get metadata from api.egghead.io that is not being returned (404).

This patch enables the lessons (videos) to be found from the course (playlist):

--- old/youtube-dl/youtube_dl/extractor/egghead.py
+++ new/youtube-dl/youtube_dl/extractor/egghead.py
@@ -1,5 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals
+
+import re

 from .common import InfoExtractor
 from ..compat import compat_str
@@ -7,8 +9,10 @@
     determine_ext,
     int_or_none,
     try_get,
+    str_or_none,
     unified_timestamp,
     url_or_none,
+    urljoin,
 )

@@ -40,25 +44,28 @@
         playlist_id = self._match_id(url)
         series_path = 'series/' + playlist_id
         lessons = self._call_api(
-            series_path + '/lessons', playlist_id, 'course lessons')
+            series_path + '/lessons', playlist_id, 'course lessons', fatal=False)
+
+        # same info is actually in the page
+        if not lessons:
+            webpage = self._download_webpage(url, playlist_id)
+            lessons = re.finditer(r'<a\s[^>]*?\bhref\s*=\s*(?P<q>"|\'|\b)(?P<http_url>/lessons/(?P<id>[\w-]+))(?P=q)', webpage) or []
+            lessons = map(lambda x: x.groupdict(), lessons)

         entries = []
         for lesson in lessons:
-            lesson_url = url_or_none(lesson.get('http_url'))
+            lesson_url = str_or_none(lesson.get('http_url'))
             if not lesson_url:
                 continue
+            lesson_url = url_or_none(urljoin(url, lesson_url))
-            lesson_id = lesson.get('id')
-            if lesson_id:
-                lesson_id = compat_str(lesson_id)
+            lesson_id = str_or_none(lesson.get('id'))
             entries.append(self.url_result(
                 lesson_url, ie=EggheadLessonIE.ie_key(), video_id=lesson_id))

         course = self._call_api(
             series_path, playlist_id, 'course', False) or {}

-        playlist_id = course.get('id')
-        if playlist_id:
-            playlist_id = compat_str(playlist_id)
+        playlist_id = str_or_none(course.get('id'))

         return self.playlist_result(
             entries, playlist_id, course.get('title'),
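
For readers who want to try the page-scraping fallback outside youtube-dl, here is a minimal standalone sketch of the same idea: find `/lessons/...` links in the course page HTML and resolve them against the course URL. It is not the extractor code. The function name `find_lesson_urls`, the sample HTML and the de-duplication step are assumptions added for illustration, and the regex is simplified to quoted `href` values only.

```python
import re
from urllib.parse import urljoin

# Simplified version of the pattern in the patch: quoted href values only.
LESSON_LINK_RE = re.compile(
    r'<a\s[^>]*?\bhref\s*=\s*(?P<q>"|\')(?P<path>/lessons/(?P<id>[\w-]+))(?P=q)')


def find_lesson_urls(course_url, webpage):
    """Return absolute lesson URLs found in webpage, in page order, without repeats."""
    seen = set()
    urls = []
    for match in LESSON_LINK_RE.finditer(webpage):
        # Relative paths such as /lessons/foo are resolved against the course URL.
        lesson_url = urljoin(course_url, match.group('path'))
        if lesson_url not in seen:  # de-duplication is an addition, not in the patch
            seen.add(lesson_url)
            urls.append(lesson_url)
    return urls


# Made-up HTML fragment standing in for a course page.
sample_page = '''
<a class="x" href="/lessons/react-intro">Intro</a>
<a href='/lessons/react-props'>Props</a>
<a href="/lessons/react-intro">Intro again</a>
<a href="/courses/other">Other</a>
'''
course_url = 'https://egghead.io/courses/example-course'
lesson_urls = find_lesson_urls(course_url, sample_page)
print(lesson_urls)
```

The real page markup may differ, so the pattern would need checking against an actual course page.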

0x7145 commented 2 years ago

Thanks for your help. In the end, I collected the lesson URLs by hand from the page of the course that had the problem, saved them to a file (lessons.txt), and downloaded from that file:

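// Run in the browser DevTools console on the course page.
// Collect the href of every link in the lesson list.
let urls = [];
document.querySelectorAll('section.mt-8 a').forEach(anchor => urls.push(anchor.href));
// copy() is a DevTools console utility; paste the clipboard contents into lessons.txt.
copy(urls.join('\n'));

$ youtube-dl -a lessons.txt -i -o "%(autonumber)02d. %(title)s.%(ext)s" --write-description --write-annotations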

So yes, I can download the lessons if I know their URLs (https://egghead.io/lessons/...), but it's not always possible to download them using the URL of the course they belong to (https://egghead.io/courses/...).
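
As a side note on the `-o` template used in that command: youtube-dl output templates follow Python's `%`-style named formatting, so the expansion can be roughly illustrated as below. This is only a sketch of the formatting rule, not youtube-dl's own code (which also sanitizes filenames and fills in fields itself); `render_name` and the field values are made up for the example.

```python
# The same template string passed to -o in the command above.
template = '%(autonumber)02d. %(title)s.%(ext)s'


def render_name(template, fields):
    """Expand a %-style template with a dict of field values."""
    return template % fields


name = render_name(template, {
    'autonumber': 1,  # position in the batch; 02d pads it to two digits
    'title': 'Create a Button in React Native Using the Button Component',
    'ext': 'mp4',
})
print(name)
```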

a-eid commented 2 years ago

@dirkf @panorvma I'm getting this error when trying to download a certain course.

[egghead:lesson] react-native-create-a-button-in-react-native-using-touchableopacity: Downloading lesson JSON
ERROR: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[egghead:lesson] react-native-customize-button-pressed-state-using-the-react-native-v0-63-pressable-component: Downloading lesson JSON
ERROR: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

dirkf commented 2 years ago

How interesting. If you had provided a verbose log it might have been possible to diagnose the issue.

a-eid commented 2 years ago

@dirkf

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-a', 'lessons.txt', '-i', '-o', '%(autonumber)02d. %(title)s.%(ext)s', '--write-description', '--write-annotations', '--verbose']
[debug] Batch file urls: ['https://egghead.io/lessons/react-native-create-a-button-in-react-native-using-the-button-component', 'https://egghead.io/lessons/react-native-create-a-button-in-react-native-using-touchableopacity', 'https://egghead.io/lessons/react-native-customize-button-pressed-state-using-the-react-native-v0-63-pressable-component']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.1 (CPython) - macOS-11.6.1-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.4.1, ffprobe 4.4.1
[debug] Proxy map: {}
[egghead:lesson] react-native-create-a-button-in-react-native-using-the-button-component: Downloading lesson JSON
[egghead:lesson] 7969: Downloading m3u8 information
[egghead:lesson] 7969: Downloading MPD manifest
[debug] Default format spec: bestvideo+bestaudio/best
[info] Writing video description to: 01. Create a Button in React Native Using the Button Component.description
WARNING: There are no annotations to write.
[download] 01. Create a Button in React Native Using the Button Component.mp4 has already been downloaded and merged
[egghead:lesson] react-native-create-a-button-in-react-native-using-touchableopacity: Downloading lesson JSON
ERROR: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/egghead.py", line 109, in _real_extract
    for _, format_url in lesson['media_urls'].items():
KeyError: 'media_urls'
Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/egghead.py", line 109, in _real_extract
    for _, format_url in lesson['media_urls'].items():
KeyError: 'media_urls'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 547, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

[egghead:lesson] react-native-customize-button-pressed-state-using-the-react-native-v0-63-pressable-component: Downloading lesson JSON
ERROR: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/egghead.py", line 109, in _real_extract
    for _, format_url in lesson['media_urls'].items():
KeyError: 'media_urls'
Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/egghead.py", line 109, in _real_extract
    for _, format_url in lesson['media_urls'].items():
KeyError: 'media_urls'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/Cellar/youtube-dl/2021.12.17/libexec/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 547, in extract
    raise ExtractorError('An extractor error has occurred.', cause=e)
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('media_urls')); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

dirkf commented 2 years ago

That lesson seems to have no video, only images and text. What happens when you load it in the browser?

a-eid commented 2 years ago

@dirkf yeah, my bad: the course seems to be a pro course and wouldn't work without me logging in.
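
The `KeyError: 'media_urls'` in the log above comes from indexing the lesson JSON directly when that key is absent. A hedged sketch of a more defensive lookup is shown below. It is not a patch to the extractor: `collect_format_urls`, `LessonUnavailableError` and the sample dictionaries are hypothetical, and the real lesson JSON may be shaped differently.

```python
class LessonUnavailableError(Exception):
    """Hypothetical error for lesson JSON that carries no media URLs."""


def collect_format_urls(lesson):
    """Return the HTTP(S) URLs listed under 'media_urls', or raise a clear error."""
    media_urls = lesson.get('media_urls')
    if not isinstance(media_urls, dict) or not media_urls:
        raise LessonUnavailableError(
            'No media URLs in lesson JSON for %r; '
            'the lesson may require a logged-in (pro) account'
            % lesson.get('id'))
    return [
        url for url in media_urls.values()
        if isinstance(url, str) and url.startswith(('http://', 'https://'))
    ]


# Made-up lesson dictionaries for illustration.
free_lesson = {
    'id': 1,
    'media_urls': {'hls_url': 'https://example.com/v.m3u8', 'dash_url': None},
}
pro_lesson = {'id': 2}

print(collect_format_urls(free_lesson))
try:
    collect_format_urls(pro_lesson)
except LessonUnavailableError as err:
    print(err)
```

With a check like this, a lesson that returns no media would produce a message pointing at the likely cause instead of a bare `KeyError`.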