ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[req] egghead.io #6635

Open ralyodio opened 9 years ago

ralyodio commented 9 years ago

technology list: https://egghead.io/technologies/angular2
series list: https://egghead.io/series/react-flux-architecture
search results list: https://egghead.io/search?q=testing

video page: https://egghead.io/lessons/react-development-environment-setup

nodu commented 9 years ago

+1. Especially for the Pro videos: -u and -p don't seem to be working.

TRox1972 commented 8 years ago

The single video page provided seems to be a Wistia embed, which is supported. Search and playlists for the site do not seem to work as of now, though.

ralyodio commented 8 years ago

Here's an option... https://github.com/SimonSelg/egghead-downloader

ralyodio commented 7 years ago

Seems to work for free courses, but how do we log in to egghead for pro courses?

rizafahmi commented 7 years ago

I cannot get the courses one to work. I tried with https://egghead.io/courses/asynchronous-javascript-with-async-await but got ERROR: Unable to extract title. Used version 2017.05.01.

shatyajeet commented 7 years ago

The regex for extracting the title in version 2017.05.14 is r'<h1 class="title">([^<]+)</h1>'. But since the latest update to the website, the title sits inside a <span>...</span>, so the regex no longer matches.
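
A minimal sketch of why the quoted pattern stops matching once the title is wrapped in a span. The HTML fragments are made up for illustration (the site's real markup is not shown in this thread), and LOOSE_TITLE_RE is a hypothetical fallback, not the pattern youtube-dl uses:

```python
import re

# Pattern quoted above from version 2017.05.14.
OLD_TITLE_RE = r'<h1 class="title">([^<]+)</h1>'
# Hypothetical looser pattern: tolerates an optional <span> inside the <h1>.
LOOSE_TITLE_RE = (
    r'<h1[^>]*>\s*(?:<span[^>]*>)?\s*([^<]+?)\s*(?:</span>)?\s*</h1>'
)


def extract_title(html):
    """Return the first title matched by either pattern, or None."""
    for pattern in (OLD_TITLE_RE, LOOSE_TITLE_RE):
        match = re.search(pattern, html)
        if match:
            return match.group(1).strip()
    return None


# Made-up fragments, for illustration only.
old_html = '<h1 class="title">Asynchronous JavaScript</h1>'
new_html = '<h1 class="title"><span>Asynchronous JavaScript</span></h1>'

print(re.search(OLD_TITLE_RE, new_html))  # the old pattern finds nothing
print(extract_title(new_html))
```

The old pattern requires text directly after the opening h1 tag, so any nested element makes it fail outright rather than return a partial title.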

alexrussell commented 7 years ago

Yes - the egghead extractor needs updating to the new way the site works.

Interestingly, the course pages seem to embed a JSON representation of the lessons. It is JSON embedded in a script tag whose type is set to application/json, seemingly to hydrate/prime a React component when the page loads. Its format is below (I have beautified the JSON, removed any array repetition and removed keys that aren't overly relevant):

<script type="application/json" class="js-react-on-rails-component">
{
  "component_name": "CourseApp",
  "props": {
    "course": {
      "id": 115,
      "duration": 2073,
      "title": "Maintainable CSS using TypeStyle",
      "slug": "maintainable-css-using-typestyle",
      "http_url": "https://egghead.io/courses/maintainable-css-using-typestyle",
      "url": "https://egghead.io/api/v1/series/maintainable-css-using-typestyle",
      "lessons": [{
        "id": 2050,
        "title": "Add type safety to CSS using TypeStyle",
        "slug": "css-add-type-safety-to-css-using-typestyle",
        "duration": 253,
        "series_row_order": -2097151,
        "http_url": "https://egghead.io/lessons/css-add-type-safety-to-css-using-typestyle",
        "url": "https://egghead.io/api/v1/lessons/css-add-type-safety-to-css-using-typestyle",
        "lesson_http_url": "https://egghead.io/lessons/css-add-type-safety-to-css-using-typestyle"
      }]
    }
  }
}
</script>

This JSON looks pretty much like the output of the public API for the course, found at https://egghead.io/api/v1/series/maintainable-css-using-typestyle.

So presumably for a given course the egghead extractor could use this public API - the lesson API responses even include the wistia ID to potentially save loading the HTML to extract it.

But even if youtube-dl doesn't like to rely on the actual API itself, it can certainly scrape the page as normal but use this JSON instead of the more likely-to-change HTML to get the title as well as the references to the individual lesson pages. Just a suggestion.
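
The scraping suggestion above could be sketched roughly as follows. This is a standalone illustration, not youtube-dl extractor code: the function name extract_course is hypothetical, and the sample page is a trimmed copy of the JSON shown earlier in this comment.

```python
import json
import re

# Matches the script tag shown above; assumes the attributes appear in
# exactly this order, which may not hold on the live site.
SCRIPT_RE = re.compile(
    r'<script type="application/json" class="js-react-on-rails-component">'
    r'(.*?)</script>',
    re.DOTALL)


def extract_course(html):
    """Return (title, [lesson page URLs]) from the embedded CourseApp JSON."""
    for match in SCRIPT_RE.finditer(html):
        data = json.loads(match.group(1))
        if data.get('component_name') != 'CourseApp':
            continue  # some other React component; keep looking
        course = data['props']['course']
        lessons = [lesson['http_url'] for lesson in course.get('lessons', [])]
        return course['title'], lessons
    return None, []


# Trimmed sample based on the JSON above.
sample_html = '''
<script type="application/json" class="js-react-on-rails-component">
{"component_name": "CourseApp",
 "props": {"course": {
   "title": "Maintainable CSS using TypeStyle",
   "lessons": [{"http_url": "https://egghead.io/lessons/css-add-type-safety-to-css-using-typestyle"}]
 }}}
</script>
'''

title, lessons = extract_course(sample_html)
print(title, lessons)
```

Reading the JSON avoids depending on heading markup, at the cost of depending on the component_name and props layout staying stable.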

santicalcagno commented 7 years ago

I tried to tackle this and I ended up using the public API, referring to the Wistia IDs of each lesson. Check out the PR below :)

(This is my first PR and it's been a while since I coded something in Python, so sorry in advance if I get anything wrong.)

alexrussell commented 7 years ago

@santicalcagno looks good enough to me (though I haven't tested it!)

mkbs700 commented 7 years ago

Egghead support seems to be broken now.

Tried to run youtube-dl https://egghead.io/lessons/javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module to test out but encountered the error messages as below:

[generic] javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module: Requesting header
WARNING: Falling back on generic information extractor.
[generic] javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module: Downloading webpage
[generic] javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module: Extracting information
ERROR: Unsupported URL: https://egghead.io/lessons/javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module

alexrussell commented 7 years ago

I agree that the lesson pages don't work, but FWIW the course pages do.

But yes, the lesson pages could still do with being fixed.

santicalcagno commented 7 years ago

Yup, it's pretty straightforward to fix given the logic used for courses. Theoretically, defining a new extractor referring to the Wistia ID exposed, for example, in https://egghead.io/api/v1/lessons/javascript-create-a-native-desktop-system-menu-with-the-electron-menu-module should be enough.
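
As a rough sketch of the first step such an extractor would need, here is the mapping from a lesson page URL to the API URL pattern quoted above. The names LESSON_URL_RE and lesson_api_url are hypothetical and this is not the code from the PR:

```python
import re

# Lesson page URLs as they appear in this thread.
LESSON_URL_RE = re.compile(
    r'https?://egghead\.io/lessons/(?P<slug>[^/?#]+)')


def lesson_api_url(page_url):
    """Map a lesson page URL to the API URL with the same slug."""
    match = LESSON_URL_RE.match(page_url)
    if not match:
        raise ValueError('not an egghead lesson URL: %s' % page_url)
    return 'https://egghead.io/api/v1/lessons/' + match.group('slug')


print(lesson_api_url(
    'https://egghead.io/lessons/javascript-create-a-native-desktop-'
    'system-menu-with-the-electron-menu-module'))
```

The JSON returned by that API URL would then be read for the media reference; that part is left out here because it needs a network request.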

I'm kinda busy ATM, so if anyone wants to give this a go, by all means do so. Otherwise I should be taking a look at this in a couple of weeks or so.

rexpan commented 7 years ago

> youtube-dl --version
2017.09.24

> youtube-dl --verbose https://egghead.io/lessons/react-error-handling-using-error-boundaries-in-react-16
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://egghead.io/lessons/react-error-handling-using-error-boundaries-in-react-16']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2017.09.24
[debug] Python version 3.4.4 - Windows-10-10.0.15063
[debug] exe versions: none
[debug] Proxy map: {}
[egghead:lesson] react-error-handling-using-error-boundaries-in-react-16: Downloading JSON metadata
ERROR: An extractor error has occurred. (caused by KeyError('wistia_id',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\extractor\common.py", line 434, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\extractor\egghead.py", line 75, in _real_extract
KeyError: 'wistia_id'
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\extractor\common.py", line 434, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\extractor\egghead.py", line 75, in _real_extract
KeyError: 'wistia_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\YoutubeDL.py", line 777, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp1uop9avr\build\youtube_dl\extractor\common.py", line 447, in extract
youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('wistia_id',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

mk-pmb commented 7 years ago

I'm on it. I found the DASH and m3u8 URLs; now all I have left is to check how I can redirect ytdl to use them. Update: current stage:

ERROR: no suitable InfoExtractor for URL dash:https://█████.cloudfront.net/javascript-redux-the-single-immutable-state-tree-█████/javascript-redux-the-single-immutable-state-tree-█████.mpd
ERROR: no suitable InfoExtractor for URL m3u8:https://█████.cloudfront.net/javascript-redux-the-single-immutable-state-tree-█████/javascript-redux-the-single-immutable-state-tree-█████.m3u8

hackuun commented 7 years ago

I have the same issue. Can't download from Egghead.

ERROR: An extractor error has occurred. (caused by KeyError(u'wistia_id',))

smkamranqadri commented 7 years ago

Still getting the issue.

ERROR: An extractor error has occurred. (caused by KeyError(u'wistia_id',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

MichaelDeBoey commented 7 years ago

As @joelhooks said in #14388 (comment): @eggheadio isn't using Wistia for streaming any longer.

So this will be solved by #14388 I guess 🙂
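
A small sketch of how a lookup could tolerate the missing wistia_id key instead of raising KeyError. The media_urls, dash_url and hls_url key names are assumptions for illustration (only wistia_id is confirmed by the tracebacks in this thread), and pick_media is a hypothetical helper:

```python
def pick_media(lesson):
    """Return (kind, value) describing where the lesson's media lives.

    `lesson` is the parsed JSON metadata for one lesson.
    """
    # Older responses: a Wistia ID, per the earlier comments.
    wistia_id = lesson.get('wistia_id')
    if wistia_id:
        return 'wistia', wistia_id
    # Assumed newer shape: a mapping of manifest names to URLs.
    media_urls = lesson.get('media_urls') or {}
    for key, url in sorted(media_urls.items()):
        if url:
            return key, url
    raise LookupError('no known media reference in lesson metadata')


print(pick_media({'wistia_id': 'abc123'}))
print(pick_media({'media_urls': {'hls_url': 'x.m3u8', 'dash_url': 'y.mpd'}}))
```

Using dict.get turns a site-side change like this one into a clear "nothing found" error rather than an unexplained KeyError.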

hackuun commented 7 years ago

Still not working. Will it be fixed?

MichaelDeBoey commented 7 years ago

@iamdubx #14388 still isn't merged so... 🙂

smkamranqadri commented 7 years ago

Still same issue.

Muhammads-MacBook-Pro:Videos mkamran$ sh download.sh create-a-news-app-with-vue-js-and-nuxt
[egghead:course] create-a-news-app-with-vue-js-and-nuxt: Downloading JSON metadata
ERROR: An extractor error has occurred. (caused by KeyError(u'lessons',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Muhammads-MacBook-Pro:Videos mkamran$ cat download2.sh
youtube-dl --download-archive "$1/archive.txt" -o "$1/%(playlistindex)s%(title)s" "https://egghead.io/lessons/$1"

Muhammads-MacBook-Pro:Videos mkamran$ Start a Nuxt Project with npx and the Vue.js CLI
Muhammads-MacBook-Pro:Videos mkamran$ sh download2.sh start-a-nuxt-project-with-npx-and-the-vue-js-cli
[egghead:lesson] start-a-nuxt-project-with-npx-and-the-vue-js-cli: Downloading JSON metadata
ERROR: An extractor error has occurred. (caused by KeyError(u'wistia_id',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Muhammads-MacBook-Pro:Videos mkamran$

MichaelDeBoey commented 7 years ago

@smkamranqadri #14388 still isn't merged so... 🙂

smkamranqadri commented 7 years ago

@MichaelDeBoey how can I use that code?

MichaelDeBoey commented 7 years ago

@smkamranqadri: @mk-pmb linked to his PR branch 🙂

edyionescu commented 7 years ago

That works, but I can't get it to download the best video format.

smkamranqadri commented 7 years ago

Thanks, but I am not a Python expert, so I don't know what to do next after cloning.

hackuun commented 7 years ago

If I understand correctly, the guys from Egghead asked the guys from youtube-dl not to fix this :smile:

edyionescu commented 7 years ago

@smkamranqadri, cc @mk-pmb

git checkout egghead-mediaurls-171002

then run

python -m youtube_dl https://egghead.io/lessons/react-add-redux-to-a-react-application

However, this doesn't download the best available quality.

python -m youtube_dl -F https://egghead.io/lessons/react-add-redux-to-a-react-application

[info] Available formats for react-add-redux-to-a-react-application:
format code                           extension  resolution note
ef6e36a9-2384-45cb-901d-c827483e0fd3  mp4        1280x720   DASH video 2400k , avc1.64001f, 25fps, video only
c0f2426b-642a-47ef-bccd-c628a0db8ee4  mp4        854x480    DASH video 1200k , avc1.64001e, 25fps, video only
b1f6d4b4-614a-4745-8f06-326f7cb49f53  m4a        audio only [en] DASH audio  128k , mp4a.40.2 (48000Hz) (best)

Specifying the width of the video throws a request format error.

python -m youtube_dl -f '[width=1280]' https://egghead.io/lessons/react-add-redux-to-a-react-application 

[egghead:lesson] react-add-redux-to-a-react-application: Downloading MPD manifest
ERROR: requested format not available
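
One thing that might be worth trying is naming the format codes explicitly with the VIDEO+AUDIO form of -f, which is regular youtube-dl syntax for merging two formats. Whether that avoids the error on this branch is untested. The sketch below only builds such a selector from the listing above; the dict layout and the explicit_selector helper are assumptions for illustration:

```python
# Format codes copied from the -F listing above; the dict layout is assumed.
formats = [
    {'format_id': 'ef6e36a9-2384-45cb-901d-c827483e0fd3',
     'height': 720, 'video': True},
    {'format_id': 'c0f2426b-642a-47ef-bccd-c628a0db8ee4',
     'height': 480, 'video': True},
    {'format_id': 'b1f6d4b4-614a-4745-8f06-326f7cb49f53',
     'height': None, 'video': False},
]


def explicit_selector(formats):
    """Build a 'VIDEO+AUDIO' string from the tallest video and first audio."""
    videos = [f for f in formats if f['video']]
    audios = [f for f in formats if not f['video']]
    best_video = max(videos, key=lambda f: f['height'])
    return '%s+%s' % (best_video['format_id'], audios[0]['format_id'])


print(explicit_selector(formats))
```

The printed string would be passed as the argument to -f in place of '[width=1280]'.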

ghost commented 7 years ago

https://egghead.io/lessons/react-add-redux-to-a-react-application

youtube-dl "https://d2c5owlt6rorc3.cloudfront.net/react-add-redux-to-a-react-application-ed6daaa8cb/react-add-redux-to-a-react-application-ed6daaa8cb.m3u8" -o react-app.mp4
[generic] react-add-redux-to-a-react-application-ed6daaa8cb: Requesting header
[generic] react-add-redux-to-a-react-application-ed6daaa8cb: Downloading m3u8 information
[download] Destination: react-app.f560.mp4
[...]
[ffmpeg] Downloaded 14086932 bytes
[download] 100% of 13.43MiB
[download] Destination: react-app.faudio_group-react-add-redux-to-a-react-application.mp4
[...]
[ffmpeg] Downloaded 4137919 bytes
[download] 100% of 3.95MiB
[ffmpeg] Merging formats into "react-app.mp4"
Deleting original file react-app.f560.mp4 (pass -k to keep)
Deleting original file react-app.faudio_group-react-add-redux-to-a-react-application.mp4 (pass -k to keep)
Video: MPEG4 Video (H264) 1280x720 25fps 429kbps [V: h264 high L3.1, yuv420p, 1280x720, 429 kb/s]
Audio: AAC 48000Hz stereo 125kbps [A: SoundHandler (aac lc, 48000 Hz, stereo, 125 kb/s)]

hackuun commented 7 years ago

To tell the truth, I don't understand why Egghead is fighting so hard to stop anybody from downloading their videos. Today I will torrent all available Egghead courses on Rutracker, and I have most of them.

edyionescu commented 7 years ago

@0880 Yep, that's it. Thanks!

smkamranqadri commented 7 years ago

@errorsmith @0880

ERROR: requested format not available

ghost commented 7 years ago

Extraction fixed in latest version.

youtube-dl "https://egghead.io/lessons/react-error-handling-using-error-boundaries-in-react-16"
[egghead:lesson] react-error-handling-using-error-boundaries-in-react-16: Downloading JSON metadata
[egghead:lesson] 2464: Downloading MPD manifest
[egghead:lesson] 2464: Downloading m3u8 information
[dashsegments] Total fragments: 92
[download] Destination: Error Handling using Error Boundaries in React 16-2464.fdash-63abafa9-2580-4fd5-9a73-511de5dca9b8.mp4
[download] 100% of 28.24MiB in 02:12
[dashsegments] Total fragments: 92
[download] Destination: Error Handling using Error Boundaries in React 16-2464.fdash-34043f9a-4e56-4683-bb70-027dd42b37cf.m4a
[download] 100% of 5.48MiB in 01:21
[ffmpeg] Merging formats into "Error Handling using Error Boundaries in React 16-2464.mp4"
Deleting original file Error Handling using Error Boundaries in React 16-2464.fdash-63abafa9-2580-4fd5-9a73-511de5dca9b8.mp4 (pass -k to keep)
Deleting original file Error Handling using Error Boundaries in React 16-2464.fdash-34043f9a-4e56-4683-bb70-027dd42b37cf.m4a (pass -k to keep)

smkamranqadri commented 7 years ago

Still the same.

Muhammads-MacBook-Pro:youtube-dl-rg3 mkamran$ python -m youtube_dl "https://egghead.io/lessons/react-error-handling-using-error-boundaries-in-react-16"
[egghead:lesson] react-error-handling-using-error-boundaries-in-react-16: Downloading JSON metadata
WARNING: [egghead:lesson] Cannot find an proper ID, will use lesson name URL slug
[egghead:lesson] react-error-handling-using-error-boundaries-in-react-16: Downloading MPD manifest
ERROR: requested format not available

No change received on pull.

https://github.com/mk-pmb/youtube-dl-rg3.git