Unable to extract course id while downloading udemy courses

privatejava commented 5 years ago

Checklist

[x] I'm reporting a broken site support issue
[x] I've verified that I'm running youtube-dl version 2019.09.12.1
[x] I've checked that all provided URLs are alive and playable in a browser
[x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
[x] I've searched the bugtracker for similar bug reports including closed ones
[x] I've read bugs section in FAQ

Verbose log

python ~/Documents/youtube-dl/youtube_dl/__main__.py https://companyx.udemy.com/deeplearning/ -o '%(playlist)s/%(chapter_number)s - %(chapter)s/%(playlist_index)s. %(title)s.%(ext)s' --cookies ~/Downloads/cookies.txt --verbose
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://companyx.udemy.com/deeplearning/', '-o', '%(playlist)s/%(chapter_number)s - %(chapter)s/%(playlist_index)s. %(title)s.%(ext)s', '--cookies', '/home/userx/Downloads/cookies.txt', '--verbose']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.09.12.1
[debug] Git HEAD: 33c1c7d80
[debug] Python version 3.6.8 (CPython) - Linux-4.15.0-62-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.6, ffprobe 3.4.6
[debug] Proxy map: {}
[udemy:course] deeplearning: Downloading webpage
[udemy:course] 1151632: Downloading course curriculum
[download] Downloading playlist: 1151632
[udemy:course] playlist 1151632: Collected 155 video ids (downloading 155 of them)
[download] Downloading video 1 of 155
[udemy] 12350046: Downloading webpage
ERROR: Unable to extract course id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/home/userx/Documents/youtube-dl/youtube_dl/YoutubeDL.py", line 796, in extract_info
    ie_result = ie.extract(url)
  File "/home/userx/Documents/youtube-dl/youtube_dl/extractor/common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "/home/userx/Documents/youtube-dl/youtube_dl/extractor/udemy.py", line 219, in _real_extract
    course_id, _ = self._extract_course_info(webpage, lecture_id)
  File "/home/userx/Documents/youtube-dl/youtube_dl/extractor/udemy.py", line 82, in _extract_course_info
    ], webpage, 'course id')
  File "/home/userx/Documents/youtube-dl/youtube_dl/extractor/common.py", line 1005, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract course id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
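The `-o` template in the command above uses Python %-style named placeholders, filled from the metadata of each lecture. The sketch below shows how that template expands into the intended `course/chapter/lecture` layout. The field values are made up for illustration. Real youtube-dl also sanitizes each value and may zero-pad numeric fields such as `playlist_index`, so actual filenames can differ.

```python
# Hypothetical metadata for one lecture; the real values come from the udemy extractor.
info = {
    'playlist': 'Deep Learning',
    'chapter_number': 1,
    'chapter': 'Introduction',
    'playlist_index': 3,
    'title': 'Welcome',
    'ext': 'mp4',
}

# Same output template as in the command line above.
template = '%(playlist)s/%(chapter_number)s - %(chapter)s/%(playlist_index)s. %(title)s.%(ext)s'

# %-formatting with a dict substitutes each named field.
path = template % info
print(path)  # Deep Learning/1 - Introduction/3. Welcome.mp4
```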

Description

I am trying to download some videos of this course from my company's Udemy account. It is essentially the same as the default Udemy site, except that it is served from a subdomain with the company's name (I have replaced the real name with companyx here). Viewing the HTML in Chrome, I can see a big chunk of JSON data, and the page seems to use a new HTML format (not AngularJS), unlike before. Looking through udemy.py, I found a regex that matches ng-init, which will not work with the new Udemy UI because its markup is completely different. If you want, I can help by pulling out the sanitized HTML output for any specific URL.
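The fix the reporter hints at (stop depending on the ng-init markup and read the id from the embedded JSON instead) could be sketched roughly as follows. This is a standalone illustration, not youtube-dl's actual extractor code. The ng-init pattern, the assumption that the JSON sits HTML-escaped inside some `data-*` attribute, and the `courseId` key are all hypothetical and would need to be checked against the sanitized HTML the reporter offered.

```python
import html
import json
import re

# Hypothetical legacy pattern: an AngularJS ng-init attribute containing "id: <digits>".
NG_INIT_RE = re.compile(r'ng-init=["\'][^"\']*\bid\s*:\s*(\d+)')
# Hypothetical new-UI pattern: any data-* attribute whose value is a JSON object.
JSON_ATTR_RE = re.compile(r'data-[\w-]+=(["\'])(\{.*?\})\1', re.S)


def _walk_for_key(obj, key):
    """Depth-first search for the first value stored under `key` in nested JSON."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = _walk_for_key(child, key)
        if found is not None:
            return found
    return None


def find_course_id(webpage, key='courseId'):
    """Return the course id as a string, or None if no known layout matches."""
    # 1. Old AngularJS markup.
    m = NG_INIT_RE.search(webpage)
    if m:
        return m.group(1)
    # 2. New markup: decode each JSON-looking data-* attribute and search it.
    for m in JSON_ATTR_RE.finditer(webpage):
        try:
            data = json.loads(html.unescape(m.group(2)))
        except ValueError:  # not valid JSON; try the next attribute
            continue
        course_id = _walk_for_key(data, key)
        if course_id is not None:
            return str(course_id)
    return None
```

Returning `None` instead of raising lets a caller try further strategies before giving up with an error like the "Unable to extract course id" in the log.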

mudasirmirza commented 4 years ago

👍 This has started happening for me too. I think Udemy has changed their URL scheme.

super-sonicX commented 4 years ago

Still an issue in version 2020.03.24

Any ideas when/if this can be fixed?

dirkf commented 2 years ago

Continued in #30719.
