Open pashakiz opened 2 years ago
Plainly neither of the targets sought for the course ID is in the web page:
r'data-course-id=["\'](\d+)'
r'"courseId"\s*:\s*(\d+)'
If you can get the plain web page HTML using curl or wget with your cookies, we might be able to see what's wrong.
I'm not good at CLI curl...
I tried this on Windows 10:
curl -b udemy_cookies.txt https://www.udemy.com/typescript-full/
And got: Cannot send content body with given predicate type
But I looked at the page in the browser and found this (data-clp-course-id):
<body id="udemy" class="
ud-app-loader ud-component--course-landing-page-udlite
udemy " data-clp-course-id="4412496" data-module-id="course-landing-page/udlite" data-module-args="...">
and this (courseId)
<div class="clp-component-render"><div class="clp-component-render"><div class="ud-component--course-landing-page-udlite--purchase-body-container" data-component-props="{"componentProps":{"purchaseSection":{"is_course_paid":true,"has_subscription_offerings":false,"subscription":null,"style_full_lifetime_access":"full-lifetime-access","style_money_back_guarantee":"money-back-guarantee"},"purchaseInfo":{"isValidStudent":false,"purchaseDate":null},"moneyBackGuarantee":{"is_enabled":true},"addToCart":{"buyables":[{"buyable_object_type":"course","id":4412496,"buyableContext":{"contentLocaleId":null}}],"onAddRedirectUrl":"/cart/added/course/4412496/","addedButtonBsStyle":"primary","is_enabled":true}},"courseId":[4412496],"courseObject":{"id":4412496,"is_private":false}}"><div data-unique-id="450" style="display:none"></div><div><div class="purchase-section-container-skeleton--price--3Xcfk purchase-section-container-skeleton--skeleton--1UsRE skeleton--skeleton--1jc5m"><div class="text-skeleton--text-skeleton--7BlWc skeleton--skeleton--1jc5m"><p><span class="text-skeleton--line--3Pla- block--block--1b0nE"></span><span class="text-skeleton--line--3Pla- block--block--1b0nE"></span></p><div class="skeleton--shine--2nD_V"></div></div><div class="skeleton--shine--2nD_V"></div></div><div class="purchase-section-container-skeleton--cta--jnShg purchase-section-container-skeleton--skeleton--1UsRE skeleton--skeleton--1jc5m"><span class="block--block--1b0nE"></span><div class="skeleton--shine--2nD_V"></div></div><div class="purchase-section-container-skeleton--money-back--3lqS1 purchase-section-container-skeleton--skeleton--1UsRE skeleton--skeleton--1jc5m"><span class="block--block--1b0nE"></span><div class="skeleton--shine--2nD_V"></div></div></div></div></div></div>
</div>
Will it help?
curl -b udemy_cookies.txt https://www.udemy.com/typescript-full/
Try curl -c udemy_cookies.txt "https://www.udemy.com/typescript-full/".
But your observation may be enough. This patch would find the course ID, though obviously there may be other changes if the course ID is being sent differently:
--- old/youtube-dl/youtube_dl/extractor/udemy.py
+++ new/youtube-dl/youtube_dl/extractor/udemy.py
@@ -77,8 +77,8 @@
             video_id, fatal=False) or {}
         course_id = course.get('id') or self._search_regex(
             [
-                r'data-course-id=["\'](\d+)',
-                r'"courseId"\s*:\s*(\d+)'
+                r'data-(?:clp-)?course-id\s*=\s*["\'](\d+)',
+                r'"courseId"\s*:\s*\[?(\d+)'
             ], webpage, 'course id')
         return course_id, course.get('title')
Great! It works now with this patch! Thank you!
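For anyone who wants to see why the old patterns miss and the patched ones match, here is a minimal sketch using plain re. The two sample strings are trimmed from the HTML posted earlier in this thread, and find_course_id is a hypothetical helper for illustration only, not part of youtube-dl (the extractor itself uses self._search_regex).

```python
import re

# Patterns before the patch (as quoted in this thread).
OLD_PATTERNS = [
    r'data-course-id=["\'](\d+)',
    r'"courseId"\s*:\s*(\d+)',
]

# Patterns after the patch (as quoted in this thread).
NEW_PATTERNS = [
    r'data-(?:clp-)?course-id\s*=\s*["\'](\d+)',
    r'"courseId"\s*:\s*\[?(\d+)',
]

# Sample snippets trimmed from the page source posted above.
BODY_TAG = '<body id="udemy" class="udemy" data-clp-course-id="4412496">'
PROPS = '"courseId":[4412496],"courseObject":{"id":4412496,"is_private":false}'


def find_course_id(patterns, html):
    """Hypothetical helper: return the first captured ID, or None."""
    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            return match.group(1)
    return None


for label, patterns in (("old", OLD_PATTERNS), ("new", NEW_PATTERNS)):
    print(label, find_course_id(patterns, BODY_TAG), find_course_id(patterns, PROPS))
```

The old attribute pattern fails because the attribute is now named data-clp-course-id, and the old JSON pattern fails because the ID is now wrapped in a list ([4412496]); the optional (?:clp-)? and \[? parts cover both the old and the new page layout.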
what is the way to apply this patch?
I found this file here:
c:\python39\lib\site-packages\youtube_dl\extractor\udemy.py
See above: this path is shown in the verbose log, so you can find the path on your own machine from your verbose log.
I fixed the lines manually, following the advice above,
i.e. replace these two lines:
r'data-course-id=["\'](\d+)',
r'"courseId"\s*:\s*(\d+)'
with these:
r'data-(?:clp-)?course-id\s*=\s*["\'](\d+)',
r'"courseId"\s*:\s*\[?(\d+)'
Hello, I couldn't find this udemy.py. The verbose log says c:\Users\dst\ ... but this "dst" does not exist. I searched my entire C: drive (with "dir udemy.py /S") and found nothing. I opened the cookie txt and couldn't find anything with "data-" in it. How do I get a proper cookie?
You probably have the Windows self-extracting executable, which is not so easy to patch. Install Python and use pip to install yt-dl: the extractor source file will then be accessible.
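Once youtube-dl is installed with pip, one way to find the file to edit is to ask Python where the module lives. This is only a sketch: locate_module_file is a hypothetical helper name, and it assumes the package is importable from the same Python installation you run it with.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source path of an importable module, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package (e.g. youtube_dl) is not installed.
        return None
    return spec.origin if spec is not None else None


# Prints something like ...\site-packages\youtube_dl\extractor\udemy.py,
# or None if youtube_dl is not installed for this interpreter.
print(locate_module_file("youtube_dl.extractor.udemy"))
```

If this prints None, the pip-installed package is not visible to that interpreter, which is what you would expect when only the self-extracting Windows executable is present.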
Even if I got the other version, how could it extract the course id if I couldn't even find "data-" in the cookie.txt? I used the browser extension mentioned in the original post. Brave browser.
Thank you @pashakiz and @dirkf. I was able to get it working thanks to your answers.
Even if I got the other version, how could it extract the course id if I couldn't even find "data-" in the cookie.txt?
You need the two changes, apparently: /course/ or otherwise the data-clp-course-id attribute.
Hi dirkf, I couldn't find "course" anywhere in the cookie either. The txt file has 46 rows and some of them do have access_token and ud_last_auth_information, but no "course" or "data-". There is "muxData" however. I'm using the extension mentioned in the original post.
It's in the web page, nothing to do with the cookies, which are necessary to be able to fetch the page at all. You don't need to care what's in the cookie file, except that it was extracted from a currently logged-in browser session and hasn't subsequently expired.
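To make the split between cookies and page content concrete, here is a rough sketch using only the Python standard library: the cookies.txt file is loaded solely to authenticate the request, and whatever is searched for afterwards comes from the returned HTML. The helper names load_cookie_opener and fetch_page and the User-Agent value are assumptions for illustration, and the site may still reject non-browser requests (for example with a 403).

```python
import http.cookiejar
import urllib.request


def load_cookie_opener(cookie_file):
    """Build a URL opener that sends cookies from a Netscape-format cookies.txt."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # Keep session cookies and cookies whose expiry field is 0 in the export.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_page(url, cookie_file, user_agent="Mozilla/5.0"):
    """Fetch a page as text, authenticating with the exported cookies."""
    opener, _ = load_cookie_opener(cookie_file)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (performs a real network request, so it is left commented out):
# html = fetch_page("https://www.udemy.com/typescript-full/", "udemy_cookies.txt")
# print("data-clp-course-id" in html)
```

The cookie file only needs to be a valid, unexpired export from a logged-in session; strings such as data-clp-course-id will never appear in it.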
Don't forget to clear your Udemy cookies before exporting them (so that a previous or business session doesn't tamper with the current cookies). Also, if the download process suddenly halts, a VPN can sometimes help.
You might run into other problems down the road, and this method won't work on a business account, but if you're lucky it's possible to download all the videos.
Checklist
Verbose log
Description
I am trying to download a paid course from udemy.com. When I try to use login/password, I get this:
ERROR: Unable to download webpage: HTTP Error 403: Forbidden
So I tried the --cookies option and got another error (above). I exported udemy_cookies.txt from the browser with this Chrome extension: https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid