Skillshare - Githubissues

slavakurilyak commented 1 year ago

Checklist

[x] I'm reporting a new site support request
[x] I've verified that I'm running youtube-dl version 2021.12.17
[x] I've checked that all provided URLs are alive and playable in a browser
[x] I've checked that none of provided URLs violate any copyrights
[x] I've searched the bugtracker for similar site support requests including closed ones

Example URLs

Playlist: https://www.skillshare.com/en/classes/Understanding-and-Painting-the-Head/62353942/projects

Description

I would like to see a new extractor for Skillshare so I can make backups of my classes (e.g. the videos).

dirkf commented 1 year ago

Same request: https://github.com/yt-dlp/yt-dlp/issues/5813

dirkf commented 1 year ago

The free Intro class page has useful data in its <meta> tags, which strangely are placed in the <body> rather than the <head>.

Metadata from the og: properties:

title
description
image --> thumbnail

Video data from the twitter:player properties:

width
height
stream: "https://www.skillshare.com/en/sessions/download?id=3113009"
stream:content_type: "video/mp4"
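
As an illustration, the properties listed above could be collected with a short standalone sketch like the one below. It uses only the Python standard library rather than youtube-dl's own helpers. The names MetaCollector, int_or_none and extract_info are made up for the example, and so is the sample markup (only the stream URL is taken from the page). It assumes the tags carry their key in property= or name= and their value in content= or value=.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> key/value pairs wherever they appear (head or body)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        key = attrs.get('property') or attrs.get('name')
        value = attrs.get('content') or attrs.get('value')
        if key and value:
            # Keep the first occurrence of each property.
            self.meta.setdefault(key, value)


def int_or_none(value):
    """Local helper for this sketch: int(value), or None if not convertible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def extract_info(webpage):
    parser = MetaCollector()
    parser.feed(webpage)
    meta = parser.meta
    return {
        'title': meta.get('og:title'),
        'description': meta.get('og:description'),
        'thumbnail': meta.get('og:image'),
        'width': int_or_none(meta.get('twitter:player:width')),
        'height': int_or_none(meta.get('twitter:player:height')),
        'url': meta.get('twitter:player:stream'),
        'content_type': meta.get('twitter:player:stream:content_type'),
    }


# Hypothetical markup; the real page may differ in attribute names and order.
SAMPLE = '''<html><head></head><body>
<meta property="og:title" content="Example class">
<meta property="og:description" content="Example description">
<meta property="og:image" content="https://example.com/thumb.jpg">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:player:height" content="720">
<meta name="twitter:player:stream" content="https://www.skillshare.com/en/sessions/download?id=3113009">
<meta name="twitter:player:stream:content_type" content="video/mp4">
</body></html>'''

info = extract_info(SAMPLE)
print(info['url'], info['content_type'])
```

Because the parser reacts to every <meta> start tag, it does not matter that the tags sit in the <body>.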

If the subscriber videos have the same structure, just passing cookies from a logged-in browser session using --cookies ... would give access using the same extraction. However, the subscriber video pages may be more complex.
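
For anyone wanting to test the cookie idea outside youtube-dl, here is a rough sketch of loading an exported Netscape-format cookies file (the format --cookies expects) with the standard library. The function name opener_with_cookies is made up, and whether a logged-in session is actually enough for the subscriber pages is untested.

```python
import http.cookiejar
import urllib.request


def opener_with_cookies(cookie_file):
    """Return (opener, jar) for a Netscape-format cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    # Keep session and expired cookies too, so the exported file is used as-is.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Requests made with opener.open(url) then send the matching cookies, so the class page could be fetched and inspected for the same <meta> tags.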

In fact, the generic extractor should already handle the free videos, since the twitter:player:stream property is found. The extractor rejects the stream URL because it has no file extension, but since a content_type is provided, that should be enough to accept it. Something like this:

             # twitter:player:stream should be checked before twitter:player since
             # it is expected to contain a raw stream (see
             # https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
-            found = filter_video(re.findall(
-                r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage))
+            found = re.findall(
+                r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)
+            if found:
+                ext = mimetype2ext(get_first(re.findall(
+                    r'<meta (?:property|name)="twitter:player:stream:content_type" (?:content|value)="(.+?)"', webpage), []))
+                found = found[:1] if ext else filter_video(found)
         if not found:
             # We look for Open Graph info:
             # We have to match any number spaces between elements, some sites try to align them (eg.: statigr.am)
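
The logic of the patch can be tried in isolation with a sketch like the following. It is not the youtube-dl code: mime_to_ext and has_video_ext are simplified, hypothetical stand-ins for mimetype2ext and filter_video, with only a few types and extensions listed. The two regexes are the ones from the diff.

```python
import posixpath
import re
from urllib.parse import urlparse

# Simplified stand-ins for youtube-dl's lookup tables (assumed subset).
_MIME_TO_EXT = {'video/mp4': 'mp4', 'video/webm': 'webm'}
_VIDEO_EXTS = ('mp4', 'webm', 'flv', 'mov', 'm3u8')

STREAM_RE = r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"'
TYPE_RE = r'<meta (?:property|name)="twitter:player:stream:content_type" (?:content|value)="(.+?)"'


def mime_to_ext(mime):
    """Map a MIME type to an extension, or None if unknown or missing."""
    if not mime:
        return None
    return _MIME_TO_EXT.get(mime.split(';')[0].strip().lower())


def has_video_ext(url):
    """True if the URL path ends in a known video extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip('.').lower()
    return ext in _VIDEO_EXTS


def find_twitter_stream(webpage):
    found = re.findall(STREAM_RE, webpage)
    if not found:
        return []
    types = re.findall(TYPE_RE, webpage)
    ext = mime_to_ext(types[0] if types else None)
    # A recognised content_type vouches for the first stream URL;
    # otherwise fall back to filtering by file extension as before.
    return found[:1] if ext else [u for u in found if has_video_ext(u)]


page = (
    '<meta name="twitter:player:stream" '
    'content="https://www.skillshare.com/en/sessions/download?id=3113009">'
    '<meta name="twitter:player:stream:content_type" content="video/mp4">'
)
print(find_twitter_stream(page))
```

Without the content_type tag, the same extensionless URL is filtered out, which is the current behaviour described above.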

slavakurilyak commented 1 year ago

Any updates on this extractor?

slavakurilyak commented 1 year ago

ytdl-org / youtube-dl

Skillshare #31427
