ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[youtube:user] make user/c regex a separate info IE #10126

Closed yuri-sevatz closed 8 years ago

yuri-sevatz commented 8 years ago

Just out of curiosity: could we make the recent user/c patch a separate IE? (Or would there be any objection if I created a pull request to do this?) I noticed a build breakage after this commit:

https://github.com/rg3/youtube-dl/commit/9558dcec9c7806c811f4fe8e7758977eaa01a702

I have some code that leverages the 1:1 relation between an IE and its _VALID_URL and _TEMPLATE_URL to create some quick-and-dirty automated scrapers for a bunch of IEs. The best part is that, when you do this, the "unique identity" of a video can be determined offline for a lot of IEs, and you can reverse that "unique identity" back into a usable URL when traversing.
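To make the round trip concrete, here is a minimal sketch of the idea. It does not import youtube-dl; `SketchUserIE`, its simplified regex, and the helper names `extract_identity` and `url_from_identity` are hypothetical stand-ins, and only the attribute names `_VALID_URL` and `_TEMPLATE_URL` come from the discussion.

```python
import re


class SketchUserIE:
    """Hypothetical stand-in for an extractor class; not youtube-dl's real code."""

    IE_NAME = 'sketch:user'
    # Simplified pattern (assumption): the real _VALID_URL is more permissive.
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/user/(?P<id>[A-Za-z0-9_-]+)'
    _TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'

    @classmethod
    def extract_identity(cls, url):
        """Derive (extractor name, id) from a URL without any network access."""
        match = re.match(cls._VALID_URL, url)
        return (cls.IE_NAME, match.group('id')) if match else None

    @classmethod
    def url_from_identity(cls, identity):
        """Reverse an identity back into a canonical URL via the template."""
        ie_name, ident = identity
        if ie_name != cls.IE_NAME:
            raise ValueError('identity belongs to a different extractor')
        return cls._TEMPLATE_URL % ident


identity = SketchUserIE.extract_identity(
    'https://www.youtube.com/user/the12minuteathlete/videos')
rebuilt = SketchUserIE.url_from_identity(identity)
# The rebuilt URL maps back to the same identity, so the pair can be
# stored offline and compared later without contacting the server.
round_trip = SketchUserIE.extract_identity(rebuilt)
```

The round trip only holds while one regex corresponds to exactly one template, which is the 1:1 relation described above.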

Here's the project and the way we're using these for your consideration:

https://github.com/yuri-sevatz/youtube-sync/blob/master/youtube_sync/__init__.py#L420

I know I should have tried to merge some of this logic to youtube-dl a while ago, but I've been too lazy and shy :)

yuri-sevatz commented 8 years ago

I suppose alternatively I could make the "unique identity" of a video take an array of arguments (JSON, etc.), but this is not ABI-maintainable between youtube-dl upgrades unless we start to version them.

dstftw commented 8 years ago

In 3rd-party code you should rely on neither _VALID_URL nor _TEMPLATE_URL, and you should not make any assumptions about the implementation details of a particular extractor. I don't see much sense in applying such changes just to make some 3rd-party code that relies on them work, since some other code may require the opposite.

yuri-sevatz commented 8 years ago

So I understand the argument and I agree if we don't go into any detail about what these could do.

What I'm saying here is that "some 3rd party code" happens to handle the notion of video identity a little better than youtube-dl, gets Google accounts banned far less often than youtube-dl, can analyze offline what a playlist (or any source, for that matter) has and doesn't have, and can cross-reference any updates with what you want, without re-running the IEs and harassing the remote servers for everything else that you already have.

Because that's what gets accounts banned!

... and it does this better than youtube-dl, using mostly the information you already have inside your IEs. If the only thing you logically need API-wise is the capacity for video identity to be both:

1) Extracted from an IE.
2) Re-inserted into an IE.

then why not aim for it? It's low-hanging fruit given what you've already got in the IE definitions.

I can do it myself, I really just wanted to see what you guys thought.

yuri-sevatz commented 8 years ago

Even if you don't want to go into any detail on this, there's a simple reason why this looks like a contradiction to me:

+        # Only available via https://www.youtube.com/c/12minuteathlete/videos
+        # but not https://www.youtube.com/user/12minuteathlete/videos
+        'url': 'https://www.youtube.com/c/12minuteathlete/videos',
+        'playlist_mincount': 249,
+        'info_dict': {
+            'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
+            'title': 'Uploads from 12 Minute Athlete',
+        }
+    }, {

So, by this logic, a www.youtube.com/channel/ URL could very well go into YoutubeUserIE too, even though the set of videos it returns is mutually exclusive with both www.youtube.com/user/ and www.youtube.com/c/*. What is the meaning of YoutubeUserIE? What is the meaning of YoutubeChannelIE?

If they're just vague concepts that happen to be convenient for the maintainers of the regexes, and they mean nothing to userspace, then why expose them to userspace at all? Why does the user-level API even have different IEs if there's no distinguishing between the things they're allowed to refer to?

yuri-sevatz commented 8 years ago

In the example they give:

c/12minuteathlete and user/12minuteathlete are not interchangeable, which raises the question: why are they in the same IE?

The underlying user seems to be user/the12minuteathlete when I click on the title on the page for c/12minuteathlete, so whatever c/12minuteathlete is, it's certainly not a user!
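As a rough illustration of why the merge matters for identity, the sketch below contrasts one merged pattern with two separate ones. Everything here is an assumption made for illustration: the extractor names `sketch:user` and `sketch:custom`, the simplified regexes, and the helpers `identify` and `rebuild` are hypothetical and do not reflect youtube-dl's actual classes.

```python
import re

# Merged pattern (hypothetical): user/ and c/ share one regex and one template.
MERGED = re.compile(r'https?://(?:www\.)?youtube\.com/(?:user|c)/(?P<id>[^/?#]+)')
MERGED_TEMPLATE = 'https://www.youtube.com/user/%s/videos'

# Separate patterns (hypothetical): each URL family keeps its own template.
PATTERNS = {
    'sketch:user': re.compile(r'https?://(?:www\.)?youtube\.com/user/(?P<id>[^/?#]+)'),
    'sketch:custom': re.compile(r'https?://(?:www\.)?youtube\.com/c/(?P<id>[^/?#]+)'),
}
TEMPLATES = {
    'sketch:user': 'https://www.youtube.com/user/%s/videos',
    'sketch:custom': 'https://www.youtube.com/c/%s/videos',
}


def identify(url):
    """Return (extractor name, id) for the first separate pattern that matches."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(url)
        if match:
            return (name, match.group('id'))
    return None


def rebuild(identity):
    """Turn an identity back into a URL using that extractor's own template."""
    name, ident = identity
    return TEMPLATES[name] % ident


c_url = 'https://www.youtube.com/c/12minuteathlete/videos'

# With the merged pattern the c/ prefix is lost, so the template rebuilds
# a user/ URL that refers to something else.
merged_rebuilt = MERGED_TEMPLATE % MERGED.match(c_url).group('id')

# With separate patterns the identity records which family it came from.
separate_identity = identify(c_url)
separate_rebuilt = rebuild(separate_identity)
```

Under these assumptions, only the separate patterns return the original c/ URL; the merged version silently rewrites it to user/12minuteathlete.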