sealedtx / java-youtube-downloader

Simple, almost zero-dependency java parser for retrieving youtube video metadata
Other
426 stars 116 forks source link

#33 Playlist Support #44

Closed Grodou closed 4 years ago

Grodou commented 4 years ago

Get playlist details and all videos. No download for now.

sealedtx commented 4 years ago

@Grodou Hi, great work! Thank you for contributing. I have few suggestions and questions:

Grodou commented 4 years ago

@sealedtx Hi, thank you for the review. Here come the answers:

PlaylistVideo (which extends AbstractVideoDetails) is counterintuitive as for me. I don't know better naming for now, need to think, mb PlaylistItem or PlaylistVideoDetails

Completely agree. I think PlaylistVideoDetails is better.

Why do you follow C89 variable declaration style? It's 2020 now, you don't need to declare variables at the beggining of methods :)

Old school style I guess :)

Please explain what is "continuation" of playlist, does it mean that playlist is too big to display in a single webpage and we need to load more?

Exactly, 100 videos max. Scrolling to the bottom loads 100 more, and so on. All videos are directly loaded for now, a paging system could be handy.

What are those hardcoded values X-YouTube-Client-Version", "2.20200720.00.02" and do we need to update them from time to time, if yes where I should look for them?

Just a part of playlist.py. The actual client version we should use is available in the ytInitialData.responseContext part of the response. (for the record, it's also useful for faster downloads, maybe another PR) For the "X-YouTube-Client-Name", I don't know, but both are mandatory.

What does "unstable" comment mean in your getVeryLongPlaylist?

That playlist is often updated, we can't do reliable tests on its length or content.

Still in progress, next step is to make a PlaylistVideo as easy to use as a YoutubeVideo (format selection, downloads, ...).

sealedtx commented 4 years ago

@Grodou Thank you for the answers.

Old school style I guess :)

Please, make sure you follow current code base style (even if it is not perfect), for easier maintenance in future.

The actual client version we should use is available in the ytInitialData.responseContext part of the response.

So mb we should implement getOrDefault logic for these values (search them in response, if not found - use default hardcoded)?

next step is to make a PlaylistVideo as easy to use as a YoutubeVideo (format selection, downloads, ...).

Not sure if you should focus on this. I believe that there is no simple way but just YoutubeDownloader.getVideo(id) for all of items in playlist which will be an overhead if user want only metadata about playlist items. I think current api might be sufficient to get playlist info (vidoe elements with videoId/title/thumbnail) and later it is possible to get YoutubeVideo if you need to download only some of videos. Correct me if I am wrong.

Grodou commented 4 years ago

@sealedtx Hi, here are some answers and suggestions.

Please, make sure you follow current code base style (even if it is not perfect), for easier maintenance in future.

It wasn't clear, but of course I will.

So mb we should implement getOrDefault logic for these values (search them in response, if not found - use default hardcoded)?

Looks like the client version is always in the response, no need for hardcoded value.

I think current api might be sufficient to get playlist info (vidoe elements with videoId/title/thumbnail) and later it is possible to get YoutubeVideo if you need to download only some of videos. Correct me if I am wrong.

I agree, it is sufficient. However, functionalities like these could be handy:

I made a version of it, and since code explains it better than I could, I will soon commit so you can tell if it's interesting or not. And if not, I will of course take care of all the cleanings.

Grodou commented 4 years ago

@sealedtx Here is the version with download available, main changes are listed below.


Video hierarchy AbstractVideo<T extends AbstractDetails>

PlaylistVideo lazy loading: formats() and subtitles() trigger a load action. We can force a single video to do so with PlaylistVideo.fetch(). We can force all videos to do so with Playlist.fecthVideos(). If an error occurs, formats and subtitles are empty, the initial exception is accessible via getFetchError() Other ways to do:

  1. throw a runtime exception, YoutubeVideo method signatures remain unchanged
  2. throw a YoutubeException, and change YoutubeVideo methods

    Exceptions YoutubeException.DownloadUnavailableException superclass

    • LiveVideoException for live videos
    • RestrictedVideoException for playlist private and deleted videos

      Tests

    • hierarchy, reuse
    • output directory is deleted after each test
    • video and playlist tests have their own directory

I noticed a strange behaviour: a video can have a specific available format at a time, and a few seconds later, it's gone. Unit tests take care of it and should success any time. I didn't find any stable playlist with deleted and/or private videos to test.

Let me know what you think.

sealedtx commented 4 years ago

@Grodou Hi, sorry for a long response, I feel busy last days. First of all thank you for a great work! And again suggestions:


YoutubePlaylist: I believe that it should contain only these methods:

I hardly imagine that some one is looking for a videos in playlist only with specific itag or quality. Even so it could be achieved with findVideos(). I think that we should remove download methods from YoutubePlaylist, because it creates additional hierarchy: playlist->videos->formats. I see user's steps as:

Moving last four items into a single one makes code difficult to read and user won't understand what is under the hood (additional http requets), IMHO.


PlaylistVideo: You should not keep reference to Parser inside it. Better to pass it inside fetch(parser) only once it is needed. Storing fetchError also looks not pretty. I think subtitles() and formats() should return empty list or throw runtime exception if not fetched yet, choose the one you like more. BTW: you may use Collections.emptyList() instead of Collections.EMPTY_LIST and it won't require @SuppressWarnings("unchecked").


PlaylistVideoDetails: Could you please explain what is isPlayable and when it can be false?


Packages: Need to think about package refactoring. If we have playlist package mb we should add video package and move there YoutubeVideo and VideoDetails.


Waiting for your comments and thoughts. Thank you!

Grodou commented 4 years ago

@sealedtx Hi, thank you for the kind comments, and sorry it took me some time to answer.


Lazy loading

Since it's a central part of this version, I think we should agree about it first.

You should not keep reference to Parser inside it. Better to pass it inside fetch(parser) only once it is needed.

If that so, lazy loading is broken. And then:

In my opinion, this widely used design pattern is completely adapted to this case: it loads data only when it's needed (if ever), and hides any underlying process.

Please feel free to tell if you don't want it, I'll keep the version in a branch anyway, just in case. Hope this helps to make a final choice about it, so we can get closer to a final version!


2 questions I can answer for now:

is it possible that video index is not same as list index? Or may be we can sort them and remove this method?

Video index starts at 1, not 0. Videos seem to be always sorted. findVideoByIndex(index) could simply return videos().get(index - 1) (after a basic boundary check)

Could you please explain what is isPlayable and when it can be false?

A false value means the video is either private or deleted

sealedtx commented 4 years ago

@Grodou Thank you for the answers.

Lazy loading

I understand what is lazy loading pattern and why it might be applied here. But I I doubt about hiding from user additional http request, because I know that this library is used by android devs. Due to lazy loading it is not clear that formats() or subtitles() requires http request, as I know network request may freeze UI Thread on android or even throw an Exception (because it is not allowed to work with network on main thread). I need to think about this, give me few days, please.

YoutubePlaylist

findVideoByIndex(index) could simply return videos().get(index - 1) (after a basic boundary check)

I think in such case this method can be omitted.


Still waiting for your thoughts about shrinking YoutubePlaylist api. Thank you.

Grodou commented 4 years ago

@sealedtx Hi, thank you for the helpful informations.


Lazy loading

I was wondering why you seemed both interested and reluctant, and just wanted to move on. Sorry about that, please take your time.


YoutubePlaylist

I think in such case this method can be omitted (findVideoByIndex(index))

Totally agree.

Still waiting for your thoughts about shrinking YoutubePlaylist api

I wanted to know about lazy loading first, because there's a lot to say, as you can see. So let's assume playlist videos still have formats.

fetchVideos() (Not sure about this, because if one fetch fails it will skip subsequent ones)

Just a poor implementation of mine, it should better catch exceptions and return a list of successfully fetched videos or at least the count of it.

may be it should be findFormats() and return list of formats, same as findVideos()

Download

I hardly imagine that some one is looking for a videos in playlist only with specific itag or quality.

If you're interested by a whole music playlist so you want all the audio tracks, we could then assume that:

So you could either:

and download them all!

Even so it could be achieved with findVideos().

Absolutely, but then, you still have to loop over videos, select the same format each time, and download. The download feature is an utility, not an obligation. Any user will always be free to follow the 5 steps process you suggest.

2 more ideas for downloads.

We could watch progress (for an async task) with a listener like:

We could move the whole download logic to the Format class since:


Packages

Need to think about package refactoring. If we have playlist package mb we should add video package and move there YoutubeVideo and VideoDetails.

Indeed, a dedicated package for "regular" video would make things clearer. By the way, I think Itag and Extension should better belong to the format package.


That was a lot of topics, sorry for the longest comment ever :) Hope it will be helpful.

sealedtx commented 4 years ago

@Grodou Hi! Thank you for your response.

Lazy loading

I really appreciate your work done, but unfortunately, I decided not to implement (or at least postpone it) lazy loading for playlist video formats, due to reasons mentioned above.


YoutubePlaylist

Just a poor implementation of mine

I prefer to keep simple things done well at first, and leave opportiunity to extend functionally later. If we don't know how it should be implemented to work well - remove it for now, and discuss/add in the further updates.


Download

all videos have the same set of formats

It might be wrong. It is common, that different videos may have different set of formats (it may depend on the period when they were uploaded, quality of the source video, etc). And also youtube make changes from time to time.

The download feature is an utility, not an obligation. Any user will always be free to follow the 5 steps process you suggest.

User must be aware about what is happening using this library. As I already mentioned, merging these steps into one can lead to unclear situations, errors may occur during fetching meta, or accessing file system, or connection, or downloading. It would be difficult to build such api that properly handles all these possible errors in a single method. I believe user of this library must be responsible for it.

We could move the whole download logic to the Format class since:

Agree. But I don't want to do changes which break backwards compatibility. There is a lot of work to be done about refactoing, including classes naming, packages structure, shrinking useless methods, extending api, fixing api and code style. But I want it to be released in the 3.x.x version. I will create "refactoring" branch later. I like most of your ideas and you are welcomed to contribute!

By the way, I think Itag and Extension should better belong to the format package.

Itag - yes, but Extension contains also subtitles formats, not only video. As I said, not in this PR, sorry for starting discussion about refactoring here.


So... in conclusion... sorry for being not very clear: In this PR I want to see the most simple functionality for playlist meta extraction, but enough to be useful. It should cover the most common use cases, and it should be clear for user where and which errors may occur. We may discuss further improvements (but no breaking changes in version 2.x.x) after the feature will have already been released and useful for people.

sorry for the longest comment ever :)

Don't worry, my comment is also not very compact :)

Grodou commented 4 years ago

@sealedtx Hi, that's good news! I think it's a very good decision.

So, in order to keep the API as clear and simple as possible, I think:

Like this, things are clear, an item is not a video, and access is more simple: downloader.getVideo(item.videoId()). Tell me if that's OK for you.

sealedtx commented 4 years ago

@Grodou Sounds good. But let it be PlaylistVideoDetails, to understand that this class contains fields of AbstractVideoDetails

Grodou commented 4 years ago

@sealedtx OK, should the YoutubePlaylist field be items anyway ? or videos ?

sealedtx commented 4 years ago

@Grodou let it be videos ;)

Grodou commented 4 years ago

@sealedtx Hi, it's done.

Just a few comments:

Hope that's fine.

sealedtx commented 4 years ago

@Grodou Perfect! Is everything ready to merge?

Grodou commented 4 years ago

@sealedtx Hi, I forgot one little line... Ready to merge now!

Probably not the right place to ask, but I'd like to make a PR for faster downloads. Is it relevant? Should I create an issue first?

sealedtx commented 4 years ago

@Grodou Thank you.

Is it relevant? Should I create an issue first?

For sure, and yes, it would be better to explain your thoughts in the issue.