Fixed #93: Support all playlists as individual zims

rgaudin commented 4 years ago

Here's a simpler approach to implementing #93 .

Instead of reimplementing all arguments in a different script (duplicating a lot of tedious code), this only implements the required ones (api-key, type and ID) and keeps the rest for later use.

This scraper has very limited code that first checks whether we're in playlists mode (--indiv-playlists – open to suggestions for a better arg name!) and if not, directly calls youtube2zim.

When in playlists mode, after checking the credentials, it extracts the list of playlists using the same function youtube2zim uses and run youtube2zim for each of them.

Main differences:

This script is usable as a replacement of the docker command as it handles both regular and playlists request ; a requirement for the zimfarm which supports only a single command per scraper.
--playlists-name, --playlists-title, --playlists-description are used to build the respective args of youtube2zim but accepts some variables: title, description, slug (from title), playlist_id, creator_id, creator_name.

While this is great for title and description, name is not ideal as this build the filename as well and the slug is just a derivation of the title which can be very long and convoluted…

So we'd endup with either unreadable (using playlist_id) or out-of-convention (using slug) file names. We've already discussed that filenames are not expected to have meanings and that we have metadata for that… but I think it will be the first time this gets enforced.

At the moment, all of them are mandatory. Should we loosen up a bit and autofill title and description?

Very minor changes to the current scraper:

moved logic of extract_playlist into youtube sub module for reuse
moved build folder creation to utils sub module for reuse

satyamtg commented 4 years ago

This seems actually a smart way to do it considering zimfarm supports only one command per scraper (didn't know this at the time of creating #93). The idea of accepting --indiv-playlists to check seems good. However, this, if substituted for youtube2zim, we would need to call as youtube2zim-playlists even for scraping channels/users (doesn't make much sense to me). In this implementation it seems that youtube2zim is the "core" thing and is run by the newer script (which is great BTW).

I don't know how feasible/good it sounds but I think we can have the following modifications to this to have a much more sensible youtube2zim structure -

rename youtube2zim to youtube2zim-core
move all the main youtube2zim code to a submodule called "core" (or maybe some better name)
bring the youtube2zim.playlists to the original youtube2zim module (move from submodule to parent module)
rename youtube2zim-playlists to youtube2zim

I think this way we make more sense as this would not be an add-on but a replacement for youtube2zim. @rgaudin @kelson42 what do you think about this?

rgaudin commented 4 years ago

I don't know how feasible/good it sounds but I think we can have the following modifications to this to have a much more sensible youtube2zim structure - I think this way we make more sense as this would not be an add-on but a replacement for youtube2zim.

I actually considered it (tried a few things before settling on this) but eventually considered the real thing is youtube2zim and the playlists is just an addition to it ; hence the separate script.

It can be used without --indiv-playlists and fallsback on youtube2zim but it’s still an extra tool.

We won’t change the default command for the docker for instance but its existence should be documented in the README (will push that).

For the zimfarm, we’ll use the playlists one as we need the two methods but it’s a special case.

Defaulting to it for everybody would also mean loosing the benefit of youtube2zim --help.

That was the reasoning. Let me know what you think.

rgaudin commented 4 years ago

You may want to check fc17f0b, which follows discussion with @kelson42 regarding metadata customizations.

satyamtg commented 4 years ago

Okay. I get it. It's quite great and understand why what I tried to propose has certain limitations. I think I'm fine with it. Also saw the meta customization thingy and it seems quite cool.

openzim / youtube

Fixed #93: Support all playlists as individual zims #98