openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

Set ZIM description properly in case of multiple playlist scraping #147

Closed kelson42 closed 8 months ago

kelson42 commented 3 years ago

Currently the description seems to be a single - character, as at http://library.kiwix.org/khan-academy-videos_en_geometric-optics-ap-physics-2-khan-academy_2021-04/M/Description

rgaudin commented 3 years ago

this is a local host link...

rgaudin commented 3 years ago

Do you mean the description is simply "-"?

rgaudin commented 3 years ago

When building for a single playlist, the scraper uses this playlist's description as the ZIM description. When there are several playlists, it's not possible to build anything meaningful, so it renders as -. Of course, it's up to the scraper user to provide a meaningful one.

If you want to replace that placeholder with another one, please suggest something.
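A minimal sketch of the default behavior described in this comment, for illustration only. The `Playlist` class, the `default_description` function and the rule that a user-supplied description takes precedence are assumptions made for the example, not the scraper's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PLACEHOLDER = "-"  # the placeholder discussed in this thread


@dataclass
class Playlist:
    # Hypothetical container, not the scraper's real data model.
    title: str
    description: str


def default_description(
    playlists: Sequence[Playlist], user_description: Optional[str] = None
) -> str:
    """Pick a ZIM description following the behavior described above."""
    # Assumption: a description supplied by the scraper user always wins.
    if user_description and user_description.strip():
        return user_description.strip()
    # Single playlist: reuse its own description when it has one.
    if len(playlists) == 1 and playlists[0].description.strip():
        return playlists[0].description.strip()
    # Several playlists (or nothing usable): fall back to the placeholder.
    return PLACEHOLDER
```

With two playlists and no user-supplied description, this returns the placeholder, which is the case this issue is about.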

kelson42 commented 3 years ago

@rgaudin If there is no description, it should be empty or not even existing metadata. But not -. @Popolechien I had the feeling after our discussion, that it was possible to scrape something meaningful?

rgaudin commented 3 years ago

> @rgaudin If there is no description, it should be empty or not even existing metadata. But not -.

Contradicts the “we should always provide a Description”. I think the - incentivizes the creator into changing it but I'm fine either way.

> @Popolechien I had the feeling after our discussion, that it was possible to scrape something meaningful?

How would that be possible? Playlists can have nothing in common. You can choose to use the first playlist's description, which is meaningful but misleading, or you can build another placeholder like “2 playlists from Youtube” or something, but that's not meaningful.

We are talking about default behavior here: the scraper should spare users from having to input stuff that is already available, but it's not its purpose to do their job completely. Building a multi-playlist ZIM is very handy, but providing a description for this ZIM should be up to the user.

Popolechien commented 3 years ago

> @Popolechien I had the feeling after our discussion, that it was possible to scrape something meaningful?

I don't remember the discussion, maybe that was on another topic? I checked and playlists on YT do not have specific descriptors, which is a bummer. Maybe something more generic like "A playlist from the XYZ Youtube channel" would be a little better, but that's as far as we could go.

rgaudin commented 3 years ago

> Maybe something more generic like "A playlist from the XYZ Youtube channel" would be a little better, but that's as far as we could go.

@Popolechien when we have a single playlist, we already have something: the channel's description I believe. This ticket is for ZIM of multiple playlists.

Popolechien commented 3 years ago

> This ticket is for ZIM of multiple playlists.

@rgaudin Yes, I understand. I agree that just "-" does not provide enough information at the moment if the playlist title is too generic or too specific. Channel descriptions are written for people who already know they're on Youtube (as opposed to browsing a random list of contents from an app's library). Knowing that the content comes from Youtube and is a playlist is already actionable information for the user IMHO.

rgaudin commented 3 years ago

Totally agree. Please propose a way forward for all the use cases that are not good enough. Otherwise this ticket is not fixable.

Popolechien commented 3 years ago

Then let's set the default ZIM description to "A playlist from the [Channel name] Youtube channel".
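As a rough illustration of this proposal, the generic default could be built as below. The function name `generic_description` is hypothetical, and rejecting an empty channel name is a choice made for the example rather than something decided in this thread.

```python
def generic_description(channel_name: str) -> str:
    """Build the generic default proposed above for a given channel name."""
    name = channel_name.strip()
    if not name:
        # Without a channel name the sentence would be malformed.
        raise ValueError("channel_name must not be empty")
    return f"A playlist from the {name} Youtube channel"
```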

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 9 months ago

We need to have a proper description in the catalogue and this is why we need the Metadata Description in the ZIM.

BTW, the specification does not explicitly say that an empty string is forbidden... Not sure whether this is a caveat or a feature... But we want to have a ZIM description.
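One way to express "we want to have a ZIM description" as a check is sketched below. This is an assumption-laden example: the `has_real_description` helper and the plain `dict` of metadata are hypothetical, and treating "-" as missing reflects the wish stated in this thread, not a rule taken from the specification.

```python
PLACEHOLDERS = {"-"}  # values this thread considers as good as missing


def has_real_description(metadata: dict) -> bool:
    """Return True only if the Description metadata carries actual text."""
    value = metadata.get("Description")
    if not isinstance(value, str):
        # Missing key or non-text value: no description at all.
        return False
    text = value.strip()
    # Empty or whitespace-only strings and known placeholders are rejected.
    return bool(text) and text not in PLACEHOLDERS
```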

I'm not in favour of doing something generic like "-" or "A playlist from the [Channel name] Youtube channel"... this brings almost no information. This is a workaround, not a solution.

Therefore there are not, IMO, a thousand possibilities:

IMHO the last solution is the appropriate one, we should create one recipe per playlist and specify clearly the description. I prefer to have less content but better quality content.
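The "one recipe per playlist" idea can be sketched as follows. Everything here is hypothetical: `split_recipe`, the dictionary keys and the naming scheme are invented for illustration and do not reflect the actual Zimfarm recipe format.

```python
from typing import Dict, List


def split_recipe(base_name: str, descriptions: Dict[str, str]) -> List[dict]:
    """Turn one multi-playlist job into one job per playlist.

    `descriptions` maps each playlist ID to its hand-written description.
    """
    recipes = []
    for playlist_id, description in descriptions.items():
        if not description.strip():
            # The point of the split is an explicit description per ZIM.
            raise ValueError(f"missing description for playlist {playlist_id}")
        recipes.append(
            {
                "name": f"{base_name}_{playlist_id}",  # hypothetical naming
                "playlists": [playlist_id],
                "description": description.strip(),
            }
        )
    return recipes
```

Failing on a blank description keeps the split from silently reintroducing a placeholder.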

benoit74 commented 9 months ago

IMHO, I see significant advantages in the idea of removing the "Playlist mode" from the Zimfarm:

The main disadvantages are:

benoit74 commented 9 months ago

And this issue has focused so far on the description, but the same is true for other metadata like name, filename, title, maybe even tags and icon (with varying degrees of customization needed, of course).

rgaudin commented 9 months ago

I also think of current playlists-mode usage as lazy mode and believe we'd improve the ZIM quality by not using it in the Zimfarm. The load ventilation argument is also very valid.

That said, I'd just remove support for it in the farm and keep the feature in the scraper.

kelson42 commented 8 months ago

I have opened the ticket on Zimfarm. Is there anything else we could do here with this issue?

Popolechien commented 8 months ago

Nope. I'm late to the party but I agree with the idea of removing the multiple playlists mode from the farm for the time being.