openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

Rework or remove the `playlist_mode` #188

Closed: benoit74 closed this issue 1 month ago

benoit74 commented 7 months ago

In this scraper, we have a `playlist_mode` which creates one ZIM per playlist found in a given YouTube user / channel.

This mode is convenient for creating many ZIMs at once, but it poses a metadata quality issue, since titles, descriptions, etc. are automatically sourced from YouTube.

With the move to scraperlib 3.x, creating a ZIM with an invalid title, description, etc. will fail. Unfortunately, this check only runs at the very end of the scrape, because we still use the "zimwriterfs" mode and call make_zim_file at the end of the scraper, after all videos have been downloaded and re-encoded.
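To illustrate why the timing of the check matters, here is a minimal sketch of a fail-fast validation that runs before any expensive work. Everything in it is hypothetical: `PlaylistMetadata`, `find_metadata_problems`, `scrape_playlist` and the length limits are assumptions made for the example, not the scraper's or scraperlib's actual API or rules.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed limits, for illustration only; the real rules live in scraperlib.
MAX_TITLE_LENGTH = 30
MAX_DESCRIPTION_LENGTH = 80


@dataclass
class PlaylistMetadata:
    """Hypothetical container for metadata sourced from YouTube."""

    title: str
    description: str


def find_metadata_problems(metadata: PlaylistMetadata) -> list[str]:
    """Return a human-readable list of problems (empty if all is fine)."""
    problems = []
    if not metadata.title.strip():
        problems.append("title is empty")
    elif len(metadata.title) > MAX_TITLE_LENGTH:
        problems.append(f"title is longer than {MAX_TITLE_LENGTH} characters")
    if not metadata.description.strip():
        problems.append("description is empty")
    elif len(metadata.description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"description is longer than {MAX_DESCRIPTION_LENGTH} characters"
        )
    return problems


def scrape_playlist(
    metadata: PlaylistMetadata, download_videos: Callable[[], None]
) -> None:
    """Check metadata first, so a failure costs seconds rather than hours."""
    problems = find_metadata_problems(metadata)
    if problems:
        raise ValueError("invalid metadata: " + "; ".join(problems))
    # Hypothetical stand-in for the download and re-encoding steps.
    download_videos()
```

With this ordering, a playlist whose YouTube title is too long is rejected before any video is downloaded.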

We should either:

This is in fact a blocker for #175 (unless we accept having a feature that will not work in 90% of cases).

benoit74 commented 7 months ago

As discussed live and proposed in https://github.com/openzim/python-scraperlib/issues/119, we could just disable the metadata check in scraperlib.

This could be an opt-in flag in general, and the default when using playlist mode. And we could display a warning when metadata is not valid.
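A rough sketch of how such a flag could behave, under stated assumptions: `should_disable_metadata_checks`, `check_title`, the `disable_metadata_checks` parameter and the 30-character limit are all hypothetical names and values chosen for the example, not scraperlib's real interface.

```python
import logging
from typing import Optional

logger = logging.getLogger("playlist-mode-sketch")

# Assumed limit, for illustration only.
MAX_TITLE_LENGTH = 30


def should_disable_metadata_checks(
    playlist_mode: bool, flag: Optional[bool] = None
) -> bool:
    """An explicit flag wins; otherwise checks are off only in playlist mode."""
    if flag is not None:
        return flag
    return playlist_mode


def check_title(title: str, *, disable_metadata_checks: bool) -> bool:
    """Return True if the title is valid; otherwise warn or raise."""
    if title.strip() and len(title) <= MAX_TITLE_LENGTH:
        return True
    message = f"invalid title {title!r}: empty or over {MAX_TITLE_LENGTH} characters"
    if disable_metadata_checks:
        # Checks are disabled: keep going, but tell the user about it.
        logger.warning(message)
        return False
    raise ValueError(message)
```

In this sketch, a run in playlist mode without an explicit flag only logs a warning for a bad title, while a regular run still fails on it.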

This would let us keep supporting this mode for those who want to create their own ZIMs, while still ensuring metadata quality for openZIM files. It would also allow us to upgrade to 3.x in an elegant way.

@kelson42 WDYT?

benoit74 commented 3 months ago

This approach has been implemented in the TED scraper: https://github.com/openzim/ted/pull/170