openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Add solution (CLI parameter?) to force some entries not to be considered as suggestions / random pages #240

Open benoit74 opened 4 months ago

benoit74 commented 4 months ago

Currently, the isFront marker, which determines which entries are eligible for suggestions / random pages, is computed automatically from the automatically detected mimetype.

As a result, some pages that are not meant to be viewed standalone are exposed to the user through suggestions or random pages.

See e.g. https://dev.library.kiwix.org/viewer#mes-quartiers-chinois_fr_all_2024-03/ which has many pages like https://dev.library.kiwix.org/viewer#mes-quartiers-chinois_fr_all_2024-03/youtube.fuzzy.replayweb.page/embed/-KpLmsAR23I in random pages.

We should find a way to filter out these unwanted entries, either fully automatically or through a CLI parameter.
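To make the CLI-parameter idea more concrete, here is a rough sketch of how it could work. Everything in it is an assumption for discussion: the `--exclude-front-regex` flag and the `is_front` helper are hypothetical names (not existing warc2zim options or functions), and the rule "only `text/html` entries are front candidates" is a guess at the current mimetype-based default, not a description of the actual code.

```python
import argparse
import re


def is_front(path, mimetype, exclude_patterns):
    """Decide whether an entry may be used for suggestions / random pages.

    Hypothetical helper: not part of warc2zim.
    """
    # Assumed default: only HTML entries are front candidates.
    base_type = mimetype.split(";")[0].strip().lower()
    if base_type != "text/html":
        return False
    # Hypothetical override: an entry matching any pattern is demoted.
    return not any(pattern.search(path) for pattern in exclude_patterns)


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--exclude-front-regex",  # hypothetical flag name
        action="append",
        default=[],
        help="regex on the entry path; matching entries are never front",
    )
    return parser.parse_args(argv)


args = parse_args(
    ["--exclude-front-regex", r"^youtube\.fuzzy\.replayweb\.page/embed/"]
)
patterns = [re.compile(raw) for raw in args.exclude_front_regex]

embed = "youtube.fuzzy.replayweb.page/embed/-KpLmsAR23I"
print(is_front(embed, "text/html; charset=utf-8", patterns))
print(is_front("index.html", "text/html", patterns))
```

Making the flag repeatable would let a recipe exclude several families of embed pages at once, while the mimetype-based default stays unchanged for everything else.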

rgaudin commented 4 months ago
benoit74 commented 4 months ago

Two very good points indeed!