All scrapers set ZIM tags from a user-provided string of semicolon-separated values (or at least they should).
Some scrapers also set a few tags automatically, in addition to the user-provided ones.
The resulting list of tags should be de-duplicated, and user-provided tags should be stripped of leading and trailing whitespace.
A utility function in zimscraperlib that shares this logic would avoid reinventing the wheel in every scraper. It would take two parameters, default_tags (list of str) and user_tags (str), and return a list of tags ready to be passed to the creator. Returning a set might be better if the creator supports it; this needs to be checked at the validate_tags and libzim levels.
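
To make the proposal concrete, here is a minimal sketch of what such a helper could look like. The name `merge_tags` is hypothetical and not an existing zimscraperlib function; keeping the original order and dropping empty entries (e.g. from a trailing `;`) are assumptions on my side, not something decided above.

```python
def merge_tags(default_tags: list[str], user_tags: str) -> list[str]:
    """Combine scraper defaults with user-provided, semicolon-separated tags.

    Hypothetical helper: name, ordering and handling of empty entries
    are assumptions, not an agreed-upon API.
    """
    candidates = list(default_tags)
    # User tags come as a single "a;b;c" string; strip whitespace around each value
    candidates.extend(tag.strip() for tag in user_tags.split(";"))
    # dict.fromkeys de-duplicates while preserving first-seen order;
    # empty entries (empty input, doubled or trailing separators) are dropped
    return list(dict.fromkeys(tag for tag in candidates if tag))


print(merge_tags(["_category:other", "warc2zim"], " foo ; bar;;warc2zim; foo"))
# ['_category:other', 'warc2zim', 'foo', 'bar']
```

If the creator ends up accepting a set, the last line of the function could return `set(...)` instead, at the cost of losing the ordering.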
warc2zim will have what looks like a promising implementation once https://github.com/openzim/warc2zim/pull/267 is merged.