openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Make fuzzy-rule configurable with an external data source #341

Open benoit74 opened 4 days ago

benoit74 commented 4 days ago

Currently, fuzzy rules are configured in a YAML (or JSON) file and transformed into code.

The mid-term goal is to share these rules with the WebRecorder team and other contributors. This probably means that at some point we will need to fetch these rules from an online source.

Even before that, we would benefit from being able to load these rules from an external data source, so that they can be updated without needing a new warc2zim release.
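
To make the idea concrete, here is a minimal sketch of loading rules at runtime with a fallback to the rules bundled in the release. It is not warc2zim's actual code: the `fuzzyRules` / `pattern` / `replace` layout, the `example.com` rule, and every function name below are assumptions made for illustration only.

```python
import json
import re
import urllib.request
from typing import Callable, List, Optional, Pattern, Tuple

Rule = Tuple[Pattern[str], str]

# Hypothetical stand-in for the rules shipped inside a release.
BUNDLED_RULES = [
    {
        "pattern": r"^https?://example\.com/watch\?v=([^&]+).*$",
        "replace": r"example.com/watch?v=\1",
    },
]


def compile_rules(raw_rules: list) -> List[Rule]:
    """Validate a list of rule dicts and compile their patterns."""
    compiled = []
    for rule in raw_rules:
        if (
            not isinstance(rule, dict)
            or not isinstance(rule.get("pattern"), str)
            or not isinstance(rule.get("replace"), str)
        ):
            raise ValueError(f"malformed rule: {rule!r}")
        try:
            compiled.append((re.compile(rule["pattern"]), rule["replace"]))
        except re.error as exc:
            raise ValueError(f"invalid pattern: {rule['pattern']!r}") from exc
    return compiled


def parse_rules(document: str) -> List[Rule]:
    """Parse a JSON rules document (assumed layout: {"fuzzyRules": [...]})."""
    data = json.loads(document)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("fuzzyRules"), list):
        raise ValueError("expected an object with a 'fuzzyRules' list")
    return compile_rules(data["fuzzyRules"])


def fetch_from_url(url: str, timeout: float = 10.0) -> str:
    """Download a rules document; network failures surface as OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def load_rules(fetch: Optional[Callable[[], str]] = None) -> List[Rule]:
    """Use externally fetched rules when possible, else the bundled ones."""
    if fetch is not None:
        try:
            return parse_rules(fetch())
        except (OSError, ValueError):
            pass  # unreachable or malformed source: fall back to bundled rules
    return compile_rules(BUNDLED_RULES)


def apply_rules(url: str, rules: List[Rule]) -> str:
    """Rewrite the URL with the first matching rule, or return it unchanged."""
    for pattern, replace in rules:
        if pattern.match(url):
            return pattern.sub(replace, url)
    return url
```

With something like this, `load_rules(lambda: fetch_from_url(rules_url))` would pick up updated rules without a new release, while `load_rules()` keeps working offline. Falling back silently is a choice made for this sketch; a real implementation would probably want to log the failure and perhaps pin or cache a known-good version of the rules.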