Closed: automactic closed this issue 7 years ago
I agree with the principle of adding meta-tag information to opt parts of the HTML text out of indexing. I'm not sure this is the right thing to do for all the examples you have given, but it definitely sounds like a good approach. Give it a try!
What code should I modify to add comments to the HTML strings? Also, do you think adding comments to the HTML strings will increase the size of the ZIM files?
This issue was moved to openzim/mwoffliner#1725
Problem:
In the current Xapian indexing process, the article content extracted by Omega contains a lot of useless information, such as the reference section, the legal footnote, and the inline references.
Desired Output:
A clean string of article content, without
Example:
The "apple juice" article in wikipedia_en_simple_all_2016-05.zim. Here is the info extracted by the Omega HTML parser and passed to Xapian for indexing:
Possible Solution:
Add UdmComment markup (`<!--UdmComment-->` ... `<!--/UdmComment-->`) around the parts of the HTML that should not be indexed, so the Omega HTML parser can ignore them. (source)
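
To make the idea concrete, here is a minimal sketch of what the wrapping could look like. It is not taken from any existing kiwix or mwoffliner code: the helper names `wrap_noindex` and `mark_fragments`, the sample article string, and the `references` class name are all made up for illustration. It also computes the fixed per-section overhead of the markers, which bears on the ZIM size question above.

```python
from typing import Iterable

# Marker pair that the Omega HTML parser skips over when extracting text.
UDM_OPEN = "<!--UdmComment-->"
UDM_CLOSE = "<!--/UdmComment-->"

# Bytes added per wrapped section, before any compression.
MARKER_OVERHEAD = len(UDM_OPEN) + len(UDM_CLOSE)


def wrap_noindex(fragment: str) -> str:
    """Surround an HTML fragment with the UdmComment markers."""
    return f"{UDM_OPEN}{fragment}{UDM_CLOSE}"


def mark_fragments(html: str, fragments: Iterable[str]) -> str:
    """Wrap every exact occurrence of each fragment in `html`.

    Hypothetical helper: a real implementation would more likely select
    the reference and footer nodes through a DOM, not by string matching.
    Fragments are assumed not to overlap or nest.
    """
    for fragment in fragments:
        html = html.replace(fragment, wrap_noindex(fragment))
    return html


# Made-up sample data, not the real "apple juice" article.
article = '<p>Apple juice is a drink.</p><ol class="references"><li>Ref 1</li></ol>'
references = '<ol class="references"><li>Ref 1</li></ol>'
marked = mark_fragments(article, [references])
```

Under these assumptions each wrapped section costs `MARKER_OVERHEAD` bytes (35 for this marker pair) before compression, so a handful of wrapped sections per article should add very little to the file size.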