openzim / zim-tools

Various ZIM command line tools
https://download.openzim.org/release/zim-tools/
GNU General Public License v3.0

zimcheck should check for empty <title> entry #272

Open kelson42 opened 2 years ago

kelson42 commented 2 years ago

It is important that every HTML front article has a valid, non-empty <title> entry. A missing <title> tag should be reported as well. Otherwise the whole Kiwix suggestion system will fail. See for example https://github.com/openzim/ted/issues/125

mgautierfr commented 2 years ago

Agreed. But we may get false positives. Once the ZIM is written, the two situations "title is empty" and "title == path" are equivalent and cannot be distinguished.
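To illustrate why the two cases collapse, here is a toy model in Python. It is only a sketch of the behavior described above, not libzim code: `StoredEntry`, `write_entry` and `read_title` are hypothetical names, and the two rules they encode (a title equal to the path is stored as empty, and an empty stored title reads back as the path) are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredEntry:
    """Hypothetical, simplified view of what is kept for one entry."""
    path: str
    stored_title: str


def write_entry(path: str, title: str) -> StoredEntry:
    # Assumption: a title identical to the path is stored as an empty string.
    stored = "" if title == path else title
    return StoredEntry(path, stored)


def read_title(entry: StoredEntry) -> str:
    # Assumption: an empty stored title falls back to the path on read.
    return entry.stored_title or entry.path


# Two different inputs at write time...
empty_title = write_entry("A/Foo", "")
title_is_path = write_entry("A/Foo", "A/Foo")

# ...yield the same stored entry, so a checker reading the file
# has no way to tell which of the two cases it is looking at.
print(empty_title == title_is_path)
print(read_title(empty_title), read_title(title_is_path))
```

In this model, a check that only flags "stored title is empty" would also flag entries whose title was deliberately set to the path, which is the false-positive risk.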

As said in https://github.com/openzim/ted/issues/125#issuecomment-999489746, if no entry has a title, we don't have a title index at all. We may check for that first. We can also loop over all the entries in the xapian title index and in the front article list and compare the two. By definition, front articles are put in the front article list AND indexed in the xapian title index. But if the real title is empty, the entry is not indexed. So a front article missing from the title index tells us that something went wrong at some point. It is probably a bit more work, though (not necessarily complex, but we have never checked a xapian database before).
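The comparison itself could look like the following Python sketch. It assumes the two lists of paths have already been extracted from the ZIM file, which is the part not shown here. The names `find_unindexed_front_articles` and `check_titles` are hypothetical, and passing `None` to mean "no title index at all" is a convention chosen for this example.

```python
from typing import Iterable, List, Optional


def find_unindexed_front_articles(
    front_article_paths: Iterable[str],
    title_indexed_paths: Iterable[str],
) -> List[str]:
    """Return front articles that do not appear in the title index."""
    indexed = set(title_indexed_paths)
    return sorted(p for p in front_article_paths if p not in indexed)


def check_titles(
    front_article_paths: Iterable[str],
    title_indexed_paths: Optional[Iterable[str]],
) -> List[str]:
    """Return human-readable problems; an empty list means no problem found."""
    front = list(front_article_paths)
    # First check: there are front articles but no title index at all.
    if title_indexed_paths is None:
        return ["no title index found"] if front else []
    # Second check: every front article should also be in the title index.
    missing = find_unindexed_front_articles(front, title_indexed_paths)
    return [f"front article not in title index: {p}" for p in missing]


# Example with made-up paths: "talks/b" has an empty title, so it was not indexed.
problems = check_titles(["talks/a", "talks/b"], ["talks/a"])
print(problems)
```

Checking for the missing index first keeps the common failure (no entry has a title) to a single message instead of one line per front article.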

kelson42 commented 2 years ago

@mgautierfr Your proposal seems to be another way to reach the same diagnosis. No opinion for the moment on which approach would be best... But we really should check this, because missing titles have quite a strong impact on UX.

kelson42 commented 1 year ago

This ticket is clearly blocked by #331