Closed rgaudin closed 2 years ago
@mgautierfr It is really important to release the next 1.1.0 version soon with a fix for that. The stability and performance of library.kiwix.org depend on it.
I'm not sure what you're complaining about exactly:
- Tags metadata: to have something normalized, it is the job of libkiwix.
- The repeated _ftindex:yes is somehow a bug (in https://github.com/kiwix/libkiwix/blob/master/src/tools/otherTools.cpp#L219-L254) (but it was duplicated in the zim file itself). But it should not be a problem (as long as the two _ftindex:* are coherent).
- _images:yes, _video:yes are added if there is no information in the zim file, as those tags are somehow mandatory and this was the default on old zim files (we introduced nopic/novid/... to express there was no image/video/...).

How is it a blocker for you?
Ah! I didn't know that libkiwix added those tags. This solves this frightening mystery.
How is it a blocker? The central XML library used to be generated using kiwix-manage. It is now generated by a pylibzim-based script but we had a lot of different entries for the same content.
I imagine some readers may use those tags so I'll port that feature to the script (in scraperlib I suppose).
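A minimal sketch of what porting that normalization to the script could look like. The function name `normalize_tags`, the legacy alias table, and the exact default tag names are assumptions for illustration, not libkiwix's actual behaviour; the real rules should be checked against the otherTools.cpp code linked above.

```python
# Hypothetical helper: none of these names come from libkiwix or scraperlib.
# Assumed mapping from old-style tags to the normalized "_name:value" form.
LEGACY_ALIASES = {
    "nopic": "_pictures:no",
    "novid": "_videos:no",
    "nodet": "_details:no",
    "_ftindex": "_ftindex:yes",
}

# Assumed defaults, added only when the ZIM says nothing about that tag.
DEFAULT_TAGS = ("_pictures:yes", "_videos:yes", "_details:yes")


def normalize_tags(raw: str) -> str:
    """Return a semicolon-separated Tags string with duplicates removed,
    legacy aliases rewritten, and missing defaults appended."""
    seen = []
    for tag in raw.split(";"):
        tag = tag.strip()
        if not tag:
            continue  # ignore empty items such as a trailing ";"
        tag = LEGACY_ALIASES.get(tag, tag)
        if tag not in seen:  # keep first occurrence, preserve order
            seen.append(tag)

    # Names of the "_name:value" tags already present.
    present = {t.split(":", 1)[0] for t in seen if t.startswith("_")}
    for default in DEFAULT_TAGS:
        if default.split(":", 1)[0] not in present:
            seen.append(default)
    return ";".join(seen)


print(normalize_tags("gutenberg;_ftindex:yes;_ftindex:yes"))
```

Running the same function on every entry before writing the XML would give identical Tags strings for identical content, whichever tool produced the ZIM.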
Thanks for the answer; we knew it would be something obvious but I didn't expect this 😉
zimdump would be a better tool for inspecting a ZIM.
BTW, you probably have a bug in the creator/scraper as you don't put the right Tags in the zim file.
Yes, I believe most non-mwoffliner scrapers don't specify all of those. I'll check all of them. We usually don't have flavours/filters but the ftindex tag might be missing.
Here's an example with this small ZIM: gutenberg_he_all_2022-04.zim
Now, using libkiwix:
And here's the (formatted, favicon removed) output
You'll notice that _ftindex:yes is repeated but AFAIK libzim doesn't care about the content of metadata…
@mgautierfr please take a look; this is accidentally blocking a lot of stuff on my side.