openzim / overview

:balloon: Start here for current projects, how to get involved, and joining community calls. A resource for new and veteran members of the offline community.

Provide documentation for ZIM Index Format #6

Closed Jaifroid closed 4 years ago

Jaifroid commented 4 years ago

Currently there is a link in https://wiki.openzim.org/wiki/ZIM_file_format#Namespaces to documentation on the fulltext index that resides in namespace X of some ZIMs. Clicking on this link leads to https://wiki.openzim.org/wiki/ZIM_Index_Format, which is an essentially empty page reading "There is currently no text in this page". It would be useful to have some information about the format.

kelson42 commented 4 years ago

@Jaifroid The namespace is in use: we currently use it for fulltext and title indexes built with Xapian. That said, within the ZIM format, I'm not ready to specify more precisely how we do that. For example, I want to allow the ZIM format to support other kinds of fulltext indexes.

Another point is that the namespaces will be removed really soon, so that part will be deprecated.

Then the question would be whether to document what the fulltext indexes look like from the Xapian perspective. I don't think that is really useful: nobody has ever asked about it, and in any case it is really easy to see how it works with the Xapian tools (the Xapian database format is open).

Another piece of information: at the very beginning of the ZIM format, there was an attempt to have our own fulltext index. I was strongly against it (why reinvent the wheel?) and it was quickly abandoned. A few traces of this attempt remain, such as the red link you mentioned (I have removed it).

All in all, I don't think this is worth it.

Jaifroid commented 4 years ago

OK, I understand. But the page that is linked to should ideally have something, and not be an empty page, e.g. "The fulltext index is currently implemented using a Xapian database, although other indexing technologies may be used in the future. A Xapian search returns a set of pointers to Directory Entries. Please see the Xapian specification at [URL]."

Regarding your comment that "namespaces will be removed really soon", @mossroy and I will need time to adapt our code. Our Kiwix JS code tests namespaces when doing binary search and when checking for images and video. Remember that we don't currently support the Xapian fulltext search (hence this issue). The only search we have is the one using the Title Pointer List, so we will need time to be able to support a different solution.
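To illustrate why the namespace matters for this kind of search, here is a minimal sketch of a namespace-aware binary search for title prefixes. It is not the Kiwix JS implementation: the in-memory list of `(namespace, title)` pairs, the function name `find_by_title_prefix`, and the sample entries are all hypothetical stand-ins, and the sketch simply assumes the entries are already sorted by namespace and then title.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical stand-in for the directory entries reached through the
# Title Pointer List: (namespace, title) pairs, assumed to be sorted.
Entry = Tuple[str, str]


def find_by_title_prefix(
    entries: List[Entry], namespace: str, prefix: str, limit: int = 10
) -> List[Entry]:
    """Return up to `limit` entries in `namespace` whose title starts with `prefix`."""
    # Binary search for the first entry that sorts at or after (namespace, prefix).
    start = bisect_left(entries, (namespace, prefix))
    matches: List[Entry] = []
    # Walk forward until the namespace or the prefix stops matching.
    for ns, title in entries[start:]:
        if ns != namespace or not title.startswith(prefix):
            break
        matches.append((ns, title))
        if len(matches) == limit:
            break
    return matches


# Illustrative data only; real entries would be read from a ZIM file.
entries = sorted([
    ("A", "Paris"),
    ("A", "Parma"),
    ("A", "Oslo"),
    ("I", "paris.jpg"),
    ("M", "Title"),
])

print(find_by_title_prefix(entries, "A", "Par"))
```

Because the namespace is part of the sort key here, a search like this has to be revisited if the namespace is no longer part of an entry's key, which is the adaptation time referred to above.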

kelson42 commented 4 years ago

@Jaifroid The page https://wiki.openzim.org/wiki/ZIM_Index_Format does not exist at all, and no link points to it anymore. The ZIM format in itself has nothing to do with Xapian, so we won't write anything about Xapian on the openZIM wiki.

Regarding the removal of namespaces, we will (1) implement the feature, (2) wait a bit for the readers to be updated, and (3) generate new ZIM files en masse.