openzim / node-libzim

Libzim binding for Node.js: read/write ZIM files in Javascript
https://www.npmjs.com/package/@openzim/libzim
GNU General Public License v3.0

Provide API to allow to avoid fulltext index #34

Open kelson42 opened 4 years ago

kelson42 commented 4 years ago

Currently this seems to be done implicitly, based on whether or not a language is given for the indexing. This seems wrong, because the language is still needed for stemming in the Xapian title index.

kelvinhammond commented 4 years ago

@kelson42 @mgautierfr Will setting shouldIndex = false avoid the full text index or will this break other things too? See Line 112 and Line 132.

I'm not sure what Line 112 data->nbIndexArticles++; does / is used for.

It also seems like setting withIndex to false would handle this and is probably the proper way to do this.

There is a setIndexing function. Currently indexing is set to false if fullTextLanguage.empty(). We could also expose setIndexing on the ZimCreatorWrapper in JavaScript and TypeScript, allowing a script to customize or disable indexing. However, based on Line 85, a user may need to call it before they start writing, because by then the creator appears to have been created and indexing started. Looking again, though, it would probably skip part of the indexing step, since withIndex would be false later, so this may work. Please confirm. I can write the code for this if this is the solution.
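To make the proposal concrete, here is a minimal TypeScript sketch of decoupling the two settings. The `CreatorOptions` shape, the `fullTextIndex` flag, and the `resolveIndexing` helper are hypothetical names used for illustration, not the actual node-libzim API. The only behavior taken from this thread is that indexing is currently switched off when the language is empty.

```typescript
// Hypothetical options shape for illustration; not the real node-libzim API.
interface CreatorOptions {
  // Language used for stemming (also needed by the title index).
  fullTextIndexLanguage?: string;
  // Proposed explicit switch. When omitted, the current implicit rule applies.
  fullTextIndex?: boolean;
}

interface ResolvedIndexing {
  indexing: boolean;
  language: string;
}

// Works out the two values that would be handed to a
// setIndexing(indexing, language) style call.
function resolveIndexing(opts: CreatorOptions): ResolvedIndexing {
  const language =
    opts.fullTextIndexLanguage !== undefined ? opts.fullTextIndexLanguage : "";
  // Current behavior described in this thread: indexing is off
  // only when no language is given.
  const implicit = language.length > 0;
  const indexing =
    opts.fullTextIndex !== undefined ? opts.fullTextIndex : implicit;
  return { indexing, language };
}
```

With this shape, a script could pass a language for stemming and still set `fullTextIndex: false`, while existing callers that only pass a language would keep today's behavior.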

kelson42 commented 4 years ago

"Currently this is set to false if fullTextLanguage.empty()"

Yes, this is the part which is wrong. Probably should be fixed within #36.

mgautierfr commented 4 years ago

Line 112 is just about logging information.

The shouldIndex method is about "Should we index the article to allow the user to search for it?". The answer is based on the "type" of the article, not on whether you want a fulltext index. You probably want to index HTML articles and not image/CSS/JS ones, but that is not always the case. Some HTML articles may not need to be indexed for some reason, or you may want to allow users to search for images (though that is not technically possible for now).
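As a rough illustration of a type-based decision, a shouldIndex-style predicate might look like the TypeScript sketch below. The `ArticleLike` interface and its `excludeFromSearch` field are assumptions made up for this example; they are not part of libzim or node-libzim.

```typescript
// Hypothetical article shape for illustration; not the libzim Article class.
interface ArticleLike {
  mimeType: string;
  // Optional per-article opt-out, for HTML pages that should not be searchable.
  excludeFromSearch?: boolean;
}

// Decides whether an article should be searchable, based on its type
// and not on whether a fulltext index is wanted.
function shouldIndex(article: ArticleLike): boolean {
  if (article.excludeFromSearch === true) {
    return false;
  }
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [baseType = ""] = article.mimeType.split(";");
  return baseType.trim().toLowerCase() === "text/html";
}
```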

If you want to deactivate the fulltext index, you must use Creator::setIndexing(bool indexing, std::string language). The function name is not the best here. If indexing is false, fulltext indexing is deactivated (whatever Article::shouldIndex returns). Title indexing is always activated (if Article::shouldIndex returns true, and if libzim is compiled with Xapian).
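The rules above can be summarized as a small decision function. This TypeScript sketch only models the described behavior; `IndexingContext`, `IndexDecision`, and `decideIndexes` are hypothetical names, and it assumes that fulltext indexing, when enabled, also applies only to articles whose shouldIndex returns true and also requires Xapian.

```typescript
// Hypothetical model of the rules described above; not libzim code.
interface IndexingContext {
  // The `indexing` argument passed to setIndexing.
  fullTextIndexing: boolean;
  // Whether libzim was compiled with Xapian.
  xapianAvailable: boolean;
}

interface IndexDecision {
  title: boolean;
  fullText: boolean;
}

function decideIndexes(
  articleShouldIndex: boolean,
  ctx: IndexingContext
): IndexDecision {
  // Title indexing does not depend on the setIndexing flag.
  const title = articleShouldIndex && ctx.xapianAvailable;
  // Assumption: fulltext indexing needs the same conditions plus the flag.
  const fullText = title && ctx.fullTextIndexing;
  return { title, fullText };
}
```

Under this model, calling setIndexing with indexing set to false still leaves the title index in place for indexable articles.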