openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0

Offer search term spelling corrections #731

Open kelson42 opened 2 years ago

kelson42 commented 2 years ago

This is a common feature of many free-text search engines and it can be helpful.

Xapian provides a core feature for this: https://docs.huihoo.com/xapian/docs/spelling.html

Original ticket on Sourceforge https://sourceforge.net/p/kiwix/bugs/849/

I downloaded the Kiwix application for Android and installed it. I
downloaded the Spanish Wiktionary, unzipped it, and uploaded it to the
external memory of the smartphone. From there the application reads the
file and all is well.

But when I misspell a word, Wiktionary does not correct
me. Example: the Spanish word is spelled ZAPATO. If I write SAPATO, the
application tells me: Error: "failed load SAPATO article", and does
not show me the correct options as it should. This means I have to be an
expert in the language to find anything, which does not help me, because the
objective is to correct me when I'm wrong.

If I do the same on the computer, it shows me these options:
1 - Zapato
2 - Calzado
3 - Pasta de zapatos

Is it possible to improve the application for Android?
Am I doing something wrong?

gremid commented 2 months ago

Here is a quick proof-of-concept in Python, showing that Xapian's built-in functionality would cover some common misspellings made by people learning German, either as their first or as a second language.

https://github.com/gremid/xapian-spelling-suggestions/

Two changes to libzim's indexing code would be necessary:

  1. During indexing the title of a ZIM entry has to be added to a spelling dictionary which is later used for lookups.
  2. During retrieval, if there are no results for a given (exact) query, the spelling dictionary would be queried for suggestions.
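
To make the two steps concrete, here is a minimal standard-library sketch of the flow. It is not libzim or Xapian code: the `TitleIndex` class and all of its method names are hypothetical, and the plain Levenshtein lookup only stands in for Xapian's spelling dictionary (which, as far as I know, is filled with `WritableDatabase.add_spelling()` and queried with `Database.get_spelling_suggestion()`). The ZAPATO/SAPATO example is taken from the original ticket.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class TitleIndex:
    """Hypothetical stand-in for a title index plus spelling dictionary."""

    def __init__(self):
        self.titles = {}           # lowercased title word -> set of titles
        self.spelling = Counter()  # lowercased title word -> frequency

    def add_entry(self, title):
        # Step 1: while indexing, also record every title word
        # in the spelling dictionary.
        for word in title.lower().split():
            self.titles.setdefault(word, set()).add(title)
            self.spelling[word] += 1

    def search(self, query):
        # Exact (case-insensitive) lookup of a single word.
        return sorted(self.titles.get(query.lower(), set()))

    def suggest(self, word, max_edit_distance=2):
        # Closest known word; ties are broken by higher frequency,
        # then alphabetically.
        word = word.lower()
        candidates = [
            (edit_distance(word, known), -freq, known)
            for known, freq in self.spelling.items()
        ]
        candidates = [c for c in candidates if c[0] <= max_edit_distance]
        return min(candidates)[2] if candidates else None

    def search_with_correction(self, query):
        # Step 2: consult the spelling dictionary only when the
        # exact query returns nothing.
        results = self.search(query)
        if results:
            return results, None
        suggestion = self.suggest(query)
        if suggestion is None:
            return [], None
        return self.search(suggestion), suggestion


index = TitleIndex()
for title in ["Zapato", "Calzado", "Pasta de zapatos"]:
    index.add_entry(title)

results, suggestion = index.search_with_correction("SAPATO")
print(results, suggestion)  # ['Zapato'] zapato
```

Consulting the dictionary only after an empty result keeps exact matches unaffected. A real implementation would presumably show the suggestion to the user ("Did you mean ...?") instead of silently substituting it.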