kiwix / kiwix-desktop

Kiwix for Windows and GNU/Linux desktops
https://download.kiwix.org/release/kiwix-desktop/
GNU General Public License v3.0

Searching with non-latin characters in English sources is not working #1223

Open sananjalka opened 1 month ago

sananjalka commented 1 month ago

Hello.

I am on Kiwix 2.3.1 on Linux Mint 21.2.

My problem is that in the ZIM files of English Wikipedia or English Wiktionary, searching for any word that contains non-Latin characters returns no results. This happens both when searching inside the application and when searching in server mode in a browser.

However, searching with non-Latin characters does work in, for example, Finnish or Swedish language ZIM files.

kelson42 commented 1 month ago

@sananjalka Can you please share an example showing what you get and what you expect?

sananjalka commented 1 month ago

All right, I started testing this to take screenshots and show examples, and I ran into some interesting behaviour. If I, for example, quickly type "Düsseldorf" (where the non-Latin letter is the "ü") and press Enter, I get the following screen:

Image

But if I first type "Düsseldorf" and wait a couple of seconds, I get suggestions in the search bar:

Image

If I then press Enter after the suggestions have appeared, I am taken to the Wikipedia page "Düsseldorf".

I did not expect to have to wait after typing before pressing Enter to get results, especially since this does not happen when searching for English-language words with Latin-only letters. Is this a known phenomenon?

sananjalka commented 1 month ago

After this I noticed that even if I wait for those suggestions to show up, selecting "Düsseldorf (Fulltext search)" yields no results. But if I type "London", select "London (Fulltext search)" and press Enter, I get a list of pages that contain the word "London" (although the list is not very long, so it probably does not contain all the pages containing that word).

sananjalka commented 1 month ago

Testing in Wiktionary-En:

If I enter the search string "head" and immediately press Enter, I get a full-text search page listing articles that contain the word "head", including the article "head".

If I enter the search string "head" and wait a couple of seconds, I get the following recommendations, and if I press Enter after they have shown up, I am taken directly to the article "Head".

Image

On the other hand, if I enter the search string "pää" and immediately press Enter, I get the following result:

Image

If I enter the search string "pää" and wait a couple of seconds, I get the following recommendations:

Image

First of all, they display the letter "ä" as some kind of code. But even setting that aside, the article "pää" (which exists) is not in the recommendation list at all, even though it would be the most accurate and simplest result. The list contains idioms that include the word "pää", but not the article for the word "pää" itself.

As with Wikipedia-En, selecting the fulltext search option from the recommendations for the word with non-latin letters yields no results.
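
For reference on the "some kind of code" seen in the suggestions: the screenshot is not reproduced here, so which encoding actually appeared is unknown. The sketch below only lists a few common ways the string "pää" can end up encoded (URL percent-encoding, HTML numeric character references, and Unicode decomposition). It is a plain Python illustration using the standard library, not Kiwix code, and it makes no claim about what Kiwix does internally.

```python
import html
import unicodedata
from urllib.parse import quote, unquote

term = "pää"

# URL percent-encoding of the UTF-8 bytes, as seen in query strings.
percent_encoded = quote(term)

# HTML numeric character references, as seen when markup is shown unescaped.
html_refs = term.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Unicode normalization: NFC keeps "ä" as one code point,
# NFD splits it into "a" plus a combining diaeresis (U+0308).
nfc = unicodedata.normalize("NFC", term)
nfd = unicodedata.normalize("NFD", term)

print(percent_encoded)     # p%C3%A4%C3%A4
print(html_refs)           # p&#228;&#228;
print(len(nfc), len(nfd))  # 3 5

# Each encoded form decodes back to the original string.
print(unquote(percent_encoded) == term)  # True
print(html.unescape(html_refs) == term)  # True
```

If the code shown in the suggestion list resembles one of these forms, mentioning which one in the report may help narrow down where the text is being encoded.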