nijazm closed this issue 1 year ago.
I don't understand this bug report. Can someone please rephrase it? See https://github.com/kiwix/overview/blob/master/REPORT_BUG.md
This ticket is a follow-up to #587: fixing one bug in kiwix/libkiwix#859 exposed another, unrelated problem.
The essence of the problem is as follows.
English Wikipedia contains an article titled & (which redirects to Ampersand).
A user exploring the wikipedia_en_all ZIM file via kiwix-serve expects that typing the & symbol into the ZIM viewer's search box will suggest a link to that article. Instead, they are offered only a suggestion to run a full-text search for &, which doesn't produce any results either.
As hypothesized in https://github.com/kiwix/kiwix-tools/issues/587#issuecomment-1354495921, the problem is that the ampersand is treated as punctuation and is simply discarded, both when the title index is created and when a suggestion search is run against it.
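To make the hypothesis concrete, here is a minimal Python sketch of a hypothetical normalizer that drops punctuation and stopwords, as described above. It is not libzim's actual code; the STOPWORDS set, clean_terms, and the toy dictionary index are assumptions for illustration only. It shows why both sides fail: the "&" title contributes no terms to the index, and the "&" query produces no terms to look up.

```python
import re

# Hypothetical stopword list (assumption); a real indexer uses a language-specific one.
STOPWORDS = {"a", "an", "and", "the", "of"}

def clean_terms(text: str) -> list[str]:
    """Lowercase, keep word tokens only, drop stopwords.

    Punctuation such as '&' never matches \\w+, so it is silently discarded.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Toy title index (assumption): term -> set of titles containing it.
titles = ["&", "Ampersand", "Me, Myself & Irene"]
index: dict[str, set[str]] = {}
for title in titles:
    for term in clean_terms(title):
        index.setdefault(term, set()).add(title)

query_terms = clean_terms("&")
print(query_terms)    # [] -> the query has nothing to look up
print("&" in index)   # False -> the "&" article added no terms
```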
Ideally, while building the title index, we should handle article titles consisting of a single symbol or word specially, letting those terms into the title index as-is despite any rules that drop punctuation and stopwords. We will also have to enhance the suggestion search so that it accounts for such additions to the title index.
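The proposal above can be sketched as follows, again in Python and under stated assumptions: clean_terms, index_terms, and suggestion_terms are hypothetical names, and "single symbol or word" is approximated as "no whitespace after stripping". A single-token title is indexed verbatim (lowercased) in addition to its cleaned terms, and the query side applies the same rule so the verbatim term can actually be matched.

```python
import re

# Hypothetical stopword list (assumption).
STOPWORDS = {"a", "an", "and", "the", "of"}

def clean_terms(text: str) -> list[str]:
    """Default cleaning: word tokens only, stopwords removed."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def index_terms(title: str) -> list[str]:
    """Terms stored in the (hypothetical) title index for one title.

    A title made of a single symbol or word (no whitespace) is also kept
    verbatim, bypassing the punctuation and stopword rules.
    """
    terms = clean_terms(title)
    stripped = title.strip()
    if stripped and len(stripped.split()) == 1:
        raw = stripped.lower()
        if raw not in terms:
            terms.append(raw)
    return terms

def suggestion_terms(query: str) -> list[str]:
    """The suggestion search mirrors the indexing rule for its query."""
    return index_terms(query)

print(index_terms("&"))    # ['&']
print(index_terms("C++"))  # ['c', 'c++']
```

Keeping the cleaned terms alongside the verbatim one means a title like C++ is still found by a plain "c" query, while "&" and a lone stopword such as "The" no longer vanish from the index.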
@veloman-yunkan Thank you for the explanation and analysis. Do you know exactly which part of the code removes this? Is it related to stop words? IMO your proposal is worth considering. I believe this special handling is largely independent of any particular special character; rather, it affects all very short titles.
Do you know exactly which part of the code removes this?
@kelson42 No, I don't.
@mgautierfr If a title consists only of stop word(s) or punctuation, we should keep them IMO. Does that make sense?
I would say that we try to clean the query (or the title to be indexed), and if the cleaned query (or title) is empty, we use the original string instead of the cleaned one. We don't care what the original string is composed of.
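A minimal Python sketch of this fallback rule, assuming a hypothetical clean() step that keeps word tokens and drops stopwords; clean, normalize, and STOPWORDS are illustrative names, not libzim APIs. The same normalize() is meant to be applied to titles at index time and to queries at search time. Lowercasing the fallback string is an extra assumption so that index and query still compare case-insensitively.

```python
import re

# Hypothetical stopword list (assumption).
STOPWORDS = {"a", "an", "and", "the", "of"}

def clean(text: str) -> str:
    """Hypothetical cleaning step: word tokens only, stopwords removed."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)

def normalize(text: str) -> str:
    """Clean the text; if nothing survives, fall back to the original string.

    The fallback ignores what the original is made of (punctuation,
    stopwords, or both). It is only stripped and lowercased (assumption).
    """
    cleaned = clean(text)
    return cleaned if cleaned else text.strip().lower()

print(normalize("&"))                   # &
print(normalize("Me, Myself & Irene"))  # me myself irene
```

Unlike special-casing single-word titles, this rule needs no knowledge of the title's shape: it only changes behaviour when cleaning would otherwise leave nothing to index or search for.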
@mgautierfr Should we move this ticket to openzim/libzim?
yes
Okay, it looks like you tried to fix something, but without success. I just tested yesterday's nightly builds of kiwix desktop and kiwix tools on Windows 11. Typing the & symbol now just shows a full-text search autocomplete entry, and when I click on it, it says No results were found for "&". The search box shows containing '&'. The same happens in kiwix-serve (in web browsers) and in the kiwix desktop app. Tested with English Wikipedia 2021-12. The only difference is that titles containing & now redirect properly (previously they did not), e.g. Me, Myself & Irene.