kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Abbreviations cannot be searched anymore #196

Closed Imarok closed 5 years ago

Imarok commented 7 years ago

I'm using Kiwix 2.2. Searching for abbreviations (like FFT) doesn't work anymore. A few versions ago (2.1 or 1.97) it worked properly. Possibly related to #66 (maybe disambiguation pages no longer working is the underlying cause).

kelson42 commented 7 years ago

Which content do you use? What exactly do you mean by "doesn't work anymore"? Do you have fulltext search activated?

Imarok commented 7 years ago

I use the German Wikipedia from 2016-08-16. "Searching for abbreviations (like FFT) doesn't work anymore" means it just doesn't yield any results for this abbreviation (previously it found the article FFT when searching for FFT). Enabling or disabling fulltext search doesn't change this. (Furthermore, it seems fulltext search doesn't work at all.)

okkebal commented 7 years ago

Removed from 2.3.....

Can't find "IBM" on the Dutch Wikipedia...

For me this is the most important bug.

kelson42 commented 7 years ago

@okkebal Which version of the file do you have? What is its date?

okkebal commented 7 years ago

2017-08-09

okkebal commented 7 years ago

You can find "international business machines"

okkebal commented 7 years ago

Other searches that yield no results on the latest Dutch wikipedia:

EMMC - should yield the article about memory cards

SSD - solid state drives are in the Dutch Wikipedia

KNSB - the Dutch skating and chess associations

kelson42 commented 7 years ago

@okkebal This should indeed work fine by now. I'll try to reproduce the problem soon. Please do not hesitate to ping me again if nothing changes on this ticket.

kelson42 commented 7 years ago

@mgautierfr @mhutti1 Kind of a strange bug... can you reproduce it? Do you have an explanation?

mhutti1 commented 7 years ago

Search is not handled on the Android side, so I don't know.

okkebal commented 7 years ago

http://download.kiwix.org/nightly/2017-08-16/ I'm using this nightly; later folders do not contain an Android version. Perhaps you think I am using the 2.3 beta from the Google Play program, but I am not. Just so you know. If you want me to try a build, please provide an APK.

okkebal commented 7 years ago

Can someone please add this to the 2.3 milestone? Without this, Kiwix is pretty useless to me: it means more hassle with huge, outdated index files to work around this bug. I'm about ready to buy an iOS device, but storage comes at a premium with Apple, and these devices are about 3x overpriced. So how do you see this playing out in Africa? Just tell the locals to buy the latest Apple hardware?

okkebal commented 7 years ago

Some more description of this problem here: https://sourceforge.net/p/kiwix/bugs/953/

That bug was closed because I finally managed to get the indexes to work and so worked around the problem. It's crazy that without an extra index you have to type exactly "author: Nikola Tesla" to find the page on the famous scientist, so to me that bug should not have been closed.

PS: I cannot test the latest nightlies because I suffer from the "get content" crash bug, which immediately crashes the build. The problem is still present in Kiwix 2.2 with Wikisource 2017-07.

okkebal commented 7 years ago

I managed to install the latest nightly (21 Sept 2017). I opened a ZIM file directly because of the Get Content crash bug. I can verify that on the latest Dutch Wikipedia it still can't find emmc, knsb, ibm or any of the other abbreviations.

okkebal commented 7 years ago

@kelson42 Please look at this issue

I've read somewhere that Kiwix cannot use internal ZIM indexes if the file is split up. Question: do new ZIM files contain an index yet? Which clients support this index? Is there a Windows beta version that works?

I've re-downloaded the ZIM files so that they are not in parts and tried the latest nightly (arm64), but the index does not work, nor does search.

This problem is most visible with Wikisource, where you cannot find anything at all. Just try to search for any person, author or whatever: no results, or wrong results. The search should be: $SQL = "SELECT * FROM database WHERE title LIKE '%$searchword%'";

Or at least that is what it would be if it were SQL and PHP, but I hope you understand what I mean. A search word that appears in the title is never found, but it should be!
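To make the distinction concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It only illustrates "title contains the word" versus "title starts with the word"; it is not how Kiwix or libzim search titles. The `articles` table, the sample titles, and the `find_titles` helper are all made up for the example.

```python
import sqlite3


def find_titles(conn, word, anywhere=True):
    """Return titles matching `word`, either anywhere in the title or as a prefix."""
    # '%word%' matches the word anywhere; 'word%' only at the start of the title.
    pattern = f"%{word}%" if anywhere else f"{word}%"
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ? ORDER BY title",
        (pattern,),
    )
    return [title for (title,) in rows]


# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Author:Nikola Tesla",), ("Tesla coil",), ("IBM",)],
)

contains_hits = find_titles(conn, "tesla", anywhere=True)
prefix_hits = find_titles(conn, "tesla", anywhere=False)
print(contains_hits)
print(prefix_hits)
```

With these sample rows, the substring pattern matches both Tesla titles, while the prefix pattern misses "Author:Nikola Tesla". Note that SQLite's `LIKE` is case-insensitive for ASCII by default, and a search word containing `%` or `_` would be treated as a wildcard here.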

okkebal commented 7 years ago

In recent builds (nightlies) search is even more broken. This can be seen in WikiSource (latest EN), if you search "a" you will get results. If you search "b" nothing happens, so you can only search stuff that starts with the letter "a"...

mgautierfr commented 7 years ago

> [...] in WikiSource (latest EN), if you search "a" you will get results. If you search "b" nothing happens, [...]

I can confirm this using kiwix-serve. It is probably not a kiwix-android problem but a libzim or kiwix-lib one.

I'm investigating.

> Is there a Windows beta version that works?

No, we are facing a dead end with the Windows version, as XULRunner has been abandoned. We need to rewrite the whole application with a new UI system :/

mgautierfr commented 7 years ago

https://github.com/openzim/libzim/pull/60 should fix some of the issues here.

However, this fixes bugs introduced recently (August 2017). Older bugs are not specifically addressed (but a lot of changes have been made in this area recently, so maybe they are fixed).

(And I'm testing using kiwix-serve, not with kiwix-android)

okkebal commented 6 years ago

In recent nightlies of kiwix-android, search does not work any better: still no abbreviations are found. I still can't find "IBM" on the Dutch Wikipedia.

I see stuff about "find" in openzim/libzim#60, which I guess means that the app will only search in titles that start with the keyword. This is not correct behavior. It should be " find *". If you search for 'Tesla' on Wikisource, you especially want to find the page called 'author: Nikola Tesla'. The page title does not start with 'tesla', so it is never found.

okkebal commented 6 years ago

I see lots of activity on the other bugs, but not here. I personally do not use Kiwix anymore due to this bug. I use Aarddict 2.0, simply because I cannot find anything in any ZIM file. The only way to find an article is to first find it in Aarddict, look at the exact title of the article, then go to Kiwix and type in that exact title. If you do not have the exact title, most articles cannot be found. Wikipedia works a bit better because there are so many disambiguation pages with short titles.

mgautierfr commented 6 years ago

A lot of work has been done on the low-level kiwix-lib and libzim libraries. With the latest version of kiwix-serve and wikipedia_de_all_nopic_2017-06.zim, I'm able to search for abbreviations (FFT, IBM, EMMC, SSD ...).

However, on kiwix-android, we've got some annoying bugs preventing us from making a new release with the latest version of the libraries (especially this one: https://github.com/kiwix/kiwix-android/issues/240).

It seems that @mhutti1 found a solution for #240, so hopefully a new version of kiwix-android will be available soon (but I will let @mhutti1 and @kelson42 confirm that; they are more aware of the recent bugs on Android).

Anyway, your bug seems to be fixed already and we are blocked by other problems.

Maybe you can test with kiwix-serve (http://download.kiwix.org/nightly/2017-11-27/kiwix-tools_linux64_2017-11-27.tar.gz) and your ZIM file and confirm that you can search for abbreviations correctly. If that is not the case, please open an issue in the https://github.com/kiwix/kiwix-tools repository.

kelson42 commented 6 years ago

@okkebal I can confirm that with the most recent kiwix-serve (nightly) and the latest English Wikipedia, it seems to work quite well. See our demonstration instance: http://library.kiwix.org/wikipedia_en_all_novid_2017-08/A/Fast_Fourier_transform.html

kelson42 commented 6 years ago

@okkebal I have just published the APKs of the latest maintenance release, 2.3, here: http://download.kiwix.org/bin/android/2.3/. Give them a try; I think everything should work fine with them (and recent ZIM files).

okkebal commented 6 years ago

Thanks for the new binaries. I gave them a try

I've tried: http://download.kiwix.org/bin/android/2.3/kiwix-2.3_arm64-v8a.apk

Sorry but I see no improvement.

Cannot find 'author: Nikola Tesla' when searching for 'tesla' in Wikisource (EN 2017-07-05).

Cannot find 'ibm' or 'emmc' or 'knsb' on Dutch Wikipedia (2017-08-09).

mhutti1 commented 6 years ago

@okkebal Have you tried with the fulltext search option on in settings?

kelson42 commented 6 years ago

I think we should close this ticket. It's not a problem with the software; the problem is that (1) fulltext search is not yet available everywhere and (2) it may not have worked well in the past, but now it does. Users will stop getting lost once all ZIM files have a fulltext index and the option is activated by default.

okkebal commented 6 years ago

I've also tried the 32-bit version here: http://download.kiwix.org/bin/android/2.3/kiwix-2.3_armeabi.apk

For some reason it has the Get Content crash bug (#240). When I install the 32-bit version it crashes; when I then install the 64-bit version it does not crash (on Get Content). Install the 32-bit version again... it crashes again... very strange.

I can select a ZIM file through my file manager, but there is still no improvement in searches with the 32-bit version.

PS: My ZIM files are split & concat files (.zimaa etc.).

@mhutti1 - alas there is no such setting in the new build....

mhutti1 commented 6 years ago

@okkebal Oh yeah, sorry, it's enabled by default now.

kelson42 commented 6 years ago

@okkebal It is simply a ZIM file without an ft index, so everything you reported recently regarding search makes sense to me. We just need to make all ZIM files with ft index.

okkebal commented 6 years ago

PS: I do not have any indexes installed (.idx folders) next to my ZIM files.

okkebal commented 6 years ago

@kelson42 So normal title searching without indexes remains broken??

> We just need to make all ZIM files with ft index.

Nono we need to create StackOverflow :)

mhutti1 commented 6 years ago

> @kelson42 So normal title searching without indexes remains broken??

I think the answer is instead that it works with fulltext but doesn't work without. I am not quite sure why though.

kelson42 commented 6 years ago

@mhutti1 Yes, you are right; this is simply weak by design.

mgautierfr commented 6 years ago

@mhutti1, which method from kiwix-lib are you using?

The search method does a search in the fulltext index (if available). The searchSuggestions method does a search in the titles using the fulltext index (if available), or falls back on the "old" title search (titles starting with the searched pattern).
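The fallback described above can be modelled roughly as follows. This is a hedged Python sketch, not the kiwix-lib code: `search_suggestions`, `build_title_index`, the word-to-titles dictionary standing in for the fulltext index, and the case-insensitive matching are all assumptions made for illustration.

```python
def build_title_index(titles):
    """Hypothetical stand-in for an index: map each lowercased title word to its titles."""
    index = {}
    for title in titles:
        for word in title.replace(":", " ").split():
            index.setdefault(word.casefold(), set()).add(title)
    return index


def search_suggestions(titles, pattern, title_index=None):
    """Model of the described behaviour: use the index if present, else prefix match."""
    needle = pattern.casefold()
    if title_index is not None:
        # Index path: any title containing the word is a candidate.
        return sorted(title_index.get(needle, set()))
    # Fallback path (the "old" search): only titles starting with the pattern.
    return sorted(t for t in titles if t.casefold().startswith(needle))


# Hypothetical sample titles.
titles = ["Author:Nikola Tesla", "Tesla coil", "IBM"]

with_index = search_suggestions(titles, "tesla", build_title_index(titles))
without_index = search_suggestions(titles, "tesla")
print(with_index)
print(without_index)
```

In this toy model, the same query finds "Author:Nikola Tesla" only when an index is available, which matches the symptom reported earlier in the thread for ZIM files without an index.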

mhutti1 commented 6 years ago

@mgautierfr We use search suggestions in 2.3. In 3.0 we now use search and fall back to search suggestions. Is this the right behaviour?

okkebal commented 6 years ago

Right, so at the moment there is not a single ZIM file that has an index included, right? How do I know if a ZIM file has an index? Will I see that in the filename? Will it take months to generate these new files? Or is there some way to put an existing idx into a ZIM file without having to regenerate it, so we can have these files ready tomorrow? By closing this bug again we might lose sight of it again.

mgautierfr commented 6 years ago

> In 3.0 we now use search and fall back to search suggestions. Is this the right behaviour?

Well, it depends on what you want to do :) If you want to search in the titles (to have the searched pattern in the title), you should use searchSuggestions. If you want to search in the whole content (an article related to the search pattern), you should use search.

mhutti1 commented 6 years ago

So should the correct behaviour be to use both? If so, could we possibly rename these methods to make it clearer what they do?

kelson42 commented 6 years ago

@mhutti1 In the case of kiwix-android, because you have no easy way to provide both kinds of search, I think you should use the "search" method if an ft index is available, and "searchSuggestions" otherwise.
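As a rough illustration of that rule, the sketch below picks one of the two calls depending on whether the ZIM file has a fulltext index. It is written in Python with stand-in callables; `run_query`, `has_fulltext_index`, and the two fake functions are hypothetical and do not mirror the real kiwix-lib or kiwix-android signatures.

```python
def run_query(pattern, has_fulltext_index, search, search_suggestions):
    """Use the fulltext search when an index exists, else the title suggestions."""
    if has_fulltext_index:
        return search(pattern)
    return search_suggestions(pattern)


# Stand-ins for the two library methods, so the dispatch can be observed.
def fake_search(pattern):
    return [f"fulltext:{pattern}"]


def fake_suggestions(pattern):
    return [f"title:{pattern}"]


indexed_result = run_query("IBM", True, fake_search, fake_suggestions)
plain_result = run_query("IBM", False, fake_search, fake_suggestions)
print(indexed_result)
print(plain_result)
```

Passing the two methods in as parameters keeps the choice in one place, so the caller only needs to know whether the opened file carries an index.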

cipo7741 commented 5 years ago

Why is it closed? :cry:

My Grandma has a very old Kiwix running, where the search works fine. We never updated, since we noticed that she wouldn't be able to look for "USA". We want to save the time and space needed to create a full-text index. We just want to find the title.

cipo7741 commented 5 years ago

Off topic: love kiwix-android, great job. :+1:

mhutti1 commented 5 years ago

I think the answer is that the C++ library (kiwix-lib) doesn't support this anymore, and users are unfortunately expected to use newer ZIM files.

okkebal commented 5 years ago

Why is this issue closed? This has not been fixed

MennovdHurk commented 5 years ago

There is some confusion: yes, the new indexed ZIM files work in the latest 2.4 version. You can actually find "IBM" etc. But for some reason I cannot search for "IBM" in the 2019 Wikipedia 100,000-article version. I guess these ZIM files do not have an index??

kelson42 commented 5 years ago

I'm reopening it and will move it to kiwix-lib. @mhutti1 When this situation happens, it is important to provide an alternative ticket that people (devs included) can rely on.

mhutti1 commented 5 years ago

@kelson42 I didn't close this, you did :P

kelson42 commented 5 years ago

@mhutti1 You are right, sorry, wrong move.