kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

(A) Search drop-down [incl 10 suggested article titles] (1) never appears, or (2) appears many seconds later, or (3) appears nearly instantly if you type BACKSPACE — or if you hit ENTER, ERR_CONNECTION_REFUSED is very common (B) SEARCH FAILS when search query is a Greek letter like "pi", "alpha", "beta" ETC #769

Closed holta closed 1 year ago

holta commented 2 years ago

This irritating problem appears to affect all schools and everyone using very large ZIM files (on Raspberry Pis especially?)

Specifically, this severe UX glitch occurs very often with very large ZIM files like https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2021-12.zim

What happens is that users go to type a search query into the top-right search box (e.g. of http://box.lan/kiwix/wikipedia_en_all_maxi_2021-12/), but the search drop-down menu very often never appears.

Or the drop-down appears Many Seconds Later, for no obvious reason (zero CPU load, and many gigabytes of RAM available, even when no other users are using kiwix-serve).

The problem occurs with any version of kiwix-tools (kiwix-serve) from recent years (including the very latest 3.2.0-4).

The problem occurs regardless of whether Internet-in-a-Box (https://internet-in-a-box.org) is proxying kiwix-serve or kiwix-serve is being accessed directly over port 3000.

Very Strange Workarounds:

While very useful, not every teacher and student can handle the above quirky workarounds :/

Does anybody have any idea what the root cause might be for this very common usability glitch? Apologies, I'm not sure exactly what the pattern is! But it certainly occurs very consistently and commonly despite the intermittent/random nature of the problem. This seems to affect all schools using very large ZIM files under these common conditions (on Raspberry Pi servers, no matter how recent or how old; very strangely, the delay does not seem to be affected...)

BACKGROUND: This has been a longstanding problem in recent years. I told schools that faster CPUs, more RAM and newer OSes would probably speed things up, such that this irritating delay (or the drop-down never appearing at all!) might effectively cease to be a problem.

I Was 100% Wrong: this confusing problem remains just as severe today in May 2022, even with massive amounts of RAM, fast SSDs instead of microSD cards, and the fastest Raspberry Pi, no matter whether using 32-bit or 64-bit Raspberry Pi OS Lite.

SIDE EFFECT: Many/most users end up doing Full-Text Search by accident, as they don't realize there's any other option: users often hit "Enter" after typing in a search query because the "10 suggestions" drop-down never appeared.

mgautierfr commented 2 years ago

The problem is probably IO speed, not CPU or RAM size.

On wikipedia_en_all_maxi_2020-08.zim the title xapian database is 2692521984 bytes, so about 2.5 GB. From the Raspberry Pi 4 benchmarks, the USB storage throughput (with an external SSD drive) is about 364 MB/s, so you need at least 7 seconds to load it into RAM. This IO is not visible (it does not show up as CPU load). And if you use a Pi 3, the speed is about 38 MB/s, so 67 s to load everything. Raspberry Pi is not cheap for nothing.
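
The estimate above can be checked with a short sketch. It assumes the throughput figures are in MiB/s (this reproduces the quoted 7 s and 67 s) and that the whole database is read sequentially; `load_seconds` is a hypothetical helper name, not part of any Kiwix tool.

```python
# Rough lower bound on the time needed to read the title index from storage.
# Assumptions: throughput is in MiB/s and the read is fully sequential.

DB_BYTES = 2_692_521_984  # size of the title xapian database quoted above
MIB = 1024 * 1024


def load_seconds(size_bytes: int, throughput_mib_s: float) -> float:
    """Return the seconds needed to read size_bytes at the given throughput."""
    return size_bytes / (throughput_mib_s * MIB)


if __name__ == "__main__":
    for label, speed in (("Pi 4, USB SSD", 364), ("Pi 3", 38)):
        print(f"{label}: {load_seconds(DB_BYTES, speed):.1f} s")
```

This is only a floor on the IO time: random access patterns and a cold page cache would likely make the real figure worse.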

Xapian doesn't load the whole database into memory. It loads a first part, starts interpreting it, loads another part, and so on. So the total time is probably less than 7 seconds, but you need several seconds just for the IO. During this time, the frontend is waiting for an answer and does not show the drop-down. It appears many seconds later, when the search completes and the frontend has results to show.

> Typing BACKSPACE sometimes helps, to force the drop-down to finally appear, e.g. by shortening your search query from "AIDS" to "AID"

Xapian somehow has to load the data for "AID" first in order to search (and load data) for "AIDS". So when you remove the S, the request for AID is faster because the AID "data" is already loaded. But the request for AIDS is not stopped on the server. It is just discarded by the frontend.

> Typing "S" to restore your original search query ("AIDS") often then works, to force the drop-down to finally appear

You do another request for "AIDS". All the data is already loaded so the request is fast.


What can we do?

Recent work on the libzim API and libkiwix caches the searchers and searches. This helps keep the parsed data in memory, but loading the "raw" data into RAM is not really affected. Even if we create a new searcher, we can assume that the kernel knows which pages are already loaded in RAM and doesn't load them again. And if memory is full (or for any other reason), the kernel will discard pages even if a searcher is cached on our side.

Stop using cheap hardware :) (Yes, I know this is not the answer you want to hear, but it is an answer anyway)

We can stop using the xapian database for suggestions. We introduced the xapian database here to get better results, but if users never see the results, they are not better. We would then have to use the title-sorted listing, so only titles starting with the searched term would match, and matching would be case-sensitive (unless we add another case-insensitive listing in ZIM files alongside listing/titleOrdered: https://wiki.openzim.org/wiki/Search_indexes).
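
To illustrate what a lookup on a title-sorted listing would and would not match, here is a minimal sketch. It uses an in-memory Python list as a stand-in for the listing/titleOrdered index; `prefix_suggestions` and the sample titles are hypothetical, and this is not how libzim implements it.

```python
import bisect


def prefix_suggestions(sorted_titles, prefix, limit=10):
    """Return up to `limit` titles starting with `prefix` (case-sensitive).

    `sorted_titles` must be sorted in plain code-point order, so that all
    titles sharing a prefix sit next to each other.
    """
    start = bisect.bisect_left(sorted_titles, prefix)
    results = []
    for title in sorted_titles[start:start + limit]:
        if not title.startswith(prefix):
            break  # past the contiguous run of matching titles
        results.append(title)
    return results


if __name__ == "__main__":
    titles = sorted(["AIDS", "AID", "Aid", "Aida", "Pi", "PI", "Alpha"])
    print(prefix_suggestions(titles, "AID"))  # matches only exact-case prefixes
    print(prefix_suggestions(titles, "aid"))  # lower-case query finds nothing
```

The binary search needs only a handful of reads, which is why this approach would be cheap on slow storage, at the cost of the case-sensitivity and prefix-only limits described above.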

holta commented 2 years ago

> On wikipedia_en_all_maxi_2020-08.zim the title xapian database is 2692521984 bytes so 2.5 GB.

@mgautierfr that's a really great explanation (series of explanations!)

Thank you.

Xapian's entire ~2.5 GB doesn't need to be loaded into memory as you say — but if indeed Xapian requires large subsets of that to be moved across the computer's internal bus(es) that would indeed explain a lot :thinking:

(And on the bright side, the better the pattern is understood, the more schools can try to adapt with such realities.)

> But we would have to use the title sorted listing. So only title starting by searched term. And case-sensitive (except if add another case-insensitive listing in zim files along with listing/titleOrdered https://wiki.openzim.org/wiki/Search_indexes)

Certainly the list of 10 suggested article titles will never be perfect — however this evolves year-by-year, as we know from things like https://github.com/kiwix/kiwix-tools/issues/513 — and we will live with that, whatever's decided!

:+1:

holta commented 2 years ago

@mgautierfr this might be unrelated — but schools perceive this to be very much related:

Would you happen to know why about 30% of full-text searches (on Raspberry Pi, as described above, in essentially every school that uses them) fail completely, with the following message if using Chrome browser:

This site can’t be reached

box refused to connect.

Try:

Checking the connection
ERR_CONNECTION_REFUSED

Reloading the page a couple times often works, i.e. forcing a 2nd attempt, or a 3rd attempt.

This happens when there is no load on the Raspberry Pi 4 server, which has many GB of RAM available, and no other users during testing (which has reconfirmed this).

In any case the failures appear to occur very consistently about 30% of the time — at completely random intervals.

FYI this has been reconfirmed using unproxied URLs like the following example:

NOTE: during testing it's important to use a new search query (search string, a.k.a. search pattern) every time, to avoid caching of prior search results.

kelson42 commented 2 years ago

I propose to wait for the libkiwix 10.2.0 release and test again with it. If it still fails, we will investigate to get a clear reproduction case.

mgautierfr commented 2 years ago

I don't know why the connection is refused, but there is another cause of slowdown.

By default, kiwix-serve uses only 4 threads to answer requests. If the thread pool is full, connections are accepted by the httpd library but directly put in a "wait state" until a thread is freed. If you start searches with "A", "AI", "AID", "AIDS", you have filled your thread pool and no request can be handled until a search is completed.
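
The effect of a full pool can be sketched with a small simulation. This is an illustration only: it assumes requests are served strictly in arrival order, the timings are invented, and `start_times` is a hypothetical helper that does not model the internals of the httpd library.

```python
import heapq


def start_times(arrivals, durations, workers=4):
    """Simulate a fixed-size pool; return when each request starts running.

    `arrivals` must be in ascending order. A request starts at its arrival
    time if a worker is free, otherwise when the earliest worker finishes.
    """
    free_at = [0.0] * workers  # min-heap of times at which workers free up
    heapq.heapify(free_at)
    starts = []
    for arrival, duration in zip(arrivals, durations):
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


if __name__ == "__main__":
    # Four slow suggestion searches ("A", "AI", "AID", "AIDS") typed 0.2 s
    # apart, each taking 5 s, followed by a fifth quick request at t = 1.0 s.
    arrivals = [0.0, 0.2, 0.4, 0.6, 1.0]
    durations = [5.0, 5.0, 5.0, 5.0, 0.1]
    print(start_times(arrivals, durations))
```

With 4 workers the fifth request cannot start until the first search finishes at t = 5.0 s, even though it would only take 0.1 s itself; with a larger pool it would start on arrival.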

I've just tested with several full-text searches (changing the pagination) on a ZIM file on a USB drive (for long IO). Out of 6 requests with only 2 threads available, I "succeeded" in getting one 500 error (to investigate). Maybe the 500 error is reported as ERR_CONNECTION_REFUSED by the proxy?

kelson42 commented 2 years ago

@mgautierfr We should implement https://github.com/kiwix/libkiwix/issues/395 to keep one thread from being occupied too long and ensure a better distribution of thread usage.

holta commented 2 years ago

Thank you for the explanations & suggestions!

Quick questions below:

> If you start searches with "A", "AI", "AID", "AIDS", you have filled your thread pool and no request can be handle until a search is completed.

Do you know if that applies when the user types in their search string very slowly — does this effectively launch 4 Xapian Title Searches — i.e. using up all 4 threads?

Search strings longer than 4 letters might use up all threads if so, causing serious resource starvation — if slow typing really does exacerbate these problems ?

(e.g. Is it important not to pause between each letter while typing in a search string?)

> I propose to wait the libkiwix 10.2.0 release and test again with it.

Good to know. Roughly when is that (likely) expected?

kelson42 commented 2 years ago

> I propose to wait the libkiwix 10.2.0 release and test again with it.

> Good to know. Roughly when is that (likely) expected?

I expect within a week.

holta commented 2 years ago

Just FYI all testing mentioned above was reconfirmed with kiwix-tools 3.2.0-4:

root@box:/opt/iiab/kiwix/bin# ./kiwix-serve --version
kiwix-tools 3.2.0

libkiwix 10.1.1
+ libzim 7.2.1
+ libxapian 1.4.18
+ libcurl 7.67.0
+ libmicrohttpd 0.9.72
+ libz 1.2.12
+ libicu 58.2.0
+ libpugixml 0.12.0

libzim 7.2.1
+ libzstd 1.5.2
+ liblzma 5.2.4
+ libxapian 1.4.18
+ libicu 58.2.0

kelson42 commented 2 years ago

@holta Would you be able to provide an update with kiwix-tools 3.3.0, please? Does it work better?

holta commented 2 years ago

For the moment I cannot reproduce the search drop-down's severe slowness on Raspberry Pi 4 with these 2 different versions of kiwix-tools — those were kiwix-tools 3.2.0-1 from 2022-02-02...

# kiwix-serve --version

kiwix-tools 3.2.0

libkiwix 10.0.1
+ libzim 7.2.0
+ libxapian 1.4.18
+ libcurl 7.67.0
+ libmicrohttpd 0.9.72
+ libz 1.2.8
+ libicu 58.2.0
+ libpugixml 0.12.0

libzim 7.2.0
+ libzstd 1.5.1
+ liblzma 5.2.4
+ libxapian 1.4.18
+ libicu 58.2.0

And kiwix-tools 3.3.0 from 2022-06-15...

# kiwix-serve --version

kiwix-tools 3.3.0

libkiwix 11.0.0
+ libzim 7.2.2
+ libxapian 1.4.18
+ libcurl 7.67.0
+ libmicrohttpd 0.9.72
+ libz 1.2.12
+ libicu 58.2.0
+ libpugixml 0.12.0

libzim 7.2.2
+ libzstd 1.5.2
+ liblzma 5.2.4
+ libxapian 1.4.18
+ libicu 58.2.0

On the one hand this appears to be good news. On the other hand, I'd like to understand why I (and others) had so much trouble with kiwix-serve 3.2.0-4 back in early May. I'll try to do more tests in coming days to see if this can be better understood.

FYI both above tests used http://box/kiwix/wikipedia_en_all_maxi_2021-12/

kelson42 commented 2 years ago

@holta Do you use the same hardware (rpi+sd) as a few months ago?

holta commented 2 years ago

> @holta Do you use the same hardware (rpi+sd) as a few months ago?

Yes.

Strangely I also cannot reproduce the slowness when using kiwix-tools 3.2.0-4 just as in early May.

But I am using a different OS today: for the moment anyway I'm using the 32-bit Raspberry Pi OS on Raspberry Pi 4, whereas in early May I was using the 64-bit version of Raspberry Pi OS on Raspberry Pi 4.

So hypothetically the slowness flaw might be arising from using armhf builds of kiwix-tools on 64-bit Raspberry Pi OS?? (I'll investigate more in coming days.)

kelson42 commented 2 years ago

@holta Quite impatient to know more about your investigations :)

holta commented 2 years ago

> @holta Quite impatient to know more about your investigations :)

The above symptoms are essentially/exactly as described in early May 2022, further up on this ticket (#769).

Here is an ADDITIONAL (RELATED?) ISSUE... that appears extremely similar: (but might have a different root cause?)

RECAP / CLARIFICATIONS:

kelson42 commented 2 years ago

@mgautierfr Do you have all the material you need to try a reproduction case?

kelson42 commented 1 year ago

@holta The latest libzim/libkiwix/kiwix-tools are not released yet, but we will have to tackle this in the coming months. Do you know at least whether this still appears with the latest nightly of kiwix-serve?

holta commented 1 year ago

> quickly typing in the search query "aids" fails to display the search dropdown about 80% of the time

Quick Tests: I can't reproduce the above with kiwix-tools nightly build 2022-10-24, with the latest Raspberry Pi OS:

Search query "AIDS" is slow to appear, but appeared every time within about 5-10 seconds.

> type out Greek letters e.g. "alpha" "beta" "omicron" etc (the letter "mu" is the only exception, among all 24 Greek letters). To be clear: the search drop-down did not appear

The above failure however DOES occur every time — EXAMPLE:

kelson42 commented 1 year ago

@mgautierfr Could this ticket be a duplicate of https://github.com/kiwix/kiwix-tools/issues/573? For the latest "pi" issue, I strongly suspect a stopword or stemming step that fails. What do you think?

kelson42 commented 1 year ago

@mgautierfr Any feedback?

kelson42 commented 1 year ago

I had a look at the problem with "pi": it is caused by wrong JSON escaping. See this:

$ curl -s "http://127.0.0.1:8080/suggest?content=wikipedia_en_all_nopic_2022-01&term=pi" | cat -n
     1  [
     2    {
     3      "value" : "PI",
     4      "label" : "<b>PI</b>",
     5      "kind" : "path"
     6        , "path" : "A/PI"
     7    },
     8    {
     9      "value" : "Pi",
    10      "label" : "<b>Pi</b>",
    11      "kind" : "path"
    12        , "path" : "A/Pi"
    13    },
    14    {
    15      "value" : "Pi.",
    16      "label" : "<b>Pi</b>.",
    17      "kind" : "path"
    18        , "path" : "A/Pi."
    19    },
    20    {
    21      "value" : "Pí",
    22      "label" : "Pí",
    23      "kind" : "path"
    24        , "path" : "A/Pí"
    25    },
    26    {
    27      "value" : "\pi",
    28      "label" : "\<b>pi</b>",
    29      "kind" : "path"
    30        , "path" : "A/\pi"
    31    },
    32    {
    33      "value" : "E^pi-pi",
    34      "label" : "E^<b>pi</b>-<b>pi</b>",
    35      "kind" : "path"
    36        , "path" : "A/E^pi-pi"
    37    },
    38    {
    39      "value" : "PI 88788",
    40      "label" : "<b>PI</b> 88788",
    41      "kind" : "path"
    42        , "path" : "A/PI_88788"
    43    },
    44    {
    45      "value" : "PI-21858",
    46      "label" : "<b>PI</b>-21858",
    47      "kind" : "path"
    48        , "path" : "A/PI-21858"
    49    },
    50    {
    51      "value" : "PI-3K",
    52      "label" : "<b>PI</b>-3K",
    53      "kind" : "path"
    54        , "path" : "A/PI-3K"
    55    },
    56    {
    57      "value" : "Pi (1998)",
    58      "label" : "<b>Pi</b> (1998)",
    59      "kind" : "path"
    60        , "path" : "A/Pi_(1998)"
    61    },
    62    {
    63      "value" : "pi ",
    64      "label" : "containing 'pi'...",
    65      "kind" : "pattern"
    66      
    67    }
    68  ]

and here with a JSON integrity check:

$ curl "http://127.0.0.1:8080/suggest?content=wikipedia_en_all_nopic_2022-01&term=pi" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1317  100  1317    0     0  24687      0 --:--:-- --:--:-- --:--:-- 24849
parse error: Invalid escape at line 27, column 19

@veloman-yunkan Would you be able to quickly fix that (and adapt the test)? In general I am surprised that we have this kind of bug: don't we use an external primitive to do the JSON escaping?!
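
The invalid escape reported by jq can be reproduced in isolation. The sketch below uses Python's standard `json` module purely as an illustration of the rule (a backslash inside a JSON string must itself be escaped); it says nothing about how the libkiwix C++ code builds its response or how the fix was implemented.

```python
import json

# Fragment as emitted by the server in the output above: a lone backslash
# before "p" is not a valid JSON escape sequence.
broken = '{"value" : "\\pi"}'  # the JSON text is {"value" : "\pi"}

try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False

# Letting a JSON library do the escaping doubles the backslash.
fixed = json.dumps({"value": "\\pi"})  # the title is the 3 characters \pi
round_trip = json.loads(fixed)["value"]

if __name__ == "__main__":
    print(broken_is_valid)
    print(fixed)
    print(round_trip)
```

A strict parser in the browser rejects the whole suggestion response in the same way, which would be consistent with the drop-down never appearing for "pi".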

kelson42 commented 1 year ago

@veloman-yunkan Any feedback? This seems to me to be a blocker for the 12.0.0 release.

kelson42 commented 1 year ago

Everything should work fine now.