kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Strange internal server errors when searching on a ZIM file served via library.xml #803

Open veloman-yunkan opened 2 years ago

veloman-yunkan commented 2 years ago

There are strange internal server errors for certain search patterns when a ZIM file is served via library.xml (but not when it is served directly).

A reproducing test case (must be run in the libkiwix repository):


$ cat search_test 
#!/usr/bin/env bash

search_patterns=(abcde abcd abc ab a ab abc '<b>' '<b>c' '<b>cd' 'a<b>cd' 'a<bcd' 'a*bcd' 'a*bc' 'a*b' 'xyz<b>cd' 'x<b>cd' '<b>cd' '<bcd' '<b>cd' '/bcd' '/b>cd' 'ab>cd')

perform_search_requests()
{
  echo ------- Start of /search endpoint testing
  for p in "${search_patterns[@]}"
  do
    echo "Searching for: $p"
    curl -Is "http://localhost:8080/search?content=zimfile&pattern=$p"|head -1
  done
  echo ------- End of /search endpoint testing
  echo
  echo
}

set -x
kiwix-serve -p 8080 test/data/zimfile.zim &> /dev/null & server_pid=$!
set +x
sleep 1
echo
echo Serving the ZIM file directly
echo

perform_search_requests

kill "$server_pid"
wait "$server_pid"

set -x
cat lib.xml
kiwix-serve -p 8080 --library lib.xml &> /dev/null & server_pid=$!
set +x
sleep 1
echo
echo Serving the ZIM file via library
echo

perform_search_requests

kill "$server_pid"
wait "$server_pid"

$ PATH=$PATH:../../BUILD_native_static/kiwix-tools/src/server ./search_test
+ server_pid=348662
+ set +x
+ kiwix-serve -p 8080 test/data/zimfile.zim

Serving the ZIM file directly

------- Start of /search endpoint testing
Searching for: abcde
HTTP/1.1 200 OK
Searching for: abcd
HTTP/1.1 200 OK
Searching for: abc
HTTP/1.1 200 OK
Searching for: ab
HTTP/1.1 200 OK
Searching for: a
HTTP/1.1 200 OK
Searching for: ab
HTTP/1.1 200 OK
Searching for: abc
HTTP/1.1 200 OK
Searching for: <b>
HTTP/1.1 200 OK
Searching for: <b>c
HTTP/1.1 200 OK
Searching for: <b>cd
HTTP/1.1 200 OK
Searching for: a<b>cd
HTTP/1.1 200 OK
Searching for: a<bcd
HTTP/1.1 200 OK
Searching for: a*bcd
HTTP/1.1 200 OK
Searching for: a*bc
HTTP/1.1 200 OK
Searching for: a*b
HTTP/1.1 200 OK
Searching for: xyz<b>cd
HTTP/1.1 200 OK
Searching for: x<b>cd
HTTP/1.1 200 OK
Searching for: <b>cd
HTTP/1.1 200 OK
Searching for: <bcd
HTTP/1.1 200 OK
Searching for: <b>cd
HTTP/1.1 200 OK
Searching for: /bcd
HTTP/1.1 200 OK
Searching for: /b>cd
HTTP/1.1 200 OK
Searching for: ab>cd
HTTP/1.1 200 OK
------- End of /search endpoint testing

+ cat lib.xml
<library version="1.0">
  <book
        id="raycharles"
        path="test/data/zimfile.zim"
      ></book>
</library>
+ server_pid=348738
+ set +x
+ kiwix-serve -p 8080 --library lib.xml

Serving the ZIM file via library

------- Start of /search endpoint testing
Searching for: abcde
HTTP/1.1 200 OK
Searching for: abcd
HTTP/1.1 200 OK
Searching for: abc
HTTP/1.1 500 Internal Server Error
Searching for: ab
HTTP/1.1 200 OK
Searching for: a
HTTP/1.1 500 Internal Server Error
Searching for: ab
HTTP/1.1 200 OK
Searching for: abc
HTTP/1.1 500 Internal Server Error
Searching for: <b>
HTTP/1.1 500 Internal Server Error
Searching for: <b>c
HTTP/1.1 500 Internal Server Error
Searching for: <b>cd
HTTP/1.1 500 Internal Server Error
Searching for: a<b>cd
HTTP/1.1 500 Internal Server Error
Searching for: a<bcd
HTTP/1.1 200 OK
Searching for: a*bcd
HTTP/1.1 200 OK
Searching for: a*bc
HTTP/1.1 200 OK
Searching for: a*b
HTTP/1.1 500 Internal Server Error
Searching for: xyz<b>cd
HTTP/1.1 200 OK
Searching for: x<b>cd
HTTP/1.1 200 OK
Searching for: <b>cd
HTTP/1.1 500 Internal Server Error
Searching for: <bcd
HTTP/1.1 200 OK
Searching for: <b>cd
HTTP/1.1 500 Internal Server Error
Searching for: /bcd
HTTP/1.1 200 OK
Searching for: /b>cd
HTTP/1.1 500 Internal Server Error
Searching for: ab>cd
HTTP/1.1 200 OK
------- End of /search endpoint testing

veloman-yunkan commented 2 years ago

The reason for the errors is that the ZIM id in library.xml doesn't match the actual UUID of the ZIM file. The error is triggered only when at least one search result satisfies the query. That is why some search patterns still return a 200 OK HTTP status code: those are the patterns with no matching search results.
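
A mismatch of this kind can be checked without starting the server. The sketch below is not part of libkiwix or kiwix-tools; `zim_uuid` and `check_book_id` are hypothetical helper names. It assumes the ZIM header layout in which a 16-byte UUID sits at byte offset 8 (after the 4-byte magic number and the two 2-byte version fields), and it assumes the id is written as the usual 8-4-4-4-12 hex grouping of those bytes in file order.

```shell
# Sketch only: compare a library.xml book id with the UUID in a ZIM header.
# Assumption: the UUID is the 16 bytes at offset 8 of the ZIM file.
# Assumption: the textual id is those bytes, in order, as 8-4-4-4-12 hex.

# Print the UUID stored in the header of the ZIM file given as $1.
zim_uuid() {
  # Dump 16 bytes starting at offset 8 as plain hex, without separators.
  hex=$(od -An -tx1 -j8 -N16 "$1" | tr -d ' \n')
  # Insert the dashes of the canonical UUID text form.
  echo "$hex" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# Succeed only if the library id ($1) equals the UUID of the ZIM file ($2).
check_book_id() {
  actual=$(zim_uuid "$2")
  if [ "$1" = "$actual" ]; then
    echo "OK: $2"
  else
    echo "MISMATCH: $2 has UUID $actual, library says $1"
    return 1
  fi
}
```

With the lib.xml from the test case, `check_book_id raycharles test/data/zimfile.zim` would report a mismatch, since `raycharles` cannot equal a UUID in this form.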

kelson42 commented 2 years ago

> The reason for the errors is that the ZIM id in library.xml doesn't match the actual UUID of the ZIM file.

Do we have a bug in kiwix-manage? How did we get into such a situation? A manual tweak of library.xml?

veloman-yunkan commented 2 years ago

@kelson42 Yes, it was a manually created library.xml file. More than a year ago I created one such file in test/data where the same ZIM file was entered three times with different ids. Such a setup became an issue after #729.

kelson42 commented 2 years ago

To me this seems to bring us back to the questions around #754: how to handle ZIM metadata with library.xml, with and without access to the ZIM file.