Spiritdude closed this issue 4 years ago.
@Spiritdude You mean, you need https://github.com/kiwix/kiwix-tools/issues/52 ? And you might already use https://wiki.kiwix.org/wiki/OPDS.
@kelson42 yes, something like that :-) (I did not see this before), but I just tried it on my running instances (/catalog/root.xml):
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:opds="http://opds-spec.org/2010/catalog">
<id>544b0204-d642-d996-a1a3-42762d3adadf</id>
<title>All zims</title>
<updated>2020-02-27T13:09:23Z</updated>
<link rel="self" href="" type="application/atom+xml" />
<link rel="search" type="application/opensearchdescription+xml" href="catalog/searchdescription.xml" />
</feed>
and it does not give me any items in the library, and no search results for 'test' (/catalog/search?q=test):
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:opds="http://opds-spec.org/2010/catalog">
<id>c33b5a56-aaf0-16ff-4a52-731165cfdd66</id>
<title>Search result for test</title>
<updated>2020-02-27T13:00:18Z</updated>
<totalResults>0</totalResults>
<startIndex>0</startIndex>
<itemsPerPage>0</itemsPerPage>
<link rel="self" href="" type="application/atom+xml" />
<link rel="search" type="application/opensearchdescription+xml" href="catalog/searchdescription.xml" />
</feed>
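One quick way to confirm that a catalog lists no books is to count the Atom `<entry>` elements, since OPDS represents each catalog item as an Atom entry. The following is a minimal sketch using only Python's standard library. It takes the root.xml feed quoted above as input. `summarize_feed` is a hypothetical helper name, and reading `<totalResults>` from the default Atom namespace reflects how it appears in the search feed above, not a documented kiwix-serve contract.

```python
import xml.etree.ElementTree as ET

# Default namespace of both feeds shown above.
ATOM = "{http://www.w3.org/2005/Atom}"

def summarize_feed(xml_text: str) -> dict:
    """Hypothetical helper: return the feed title, the number of <entry>
    elements (one per listed book), and <totalResults> if the feed has one."""
    root = ET.fromstring(xml_text)
    total = root.find(f"{ATOM}totalResults")
    return {
        "title": root.findtext(f"{ATOM}title"),
        "entries": len(root.findall(f"{ATOM}entry")),
        "totalResults": int(total.text) if total is not None else None,
    }

# The /catalog/root.xml response quoted above.
ROOT_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:opds="http://opds-spec.org/2010/catalog">
<id>544b0204-d642-d996-a1a3-42762d3adadf</id>
<title>All zims</title>
<updated>2020-02-27T13:09:23Z</updated>
<link rel="self" href="" type="application/atom+xml" />
<link rel="search" type="application/opensearchdescription+xml" href="catalog/searchdescription.xml" />
</feed>"""

print(summarize_feed(ROOT_FEED))
```

On this feed the entry count is 0, which matches the symptom described here: the server answers, but no book from library.xml is listed.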
but /search?pattern=test gives me plenty of results (Wikipedia, Wiktionary, Gutenberg).
% kiwix-serve --version
3.0.3
% kiwix-manage library.xml show
id: 26723433-053e-e6b0-3299-4f160485eae8
path: /mnt/disk1/Datasets.2/ZIM/wikipedia_en_all_maxi_2018-10.zim
url:
title: Wikipedia
name: kiwix.wikipedia_en_all
tags: wikipedia;_videos:no;_ftindex:yes;_pictures:yes;_details:yes
description: From Wikipedia, the free encyclopedia
creator: Wikipedia
date: 2018-10-24
articleCount: 5733761
mediaCount: 4680359
size: 83853668352 KB
id: aef63f70-c2d9-5698-7acf-133741a7f3d0
path: /mnt/disk1/Datasets.2/ZIM/wiktionary_en_all_maxi_2020-01.zim
...
% ps aux | grep kiwix
... kiwix-serve --threads 8 --library ./library.xml --address 127.0.0.1 --port 8081
Is there any other action I have to take to get OPDS working?
The kiwix-serve wiki page makes no mention of OPDS; only the OPDS wiki entry says it's a feature of kiwix-serve. If this feature is built in as of 3.0.3 or later, it might be worth mentioning it on the kiwix-serve wiki page itself.
PS: /catalog/search?q=test&start=1 coredumps kiwix-serve in my case, with supposedly no items in the library.
@Spiritdude Thank you for your feedback.
This ticket:
Because of all of that, it is really difficult to have a constructive discussion. Do you think it would be possible for you to split this ticket into atomic topics (one per ticket), each with the usual information that makes a good ticket? We would be really happy to help you achieve your goals and to fix bugs on our side if necessary.
If you also have a concrete project we might help with, I would be available for a short video call.
@kelson42
using kiwix-serve to handle Wikipedia and other sources, I wanted to see how well kiwix-serve full-text search works compared to a postgresql::gin::trigram index.
@Spiritdude OPDS has nothing to do with full-text search. Your ticket also talks about nginx, CORS... Here are the guidelines for opening a new ticket: https://github.com/kiwix/overview/blob/master/REPORT_BUG.md
I have reported the bug of ZIM files missing from OPDS here: https://github.com/kiwix/kiwix-lib/issues/332. Please report any new problem or question in a separate ticket.
There are several things we want to address in kiwix-serve (among others):
- /catalog/root.xml for all books, or /catalog/search?... to search books using some criteria.
- /search?pattern=test returning an HTML page with the list of articles (so it is not really an API), and /suggest?... returning search suggestions (but only for one book).
And we also need to mention /meta?... to get metadata associated with a book (try /meta?content=<bookName>&name=title or /meta?content=<bookName>&name=favicon).
What the ticket seems to be about is having a RESTful API to access all of this (at least the last two points). @Spiritdude, please confirm I'm right.
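All of the endpoints listed above are plain HTTP GET URLs with query strings. The following Python sketch builds them with proper escaping. `kiwix_url` is a hypothetical helper. `BASE` assumes the `--address 127.0.0.1 --port 8081` used in the kiwix-serve command earlier in this thread. The book name `wikipedia_en_all_maxi` stands in for `<bookName>` and is taken from the search example in this thread. Only parameter names that appear in the thread are used. /suggest is left out because its parameters are not given here.

```python
from urllib.parse import urlencode

# Assumption: the address/port from the kiwix-serve invocation in this thread.
BASE = "http://127.0.0.1:8081"

def kiwix_url(path: str, **params: str) -> str:
    """Hypothetical helper: join BASE, an endpoint path and escaped query params."""
    query = urlencode(params)
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"

# OPDS catalog: all books, and a book search.
print(kiwix_url("/catalog/root.xml"))
print(kiwix_url("/catalog/search", q="test"))
# Full-text search restricted to one book (returns HTML, not JSON).
print(kiwix_url("/search", content="wikipedia_en_all_maxi", pattern="test"))
# Metadata of one book.
print(kiwix_url("/meta", content="wikipedia_en_all_maxi", name="title"))
```

`urlencode` escapes spaces as `+`, so a multi-word pattern such as "new york" becomes `pattern=new+york`.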
PS: /catalog/search?q=test&start=1 coredumps kiwix-serve in my case, with supposedly no items in the library.
This is definitely a bug. Please open a new issue (in the kiwix-lib project).
if I search content-specifically (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits
I agree with you. But that's an old behavior, and the last time I looked at it I didn't want to change it, to avoid breaking potential user expectations. It may be time to change this. Please open a new issue for this.
Context: I develop an indexing service with a web GUI (like Google, but for local/on-premise storage), and I provide third-party sources like Wikidata, StackExchange, Wiktionary etc. I usually put everything into PostgreSQL using jsonb and index it with gin::trigram to catch misspellings. Using kiwix-serve to handle Wikipedia and other sources, I wanted to see how well kiwix-serve full-text search works compared to a postgresql::gin::trigram index.
Spoiler alert: kiwix-serve full-text search does not correct misspellings.
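The trigram approach mentioned above tolerates typos because a misspelled word still shares most of its three-character substrings with the correct word. The following Python sketch is a simplified, ASCII-only version of the similarity measure documented for PostgreSQL's pg_trgm extension. Each word is lowercased and padded with two leading spaces and one trailing space. Similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. It is meant to illustrate the comparison, not to describe how kiwix-serve or Xapian rank results. `trigrams` and `similarity` are hypothetical names.

```python
import re

def trigrams(text: str) -> set:
    """pg_trgm-style trigrams (simplified): lowercase, split into ASCII
    alphanumeric words, pad each as '  word ', take every 3-char window."""
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(round(similarity("wikipedia", "wikipeida"), 3))  # transposed letters
print(similarity("wikipedia", "gutenberg"))            # unrelated word
```

"wikipedia" and "wikipeida" share 6 of 14 distinct trigrams (about 0.43). That is above pg_trgm's default similarity threshold of 0.3, so a trigram index still matches the misspelled query. A word-based full-text index looks up the exact (stemmed) term and would not.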
@mgautierfr yes, I confirm your guess is what I wanted. I corresponded with @kelson42 by email, and we sorted things out; we both misunderstood each other in the details (which features exist and which do not, what "searching content" means vs. "full-text search", etc.). In a nutshell, I assumed full-text search is covered by OPDS, which is not the case. As I struggled to compile kiwix-tools myself, I did not feel qualified to contribute directly here (like adding the RESTful API myself), so I began to write my own Perl5 module and command-line tool to do what I'd like to have. I just released it at https://github.com/Spiritdude/ZIM a few hours ago.
Right now I'm working on library support (a collection of ZIM files) for the web server, as kiwix-serve supports with --library, and on dealing with large (>1 GB) Xapian indices; perhaps in 1-2 days that should work to some degree as well. I need to get a sense of how well Xapian indices actually work.
if I search content-specifically (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits
I agree with you. But that's an old behavior, and the last time I looked at it I didn't want to change it, to avoid breaking potential user expectations. It may be time to change this. Please open a new issue for this.
Since I parsed the HTML in my first attempt at a RESTful full-text search, the redirect meant I could not get any further results.
For me all is good. I'm writing more on zim/ZIM.pm to see whether the pre-built indexes in ZIM files work well for me. I really like having a local copy of Wikipedia running, and I like your (kiwix.org) approach; I will use either kiwix-serve or my own combo, as soon as it becomes reliable.
I corresponded with @kelson42 by email
This is bad :) Please keep things public.
As I struggled to compile kiwix-tools myself
Have you tried kiwix-build (https://github.com/kiwix/kiwix-build)?
@mgautierfr You have reopened the ticket, but it is not clear to me why. What should we do next?
First of all, thanks to all who contributed to the kiwix projects, and to those creating those huge ZIM files; it's very much appreciated.
The issue: I did not find a way to get structured search results (if there is one, let me know), so I patched together:
- nginx as a proxy to provide CORS
- kiwix-serve as the backend
- XHR in a web app using JS
Yet I need to parse the results (HTML) line-wise (easy) or with html2json (complex) to get formatted results. So far I've got it to work; see https://gist.github.com/Spiritdude/f2581402127af41577d3e3d4fbdefb0b for details on how.
Perhaps it's worthwhile to add a RESTful interface to kiwix-serve directly:
- Access-Control-Allow-Origin: * for GET & OPTIONS requests
- &format=json on /search?pattern=test, or consider "Content-Type: application/json" in the request header, to provide structured results (see my gist as a reference, e.g. { link: "...", title: "...", cite: "..." } per hit), or consider the OpenSearch notion as mentioned at https://github.com/kiwix/kiwix-tools/issues/52
- if I search content-specifically (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits
- http://127.0.0.1:8081/?format=json, or alternatively consider Content-Type: application/json
This way, worthwhile search results could be integrated into other services.
Hinting at https://github.com/kiwix/kiwix-tools/issues/345 as well, to speed up search by omitting snippets and allowing the client to request them separately.
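Until kiwix-serve offers something like the proposed `&format=json`, the HTML results can be converted into the `{ link, title, cite }` shape on the client side, as the gist above does in JS. The following is a minimal sketch using Python's standard `html.parser`. `HitParser`, `html_hits_to_json` and the sample markup are hypothetical. The expected layout is one `<li>` per hit containing an `<a href>` with the title and an optional `<cite>` snippet. That layout is an assumption and must be checked against the HTML your kiwix-serve version actually returns.

```python
import json
from html.parser import HTMLParser
from typing import Optional

class HitParser(HTMLParser):
    """ASSUMED markup: each hit is an <li> with an <a href="..."> (title)
    and an optional <cite> (snippet). Anything else is ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.hits = []
        self._hit: Optional[dict] = None
        self._field: Optional[str] = None  # "title" or "cite" while inside one

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._hit = {"link": "", "title": "", "cite": ""}
        elif self._hit is not None and tag == "a" and not self._hit["link"]:
            self._hit["link"] = dict(attrs).get("href") or ""
            self._field = "title"
        elif self._hit is not None and tag == "cite":
            self._field = "cite"

    def handle_endtag(self, tag):
        if tag in ("a", "cite"):
            self._field = None
        elif tag == "li" and self._hit is not None:
            if self._hit["link"]:  # skip <li> elements that are not hits
                # Collapse runs of whitespace left over from the HTML layout.
                self.hits.append({k: " ".join(v.split()) for k, v in self._hit.items()})
            self._hit = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b> highlights) is kept as well.
        if self._hit is not None and self._field:
            self._hit[self._field] += data

def html_hits_to_json(html_text: str) -> str:
    parser = HitParser()
    parser.feed(html_text)
    parser.close()
    return json.dumps(parser.hits, ensure_ascii=False)

# Hypothetical markup for illustration, not captured from kiwix-serve.
SAMPLE = """<ul>
<li><a href="/wikipedia_en_all_maxi/A/Test">Test</a>
<cite>A <b>test</b> is an assessment...</cite></li>
</ul>"""

print(html_hits_to_json(SAMPLE))
```

Parsing the HTML this way is brittle, because any template change in kiwix-serve breaks it. That is the argument for having kiwix-serve emit structured results itself.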