kiwix / kiwix-tools

Command line Kiwix tools: kiwix-serve, kiwix-manage, ...
https://download.kiwix.org/release/kiwix-tools/
GNU General Public License v3.0

RESTful API? #368

Closed: Spiritdude closed this issue 4 years ago

Spiritdude commented 4 years ago

First of all, thanks to all who contributed to the kiwix projects, and to those creating those huge ZIM files. It's much appreciated.

The issue: I did not find a way to get structured search results (if there is one, let me know), so I patched together:

...and XHR in a web app using JS. However, the results come back as HTML, which I need to parse line by line (easy) or convert with html2json (complex) to get structured results. So far I've got it working; see https://gist.github.com/Spiritdude/f2581402127af41577d3e3d4fbdefb0b for details.
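For readers wondering what the HTML-to-JSON step involves, here is a minimal sketch using Python's standard `html.parser`. It is not the code from the gist, and it assumes (unverified) that each search hit is an `<li>` containing an `<a href>` with the title and an optional `<cite>` with the snippet; the real kiwix-serve template may differ and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class SearchResultParser(HTMLParser):
    """Collect {title, url, snippet} dicts from a search results page.

    Assumed (hypothetical) markup: each hit is an <li> containing an
    <a href="..."> with the article title and, optionally, a <cite>
    element with the snippet.
    """

    def __init__(self):
        super().__init__()
        self.hits = []
        self._hit = None    # hit being filled while inside an <li>
        self._field = None  # "title" inside <a>, "snippet" inside <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._hit = {"title": "", "url": None, "snippet": ""}
        elif self._hit is None:
            return
        elif tag == "a" and self._hit["url"] is None:
            # Only the first link in an item is treated as the title link.
            self._hit["url"] = dict(attrs).get("href")
            self._field = "title"
        elif tag == "cite":
            self._field = "snippet"

    def handle_endtag(self, tag):
        if tag in ("a", "cite"):
            self._field = None
        elif tag == "li" and self._hit is not None:
            if self._hit["url"]:
                # Collapse the whitespace and newlines the template adds.
                for key in ("title", "snippet"):
                    self._hit[key] = " ".join(self._hit[key].split())
                self.hits.append(self._hit)
            self._hit = None

    def handle_data(self, data):
        if self._hit is not None and self._field:
            self._hit[self._field] += data


def results_to_json(html):
    """Parse a results page and return the hits as a JSON string."""
    parser = SearchResultParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.hits, ensure_ascii=False)
```

Parsing by element rather than line by line keeps working if the server reflows its HTML, but it still breaks whenever the template's tag structure changes, which is the main argument for a structured endpoint.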

Perhaps it would be worthwhile to add a RESTful interface directly to kiwix-serve:

This way, useful search results could be integrated into other services.

See also https://github.com/kiwix/kiwix-tools/issues/345, about speeding up search by omitting snippets and letting the client request them separately.

kelson42 commented 4 years ago

@Spiritdude You mean, you need https://github.com/kiwix/kiwix-tools/issues/52 ? And you might already use https://wiki.kiwix.org/wiki/OPDS.

Spiritdude commented 4 years ago

@kelson42 Yes, something like that :-) (I had not seen it before). But I just tried it on my running instances (/catalog/root.xml):

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>544b0204-d642-d996-a1a3-42762d3adadf</id>
  <title>All zims</title>
  <updated>2020-02-27T13:09:23Z</updated>
  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="catalog/searchdescription.xml" />
</feed>

It lists none of the books in my library, and there are no search results for 'test' (/catalog/search?q=test) either:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>c33b5a56-aaf0-16ff-4a52-731165cfdd66</id>
  <title>Search result for test</title>
  <updated>2020-02-27T13:00:18Z</updated>
  <totalResults>0</totalResults>
  <startIndex>0</startIndex>
  <itemsPerPage>0</itemsPerPage>
  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="catalog/searchdescription.xml" />
</feed>

But /search?pattern=test gives me plenty of results (Wikipedia, Wiktionary, Gutenberg).
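To check programmatically whether a catalog feed actually lists anything, the Atom XML can be read with the standard library. This sketch only assumes what the two feeds above show: Atom is the default namespace, so unprefixed children such as `<totalResults>` live in it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def summarize_feed(xml_text):
    """Return the feed title, totalResults (if present) and entry titles."""
    root = ET.fromstring(xml_text)
    # The feed declares Atom as the default namespace, so unprefixed
    # children such as <totalResults> are looked up in that namespace.
    total = root.findtext(ATOM + "totalResults")
    return {
        "title": root.findtext(ATOM + "title"),
        "totalResults": int(total) if total is not None else None,
        "entries": [
            entry.findtext(ATOM + "title", default="")
            for entry in root.findall(ATOM + "entry")
        ],
    }
```

Run against the search feed above, this reports `totalResults` 0 and an empty entry list, which matches what kiwix-serve returned here.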

% kiwix-serve --version
3.0.3

% kiwix-manage library.xml show
id:             26723433-053e-e6b0-3299-4f160485eae8
path:           /mnt/disk1/Datasets.2/ZIM/wikipedia_en_all_maxi_2018-10.zim
url:
title:          Wikipedia
name:           kiwix.wikipedia_en_all
tags:           wikipedia;_videos:no;_ftindex:yes;_pictures:yes;_details:yes
description:    From Wikipedia, the free encyclopedia
creator:        Wikipedia
date:           2018-10-24
articleCount:   5733761
mediaCount:     4680359
size:           83853668352 KB

id:             aef63f70-c2d9-5698-7acf-133741a7f3d0
path:           /mnt/disk1/Datasets.2/ZIM/wiktionary_en_all_maxi_2020-01.zim
...

% ps aux | grep kiwix
... kiwix-serve --threads 8 --library ./library.xml --address 127.0.0.1 --port 8081
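`kiwix-manage library.xml show` already prints the registered books; for a scripted sanity check that each `<book>` entry points at a ZIM file that exists, a sketch like the following can help. It assumes the library XML layout of a `<library>` root with `<book>` children carrying `id` and `path` attributes, and it resolves relative paths against a caller-supplied directory (normally the one holding library.xml); both are assumptions, not documented kiwix behavior.

```python
import os
import xml.etree.ElementTree as ET


def check_library(xml_text, base_dir="."):
    """Return (book_id, path, exists) for every <book> in a library XML.

    Assumption: relative paths are resolved against base_dir, which
    should normally be the directory that contains library.xml.
    """
    root = ET.fromstring(xml_text)
    report = []
    for book in root.iter("book"):
        path = book.get("path", "")
        full = path if os.path.isabs(path) else os.path.join(base_dir, path)
        report.append((book.get("id"), path, os.path.isfile(full)))
    return report
```

A book that shows `exists=False` would be a likely reason for it to be missing from both full-text search and the catalog, though it does not explain the empty OPDS feed seen here, where the files were found.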

Is there any other action I have to take to get OPDS working?

The kiwix-serve wiki page does not mention OPDS; only the OPDS wiki entry says it is a feature of kiwix-serve. If the feature is built in as of 3.0.3 or later, it might be worth mentioning on the kiwix-serve wiki page itself.

PS: /catalog/search?q=test&start=1 makes kiwix-serve dump core in my case, where the library supposedly has no items.

kelson42 commented 4 years ago

@Spiritdude Thank you for your feedback.

This ticket:

Because of all of that, it is really difficult to have a constructive discussion. Could you split this ticket into atomic topics (one per ticket), each with the usual information that makes a good ticket? We would be really happy to help you achieve your goals and to fix bugs on our side if necessary.

If you also have a concrete project we might help with, I would be available for a short video call.

Spiritdude commented 4 years ago

@kelson42

kelson42 commented 4 years ago

@Spiritdude OPDS has nothing to do with full-text search. Your ticket also talks about nginx, CORS, ... Here are the guidelines for opening a new ticket: https://github.com/kiwix/overview/blob/master/REPORT_BUG.md

kelson42 commented 4 years ago

I have reported the bug about the ZIM file missing from OPDS here: https://github.com/kiwix/kiwix-lib/issues/332. Please report any new problem or question in a separate ticket.

mgautierfr commented 4 years ago

There are several things we want to address in kiwix-serve (among others):

We also need to mention /meta?..., which returns a piece of metadata associated with a book (try /meta?content=<bookName>&name=title or /meta?content=<bookName>&name=favicon).
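A client can build those `/meta` requests from the two query parameters shown above. In this sketch the helper names are hypothetical, and the book name used in the usage note (derived from the ZIM file name) is an assumption about how the server names books; take the real name from your own instance.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def meta_url(base, book_name, name):
    """Build a kiwix-serve /meta URL, e.g. name="title" or name="favicon"."""
    query = urlencode({"content": book_name, "name": name})
    return f"{base.rstrip('/')}/meta?{query}"


def fetch_meta(base, book_name, name, timeout=10):
    """Return the raw response bytes (a favicon, for instance, is binary)."""
    with urlopen(meta_url(base, book_name, name), timeout=timeout) as resp:
        return resp.read()
```

For example, `meta_url("http://127.0.0.1:8081", "wikipedia_en_all_maxi_2018-10", "title")` gives `http://127.0.0.1:8081/meta?content=wikipedia_en_all_maxi_2018-10&name=title`. Using `urlencode` keeps book names with unusual characters correctly escaped.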

This ticket seems to be about having a RESTful API to access all of this (at least the last two points). @Spiritdude, please confirm I'm right.


PS: /catalog/search?q=test&start=1 makes kiwix-serve dump core in my case, where the library supposedly has no items.

This is definitely a bug. Please open a new issue (in the kiwix-lib project).

if I search content-specific (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits

I agree with you. But that's an old behavior, and the last time I looked at it I didn't want to change it, to avoid breaking potential user expectations. It may be time to change it. Please open a new issue for this.
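Until that behavior changes, a client that parses the results page can at least detect the direct-hit case instead of silently parsing an article. This sketch assumes the redirect is an HTTP 3xx response with a `Location` header (not verified here); `http.client` never follows redirects, so the 3xx reaches the caller unchanged. The function names are hypothetical.

```python
import http.client
from urllib.parse import urlencode


def classify_search_response(status, location, body):
    """Tell a 'direct hit' redirect apart from a real results page."""
    if 300 <= status < 400 and location:
        return ("redirect", location)
    if status == 200:
        return ("results", body)
    return ("error", status)


def search(host, port, content, pattern, timeout=10):
    """Query /search once, without following any redirect."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        query = urlencode({"content": content, "pattern": pattern})
        conn.request("GET", f"/search?{query}")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return classify_search_response(resp.status, resp.getheader("Location"), body)
    finally:
        conn.close()
```

On a `("redirect", ...)` result the client knows it got one article rather than a hit list; it still has no way to ask for the remaining hits, which is exactly what the requested change would fix.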

Context: I am developing an indexing service with a web GUI (like Google, but for local/on-premise storage). I provide third-party sources such as Wikidata, Stack Exchange, Wiktionary, etc., and I usually put everything into PostgreSQL using jsonb, indexed with a GIN trigram index to catch misspellings. Using kiwix-serve to handle Wikipedia and other sources, I wanted to see how well kiwix-serve full-text search works compared to a PostgreSQL GIN trigram index.

Spoiler alert: kiwix-serve full-text search does not correct misspellings.
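To illustrate what the trigram approach buys, here is a small approximation of PostgreSQL pg_trgm's `similarity()`: each word is padded with two leading spaces and one trailing space, and the score is the size of the shared trigram set divided by the size of the combined set. This is a sketch of the pg_trgm idea, not of how kiwix-serve searches (its ZIM full-text indexes are Xapian-based), and the helper names are hypothetical.

```python
import re

THRESHOLD = 0.3  # pg_trgm's default similarity_threshold


def trigrams(text):
    """Trigram set of text, padding each word the way pg_trgm does."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def fuzzy_match(query, candidates):
    """Candidates scoring above THRESHOLD, best match first."""
    scored = [(similarity(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score > THRESHOLD]
```

The misspelling "wikipdia" still shares 7 of 12 distinct trigrams with "wikipedia" (about 0.58), so `fuzzy_match("wikipdia", ["Wikipedia", "Wiktionary", "Gutenberg"])` returns `["Wikipedia"]`. An exact-term index would find nothing for the misspelled word.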

Spiritdude commented 4 years ago

@mgautierfr Yes, I confirm your guess is what I wanted. I corresponded with @kelson42 by email and we sorted things out; we had misunderstood each other on the details (which features exist and which do not, what "searching content" means vs. "full-text search", etc.). In a nutshell, I assumed full-text search was covered by OPDS, which is not the case. As I struggled to compile kiwix-tools myself, I did not feel qualified to contribute directly here (e.g. by adding the RESTful API myself), so I began writing my own Perl 5 module and command-line tool to do what I'd like to have. I released it a few hours ago at https://github.com/Spiritdude/ZIM:

Right now I am working on library support (a collection of ZIM files) for the web server, similar to what kiwix-serve supports with --library, and on handling Xapian indices larger than 1 GB. In 1-2 days that should work to some degree as well; I need to get a sense of how well the Xapian indices actually work.

if I search content-specific (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits

I agree with you. But that's an old behavior, and the last time I looked at it I didn't want to change it, to avoid breaking potential user expectations. It may be time to change it. Please open a new issue for this.

Since I parsed the HTML in my first attempt at RESTful full-text search, the redirect meant I could not get any further results.

For me all is good. I am continuing to work on zim/ZIM.pm to see whether the pre-built indexes in ZIM files work well for me. I really like having a local copy of Wikipedia running, and I like your (kiwix.org) approach; I will use either kiwix-serve or my own combination as soon as it becomes reliable.

mgautierfr commented 4 years ago

I corresponded with @kelson42 by email

This is bad :) Please keep things public.

As I struggled to compile kiwix-tools myself

Have you tried kiwix-build (https://github.com/kiwix/kiwix-build)?

if I search content-specific (/search?content=wikipedia_en_all_maxi&pattern=test), kiwix-serve should not redirect directly to the article if there is a direct hit, but give me actual search hits

I agree with you. But that's an old behavior, and the last time I looked at it I didn't want to change it, to avoid breaking potential user expectations. It may be time to change it. Please open a new issue for this.

See https://github.com/kiwix/kiwix-tools/issues/205

kelson42 commented 4 years ago

@mgautierfr You have reopened the ticket, but it is not clear to me why. What should we do next?