kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

/catalog doesn't work without access to ZIM files #754

Open rgaudin opened 2 years ago

rgaudin commented 2 years ago

My understanding was that the catalog part of the server (i.e. the OPDS engine) would only manipulate catalog data and thus not require ZIM access. This is not the case.

wget download.kiwix.org/library/library_zim.xml
kiwix-serve --library --daemon -p 9999 ./library_zim.xml
curl localhost:9999/catalog/root.xml

kiwix-serve starts and loads the library properly (The library was successfully loaded.) but the OPDS requests all come back empty:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
  <title>All zims</title>
  <updated>2022-04-22T12:22:02Z</updated>

  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>

curl localhost:9999/catalog/search?lang=fra

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>d097adb8-1df3-d4f1-b77d-33f90d6b7793</id>
  <title>Filtered zims (lang=fra)</title>
  <updated>2022-04-22T12:23:04Z</updated>
  <totalResults>0</totalResults>
  <startIndex>0</startIndex>
  <itemsPerPage>0</itemsPerPage>
  <link rel="self" href="" type="application/atom+xml" />
  <link rel="search" type="application/opensearchdescription+xml" href="/catalog/searchdescription.xml" />
</feed>
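
One way to check for this symptom programmatically is to count the Atom `<entry>` elements in a response. The following is a minimal sketch in Python using only the standard library. The `count_entries` helper and the second sample feed are illustrative assumptions, not part of kiwix-serve or libkiwix.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def count_entries(feed_xml: str) -> int:
    """Return the number of Atom <entry> elements directly under <feed>."""
    root = ET.fromstring(feed_xml)
    return len(root.findall(f"{{{ATOM_NS}}}entry"))


# Trimmed copy of the empty response shown above.
EMPTY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:opds="http://opds-spec.org/2010/catalog">
  <id>1ae99b6e-a67b-db46-157a-fcc82a42d3a8</id>
  <title>All zims</title>
  <updated>2022-04-22T12:22:02Z</updated>
  <link rel="self" href="" type="application/atom+xml" />
</feed>"""

# Hypothetical feed with one entry, for comparison only.
FEED_WITH_ENTRY = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>All zims</title>
  <entry><title>Example book</title></entry>
</feed>"""

print(count_entries(EMPTY_FEED))
print(count_entries(FEED_WITH_ENTRY))
```

A count of zero on `/catalog/root.xml` for a library known to contain books is the behaviour reported here.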

kelson42 commented 2 years ago

@rgaudin What does the library XML look like? Does it have URLs (to download the ZIMs) in it?

rgaudin commented 2 years ago

The library XML is the production one; didn't you see the wget call above? So yes, there's a url attribute on each book. Should that matter?

kelson42 commented 2 years ago

@mgautierfr @veloman-yunkan Definitely a blocker for https://github.com/kiwix/container-images/issues/147

rgaudin commented 2 years ago

Following @kelson42's suggestion, I removed the path attributes from all the books in the library XML and I get a different startup output:

Loading the library from the following files:
    /library_zim.xml
The library was successfully loaded.
The XML library file '/library_zim.xml' is empty (or has only remote books).
The Kiwix server is running and can be accessed in the local network at: xxx

The result is still the same on the OPDS endpoints, though.
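
For anyone wanting to reproduce this step, removing the path attributes can be scripted. This is a minimal sketch, assuming the library file is a flat list of `<book>` elements carrying `path` and `url` attributes as discussed above. The `strip_paths` helper and the sample values (id, path, URL) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_paths(library_xml: str) -> str:
    """Return the library XML with the `path` attribute removed from every <book>."""
    root = ET.fromstring(library_xml)
    for book in root.iter("book"):
        # pop() with a default avoids a KeyError on books that have no path.
        book.attrib.pop("path", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-book library; real entries carry many more attributes.
SAMPLE = """<library>
  <book id="abc" path="/data/example.zim" url="https://example.org/example.zim.meta4" />
</library>"""

print(strip_paths(SAMPLE))
```

The `url` attribute is left untouched, so each book remains a "remote" book in the sense of the startup message above.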

kelson42 commented 2 years ago

I also believe that if a path is given and the ZIM file cannot be loaded, the current strategy is to ignore it and continue. This looks right, but an ERROR/WARNING message should be printed.

mgautierfr commented 2 years ago

This is something I realized recently and commented on in https://github.com/kiwix/libkiwix/issues/708#issuecomment-1095009085

Copying the important part:

[...] the catalog (root.xml, search, v2, ...) always returns books with local and valid ZIM files. And there is no way for now to get the list of remote books (ones with a download link/url), whether they are local or not. It would be pretty easy to change (technically) but it adds some functional complexity (API to define, kiwix-serve frontend assuming the catalog returns books readable by kiwix-serve, ...)
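
To make the distinction concrete, here is a small Python model of the two selection rules. It is only a sketch of the behaviour described in the quote: the `Book` dataclass, its fields, and both functions are hypothetical and do not mirror the libkiwix classes or filter API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    # Hypothetical model, not the libkiwix Book class.
    name: str
    path: Optional[str] = None   # local ZIM path, if any
    url: Optional[str] = None    # download URL, if any
    zim_readable: bool = False   # whether the ZIM at `path` could be opened


def catalog_current(books: List[Book]) -> List[Book]:
    """Rule described in the quote: only books backed by a local, valid ZIM."""
    return [b for b in books if b.path is not None and b.zim_readable]


def catalog_with_remote(books: List[Book]) -> List[Book]:
    """Possible alternative: also list books that carry a download URL."""
    return [
        b for b in books
        if (b.path is not None and b.zim_readable) or b.url is not None
    ]


books = [
    Book("local_ok", path="/data/a.zim", url="https://example.org/a.zim", zim_readable=True),
    Book("remote_only", url="https://example.org/b.zim"),
    Book("broken_local", path="/data/missing.zim"),
]

print([b.name for b in catalog_current(books)])
print([b.name for b in catalog_with_remote(books)])
```

Under the first rule a library made only of remote books yields an empty catalog, which matches the empty feeds reported in this issue.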

rgaudin commented 2 years ago

From discussion with @mgautierfr and @kelson42:

The issue with implementing this is that kiwix-serve currently serves two purposes:

The ZIM browser on / is just an HTML shell with a JS app that queries the catalog on /catalog.

ZIM browsing could work with a ZIM-less catalog, but should it? If so, it could not offer links to the demo content as it currently does, because it would not be able to serve it. And in the case of a mixed catalog with ZIM-backed and ZIM-less books, it would not know which ones are available.

Solving this would mean updating the OPDS response to conditionally include a link to HTML content.
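
A conditional link of this kind could look roughly as follows. This is a sketch only: `build_entry` is a hypothetical helper, Atom namespaces are omitted for brevity, and the `/content/<name>` href form is taken from the link quoted elsewhere in this thread rather than from any agreed API.

```python
import xml.etree.ElementTree as ET


def build_entry(name: str, title: str, zim_available: bool) -> ET.Element:
    """Build a simplified catalog entry, adding the HTML link only when
    the server can actually serve the book's content."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    if zim_available:
        # Only ZIM-backed books get a browsable link.
        ET.SubElement(entry, "link", {"type": "text/html", "href": f"/content/{name}"})
    return entry


backed = build_entry("lilote_fr_test_2023-01", "Lilote", zim_available=True)
zimless = build_entry("lilote_fr_test_2023-01", "Lilote", zim_available=False)

print(ET.tostring(backed, encoding="unicode"))
print(ET.tostring(zimless, encoding="unicode"))
```

A client such as the JS app on / could then treat the presence of the `text/html` link as the signal that a book is readable on this server.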

Another issue is that, because it is available in Kiwix serve we host those two services to the public at https://library.kiwix.org:

As the objective suggests, we should separate both services to have a dedicated ZIM-less OPDS catalog for ZIM readers and a dedicated ZIM-backed demo.

Keeping current URLs for both is not possible. Depending on how one understands “library” we could either:

I am in favor of the first one.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 1 year ago

@mgautierfr @veloman-yunkan Do we have here anything still to discuss before implementation?

rgaudin commented 1 year ago

@mgautierfr @veloman-yunkan Do we have here anything still to discuss before implementation?

The link to the browsable content is not sorted out yet (<link type="text/html" href="/content/lilote_fr_test_2023-01" />).

It would also be good to sort out how we want to handle multiple illustrations, to know whether this will be problematic once we get there.