kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Should we expose the bookName? #864

Open rgaudin opened 1 year ago

rgaudin commented 1 year ago

Following https://github.com/kiwix/kiwix-tools/pull/586, it is clear that the bookName is important information for Kiwix-serve users/integrators. While the v2 API now relies entirely on UUIDs, reader endpoints like /content, /raw, /search, /suggest and /viewer depend on that bookName.

bookName is a human-friendly book identifier in the internal catalog. It is built from the source ZIM filename, essentially by normalizing it. On our public catalog, it mostly matches the Name metadata of the ZIM because that's our convention, but it doesn't have to.

Given it's an identifier, I wonder if we should expose it in the OPDS API.

❯ curl https://library.kiwix.org/catalog/v2/entry/5f93c4a9-2ebd-ddd0-d13d-1edc29d511a6

<?xml version="1.0" encoding="UTF-8"?>
  <entry>
    <id>urn:uuid:5f93c4a9-2ebd-ddd0-d13d-1edc29d511a6</id>
    <title>Le Mali Pour Les Nuls</title>
    <updated>2022-11-10T00:00:00Z</updated>
    <summary>Des vidéos pour découvrir le Mali</summary>
    <language>fra</language>
    <name>mali-pour-les-nuls_fr_all</name>
    <flavour></flavour>
    <category></category>
    <tags>youtube;_videos:yes;_ftindex:no;_pictures:yes;_details:yes</tags>
    <articleCount>10</articleCount>
    <mediaCount>26</mediaCount>
    <link rel="http://opds-spec.org/image/thumbnail"
          href="/catalog/v2/illustration/5f93c4a9-2ebd-ddd0-d13d-1edc29d511a6/?size=48"
          type="image/png;width=48;height=48;scale=1"/>
    <link type="text/html" href="/content/mali-pour-les-nuls_fr_all_2022-11" />
    <author>
      <name>Youtube Channel “Le Mali Pour Les Nuls”</name>
    </author>
    <publisher>
      <name>Kiwix</name>
    </publisher>
    <dc:issued>2022-11-10T00:00:00Z</dc:issued>
    <link rel="http://opds-spec.org/acquisition/open-access" type="application/x-zim" href="https://download.kiwix.org/zim/videos/mali-pour-les-nuls_fr_all_2022-11.zim.meta4" length="139335680" />
  </entry>

The bookName is present inside the link to the reader

<link type="text/html" href="/content/mali-pour-les-nuls_fr_all_2022-11" />

One could parse the link to extract the identifier, but if it's a first-class input for our endpoints, maybe it deserves its own property?

It would prevent breaking access to it when changing our reader URLs, but on the other hand, despite its use, we don't want to consider it an identifier… just a pretty-URL-part.
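For illustration, the link-parsing workaround mentioned above could look roughly like the sketch below. It is not part of libkiwix: the function name `book_name_from_entry` and the `CONTENT_PREFIX` constant are hypothetical, and it assumes the reader link keeps the `/content/<bookName>` shape shown in the entry above. That assumption is exactly why this approach is fragile.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlparse

# Assumption: the text/html link has the form /content/<bookName>,
# as in the sample entry. If the reader URLs change, this breaks.
CONTENT_PREFIX = "/content/"


def book_name_from_entry(entry_xml: str) -> Optional[str]:
    """Extract the bookName from the text/html link of an OPDS entry."""
    root = ET.fromstring(entry_xml)
    for el in root.iter():
        # Compare the local tag name so this works with or without
        # an XML namespace on the entry.
        if el.tag.rsplit("}", 1)[-1] != "link":
            continue
        if el.get("type") != "text/html":
            continue
        path = urlparse(el.get("href", "")).path
        if path.startswith(CONTENT_PREFIX):
            name = path[len(CONTENT_PREFIX):].strip("/")
            return name or None
    return None


# Trimmed-down version of the entry shown above.
sample = (
    "<entry>"
    "<id>urn:uuid:5f93c4a9-2ebd-ddd0-d13d-1edc29d511a6</id>"
    '<link type="text/html" href="/content/mali-pour-les-nuls_fr_all_2022-11" />'
    "</entry>"
)
print(book_name_from_entry(sample))
```

A dedicated property in the entry would make this kind of guesswork about the URL layout unnecessary.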

Popolechien commented 11 months ago

Having it appear on hover could be a good idea.

mgautierfr commented 11 months ago

bookName is a human-friendly book identifier in the internal catalog. It is built from the source ZIM filename, essentially by normalizing it. On our public catalog, it mostly matches the Name metadata of the ZIM because that's our convention, but it doesn't have to.

Given it's an identifier, I wonder if we should expose it in the OPDS API.

I have always been a bit bothered by the bookName (without finding a solution for it).

The thing is that bookName is not an identifier. It is based on the filename, and for a long time we have been stating that we must not:

However, it is a nice way to display a human-friendly name in the URL (the only place where it is used). So it is, at best, a local/temporary identifier only.

It would prevent breaking access to it when changing our reader URLs, but on the other hand, despite its use, we don't want to consider it an identifier… just a pretty-URL-part.

I think we agree on this.

Following https://github.com/kiwix/kiwix-tools/pull/586, it is clear that the bookName is important information for Kiwix-serve users/integrators.

Can you be a bit clearer about why it is important? How would it be used by users/integrators? I expect them to click on the open button in the library, or just copy/paste the URL.

For the /raw endpoint, which could indeed be used to check things, I would prefer to change the endpoint to use the UUID instead of the bookName. For the other endpoints, if it were only up to me, I would use the UUID too. But users may find that too confusing, and I'm not sure it is worth it.

rgaudin commented 11 months ago

This ticket is one year old 🎂 🎉

If you don't want to rely on the bundled reader, you still need the ZIM Name to construct a URL to the /raw endpoint in order to retrieve the content while using your own chrome. Since that URL ID is not exposed, it has to be parsed/extracted from the text/html URL. That's fragile.
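As a rough sketch of what an integrator has to do today, the helper below builds a /raw URL from a bookName. The `/raw/<bookName>/content/<path>` layout is an assumption made for this example (check the kiwix-serve documentation for the exact form), and the function name `raw_content_url` and the `index.html` item path are hypothetical.

```python
from urllib.parse import quote


def raw_content_url(base_url: str, book_name: str, item_path: str) -> str:
    """Build a URL to an item served by the /raw endpoint.

    Assumption: the endpoint layout is /raw/<bookName>/content/<path>.
    """
    base = base_url.rstrip("/")
    # The bookName is a single path segment, so escape everything in it.
    book = quote(book_name, safe="")
    # Keep "/" separators inside the item path; escape other characters.
    path = quote(item_path.lstrip("/"))
    return f"{base}/raw/{book}/content/{path}"


# "index.html" is a made-up item path, used only for illustration.
print(raw_content_url(
    "https://library.kiwix.org",
    "mali-pour-les-nuls_fr_all_2022-11",
    "index.html",
))
```

The only input that cannot be obtained directly from the OPDS entry is `book_name`, which is the gap this ticket is about.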

Note that since this ticket was created, we changed the behavior of the /content/ prefix endpoint (the one returned for text/html by the API) to return the raw content (it used to return the reader version).

So that very example is no longer valid, but since the bookName is the identifier part of the URL, the point stands: users should be able to construct URLs to non-API endpoints, for instance to build a different homepage that links to the reader URL.

What I think:

kelson42 commented 11 months ago

IMHO it's time to implement it based on ZIM metadata.