kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Incorrect missing illustration handling in --library mode #950

Closed rgaudin closed 1 year ago

rgaudin commented 1 year ago

There are (bad) reasons for a ZIM to lack an illustration:

Whether those are valid use cases is not important here. We've already stated elsewhere that, although it is not a desirable situation, it can arise and we support it. Hence the question mark icon in libkiwix.

The ZIM fcc_en_test_2023-04 on https://dev.library.kiwix.org has no illustration.

❯ curl https://dev.library.kiwix.org/catalog/v2/entry/02d608c7-268e-0659-5dd6-028b7df79ee3
<?xml version="1.0" encoding="UTF-8"?>
  <entry>
    <id>urn:uuid:02d608c7-268e-0659-5dd6-028b7df79ee3</id>
    <title>Test ZIM</title>
    <updated>2023-03-13T00:00:00Z</updated>
    <summary>Created in python</summary>
    <language>eng</language>
    <name>fcc-test</name>
    <flavour></flavour>
    <category></category>
    <tags>_ftindex:no;_pictures:yes;_videos:yes;_details:yes</tags>
    <articleCount>117</articleCount>
    <mediaCount>1</mediaCount>
    <link rel="http://opds-spec.org/image/thumbnail"
          href="/catalog/v2/illustration/02d608c7-268e-0659-5dd6-028b7df79ee3/?size=48"
          type="image/png;width=48;height=48;scale=1"/>
    <link type="text/html" href="/content/fcc_en_test_2023-04" />
    <author>
      <name>python-libzim</name>
    </author>
    <publisher>
      <name>You</name>
    </publisher>
    <dc:issued>2023-03-13T00:00:00Z</dc:issued>
    <link rel="http://opds-spec.org/acquisition/open-access" type="application/x-zim" href="https://download.kiwix.org/zim/dev/fcc_en_test_2023-04.zim.meta4" length="380928" />
  </entry>

This tells me there is an illustration. It should not.
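A client can check for this link programmatically. Below is a minimal Python sketch, assuming a trimmed-down copy of the entry with namespace prefixes omitted (the `dc:` element is dropped so the fragment parses on its own). `has_thumbnail_link` is a hypothetical helper for illustration, not part of libkiwix or the viewer.

```python
import xml.etree.ElementTree as ET

THUMBNAIL_REL = "http://opds-spec.org/image/thumbnail"


def has_thumbnail_link(entry_xml: str) -> bool:
    """Return True if the entry advertises a thumbnail link (hypothetical helper)."""
    entry = ET.fromstring(entry_xml)
    return any(link.get("rel") == THUMBNAIL_REL for link in entry.iter("link"))


# Trimmed copy of the entry returned in --library mode.
LIBRARY_MODE_ENTRY = """<entry>
  <title>Test ZIM</title>
  <link rel="http://opds-spec.org/image/thumbnail"
        href="/catalog/v2/illustration/02d608c7-268e-0659-5dd6-028b7df79ee3/?size=48"
        type="image/png;width=48;height=48;scale=1"/>
  <link type="text/html" href="/content/fcc_en_test_2023-04" />
</entry>"""

# The same entry without the thumbnail link, which is what a ZIM
# lacking an illustration should produce.
EXPECTED_ENTRY = """<entry>
  <title>Test ZIM</title>
  <link type="text/html" href="/content/fcc_en_test_2023-04" />
</entry>"""

print(has_thumbnail_link(LIBRARY_MODE_ENTRY))
print(has_thumbnail_link(EXPECTED_ENTRY))
```

A consumer that relies only on the presence of this link has no way to know the illustration is actually missing, which is why the entry itself needs to be correct.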

Next, when querying this illustration:

❯ curl -v https://dev.library.kiwix.org/catalog/v2/illustration/02d608c7-268e-0659-5dd6-028b7df79ee3/?size=48

< HTTP/2 200
< date: Thu, 25 May 2023 11:56:57 GMT
< content-type: image/png
< content-length: 0
< cache-control: max-age=0, must-revalidate
< etag: "1684987294697829208.20916/"
< access-control-allow-origin: *
< set-cookie: userlang=en;Path=/;Max-Age=31536000
< strict-transport-security: max-age=15724800; includeSubDomains
<

This tells me there is a resource (HTTP 200) but with a Content-Length: 0. If the resource is missing, we should get an HTTP 404 response.
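The distinction can be stated as a small check. This is only a sketch: the function name and the returned labels are made up for illustration, and the inputs are the status code and Content-Length a client would read from the response headers.

```python
def classify_illustration_response(status: int, content_length: int) -> str:
    """Label an illustration response (hypothetical labels, for illustration)."""
    if status == 404:
        # What a missing illustration should look like.
        return "missing"
    if status == 200 and content_length == 0:
        # Claims success but carries no image bytes: the case reported here.
        return "empty-success"
    if status == 200:
        return "ok"
    return "other"


print(classify_illustration_response(200, 0))
print(classify_illustration_response(404, 376))
```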

One consequence of those issues is that the viewer rightfully considers that there is an illustration and thus asks the browser to display it instead of the missing-illustration placeholder.

Those issues happen when the ZIM is served via a library.xml file. Here's the library file in use on the server:

<?xml version="1.0" encoding="UTF-8" ?>
<library version="20110515">
<book
  id="02d608c7-268e-0659-5dd6-028b7df79ee3"
  size="372"
  url="https://download.kiwix.org/zim/dev/fcc_en_test_2023-04.zim.meta4"
  mediaCount="1"
  articleCount="117"
  title="Test ZIM"
  description="Created in python"
  language="eng"
  creator="python-libzim"
  publisher="You"
  name="fcc-test"
  tags="_ftindex:no;_pictures:yes;_videos:yes;_details:yes"
  date="2023-03-13"
  faviconMimeType="image/png"
  path="/Users/reg/Downloads/fcc_en_test_2023-04.zim"/>
</library>

As you can see, the favicon property is not set.
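Note that the book above does carry a faviconMimeType attribute even though favicon is absent. A short Python sketch for spotting such entries in a library.xml file follows; `books_with_mime_but_no_favicon` is a hypothetical helper, and the sample is a trimmed copy of the file above with the XML declaration left out.

```python
import xml.etree.ElementTree as ET


def books_with_mime_but_no_favicon(library_xml: str) -> list[str]:
    """Return ids of books that declare faviconMimeType but no favicon data."""
    root = ET.fromstring(library_xml)
    return [
        book.get("id", "")
        for book in root.iter("book")
        if book.get("faviconMimeType") and not book.get("favicon")
    ]


# Trimmed copy of the library.xml shown above.
LIBRARY_XML = """<library version="20110515">
<book
  id="02d608c7-268e-0659-5dd6-028b7df79ee3"
  title="Test ZIM"
  faviconMimeType="image/png"
  path="/Users/reg/Downloads/fcc_en_test_2023-04.zim"/>
</library>"""

print(books_with_mime_but_no_favicon(LIBRARY_XML))
```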

When served directly using the file, the behavior is different:

❯ curl http://localhost:9999/catalog/v2/entry/02d608c7-268e-0659-5dd6-028b7df79ee3
<?xml version="1.0" encoding="UTF-8"?>
  <entry>
    <id>urn:uuid:02d608c7-268e-0659-5dd6-028b7df79ee3</id>
    <title>Test ZIM</title>
    <updated>2023-03-13T00:00:00Z</updated>
    <summary>Created in python</summary>
    <language>eng</language>
    <name>fcc-test</name>
    <flavour></flavour>
    <category></category>
    <tags>_ftindex:no;_pictures:yes;_videos:yes;_details:yes</tags>
    <articleCount>117</articleCount>
    <mediaCount>1</mediaCount>
    <link type="text/html" href="/content/fcc_en_test_2023-04" />
    <author>
      <name>python-libzim</name>
    </author>
    <publisher>
      <name>You</name>
    </publisher>
    <dc:issued>2023-03-13T00:00:00Z</dc:issued>

  </entry>

No link to the illustration 👍 If I try to force it, I get a proper 404:

❯ curl -I http://localhost:9999/catalog/v2/illustration/02d608c7-268e-0659-5dd6-028b7df79ee3/?size=48
HTTP/1.1 404 Not Found
Date: Thu, 25 May 2023 12:08:13 GMT
Connection: close
Cache-Control: max-age=0, must-revalidate
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
Set-Cookie: userlang=en;Path=/;Max-Age=31536000
Content-Length: 376

kelson42 commented 1 year ago

Might be related to #754?

veloman-yunkan commented 1 year ago

> Might be related to #754?

@kelson42 No

> As you can see, the favicon property is not set.

@rgaudin However, the faviconMimeType attribute is set, which the current (not so foolproof) implementation took as an indicator that favicon info is provided. In a sense, this bug is about insufficient foolproofness. I have fixed it in #961. A follow-up issue might be what happens if favicon is set to an invalid value (e.g. favicon="%") 😜.

rgaudin commented 1 year ago

Ah ; interesting ; thanks for the fix!