openzim / nautilus

Turns a collection of documents into a browsable ZIM file
GNU General Public License v3.0

List documents in the title pointer list and `X/listing/titleOrdered/v1`, and ideally give them meaningful titles #42

Closed Jaifroid closed 1 year ago

Jaifroid commented 1 year ago

Summary: This may be related to #41, but I think it is a separate issue. Currently, PDFs in some Nautilus-based ZIM archives, for example zimgit-post-disaster_en_2022-03.zim, have no entries in the ZIM's title pointer list. This makes it impossible (except with specialized tools) to search for these assets using the reader's own UI, and it means that users who cannot run JavaScript are locked out of these ZIMs. However, it would be relatively easy to fix this and maintain backwards compatibility for such users.

Detail: PDFs are currently in the I/ namespace (see screenshot). This means that unless the reader can search the URL index (currently an advanced and not easily discoverable feature of only one reader), these assets are inaccessible except via the dynamic UI. Switching to Type 1 ZIMs (#41) could also be an opportunity to change this. As can be seen from the screenshot below, the URLs of these PDFs are not very informative. That may be a problem with the source material (file names). But if there is an easy way to access a more meaningful title without running JS in the client, that would be an improvement. Nevertheless, just referencing these assets in the Title Pointer List, even if the titles cannot be made meaningful, would already improve accessibility (e.g. for users of Internet Explorer, Firefox OS, Windows Mobile, or old versions of Firefox such as IceCat).

Currently the title pointer list only contains one entry for these ZIMs: the main page (see second screenshot). Search via the reader's own UI also yields no results in Kiwix Desktop.

image

image
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

Jaifroid commented 1 year ago

This is still an issue in C-namespace Nautilus-based ZIMs. See first screenshot below from maitre_lucas_calcul_decimaux_fr_2023-05.zim, which shows that only home is present in the ZIM's title index.

In my opinion, every item of content displayed in the proprietary UI, whether it be an image, a PDF or a video, should have a corresponding entry in X/listing/titleOrdered/v1. This would allow easy backward compatibility with clients that cannot run JS in the ZIM: they would not be able to display the UI of the landing page, but they would be able to access all the resources in the ZIM easily from title search. It seems unnecessary to exclude such users (I have some using Kiwix JS via the Firefox extension, and still others stuck on old versions of Edge that don't support Service Workers).

Fixing it should be relatively easy and would make the ZIM files comply with the specification. Titles can be constructed from the filenames. For this ZIM they are meaningful: see second screenshot below showing URL entries for PDFs and MP4s in C/files/.
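As a rough illustration of the "titles from filenames" idea, here is a minimal Python sketch. It is not Nautilus's actual code: the function name `title_from_path` and the example paths are hypothetical, and it assumes that stripping the extension and treating underscores and hyphens as word separators is good enough for a first pass.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote


def title_from_path(path: str) -> str:
    """Build a human-readable title from an entry path (hypothetical helper)."""
    # Decode percent-escapes, keep the last path component, drop its extension.
    stem = PurePosixPath(unquote(path)).stem
    # Treat underscores, hyphens and runs of whitespace as word separators.
    words = re.sub(r"[_\-\s]+", " ", stem).strip()
    # Fall back to the raw path if nothing readable is left.
    return words or path


# Hypothetical paths, shaped like the entries under C/files/.
print(title_from_path("files/calcul_decimaux-lecon_1.pdf"))  # calcul decimaux lecon 1
print(title_from_path("files/le%C3%A7on.mp4"))               # leçon
```

Even a crude title like this would give each file a searchable entry in the title listing. Where the source collection provides a proper title for a document, that title would of course be preferable.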

image

image

rgaudin commented 1 year ago

I asked about this specifically in #54 and I think that's where this discussion will happen. I'm closing this as duplicate although it was the original ticket 🤷‍♂️