No URL to access directly a content

kelson42 commented 1 year ago

There is no way to access a content item directly. This limitation has a few direct consequences:

Impossible to share a link to a content item
No proper suggestion list
Broken full-text search feature
Impossible to have a Kiwix "random" feature pointing to any content (only the welcome page is in the random list for the moment).

I propose improving Nautilus so that we get a unique URL corresponding to this (viewing a content item):

Once this is done, ensure that all the points listed above are implemented properly.

rgaudin commented 1 year ago

As much as I'm happy that you eventually changed your opinion on this, the ticket is slightly incorrect:

There are unique URLs for each content:

most files are accessed directly and the homepage just lists a link, so the content URL is the file URL.
video and audio content have a dedicated popup with a player.
multi-files entries also have a popup
popups do have a unique URL such as /home.html#list-00011-0.
it is thus shareable
it's not visible in your screenshot because of the kiwix-serve URL bug

Currently, content files are not considered front-article and thus not returned by suggestion/search/random.

We can fix this in different ways:

set front-article (and IndexData for search) on content files
- all of them?
- only some content-types?
- specific files via a flag in collection.json?
Add per-content HTML entry
- would it be used by the regular UI or just returned via suggest/search/random?
- what would it contain? The content of the popup for those currently shown in a popup, obviously. But for single files? The title/desc/author, I suppose?
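The first option could be sketched as a small policy function that decides, per content file, whether the front-article hint should be set. This is only an illustration of the three sub-options: the `Policy` names, the `front_article` key in `collection.json`, and the list of "previewable" content types are all assumptions, not existing Nautilus behaviour.

```python
from enum import Enum


class Policy(Enum):
    ALL = "all"          # every content file is a front article
    BY_TYPE = "by_type"  # only selected content types
    BY_FLAG = "by_flag"  # only entries flagged in collection.json


# Assumption: content types a reader can usually display inline.
PREVIEWABLE_PREFIXES = ("text/html", "image/", "audio/", "video/", "application/pdf")


def is_front_article(entry: dict, mimetype: str, policy: Policy) -> bool:
    """Decide whether a content file should get the front-article hint.

    `entry` is one item from collection.json; the `front_article` key
    is hypothetical and does not exist in the current format.
    """
    if policy is Policy.ALL:
        return True
    if policy is Policy.BY_TYPE:
        # str.startswith accepts a tuple of prefixes.
        return mimetype.startswith(PREVIEWABLE_PREFIXES)
    return bool(entry.get("front_article", False))
```

Whatever the policy returns would then be passed on as the entry's front-article hint when the file is added to the ZIM.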

kelson42 commented 1 year ago

@rgaudin Thx for the clarification... but it makes the problem harder to fix! I don't feel I have a mature idea of what it should look like, but my intuition is that each entry (both multifile and monofile) deserves a dedicated URL (a page, or an overlay... but not direct access). The reason is that all the metadata cannot (and should not) always be displayed on the welcome page. This URL/page would then be what is indexed for random/suggestions. That said, we can still keep a way to open documents directly...

rgaudin commented 1 year ago

Then I suggest we discuss a general redesign of the tool because the whole nautilus UI is the home page. Those were the requirements… 4 years ago.

I think it deserves it and I think it's the appropriate time as well (because of the WebUI project). We just need to discuss and write down how we'd want it to work and maybe have it styled a bit.

Jaifroid commented 1 year ago

Let's please not forget "classic" (spec-compliant) access to the ZIM's contents via the listings in X/listing, particularly via X/listing/titleOrdered/v0 (this is the title pointer list) and X/listing/titleOrdered/v1. See #42. This would maximize accessibility, and I suspect it would also solve this issue. Also, it ought to be easy to populate these listings with #59.

rgaudin commented 1 year ago

Thank you @Jaifroid; I've closed that other ticket so we don't split the same discussion across two tickets. As for #59, it has nothing to do with this. It's about how we list the content to include. It has no effect on the output.

You are rightly differentiating titleOrdered/v0 and titleOrdered/v1. v0 is handled entirely by libzim and I believe it contains all entries, so that's usable. v1 is controlled by the FRONT_ARTICLE hint that we set (or not) in the scraper, and that's what I mentioned above. I haven't gotten an answer yet.

You seem to advocate (I might have misunderstood) putting every file in v1 so that it's easy to access from your end. If we did that, then v1 would have no value over v0. Currently, this listing also triggers the presence of the entry in the Random feature and the suggestion feature.

One question is the user experience of putting everything in the listing: most of nautilus is files that can be anything. When you use the random button and hit an entry that the browser can't handle (ZIP, epub, pdf in some contexts), it either starts a download or presents a dialog. If it's a video file, you may get a broken video UI because the codec is not supported.

Those are the considerations that I believe stalled the discussion.

Jaifroid commented 1 year ago

I originally opened #42 in response to v0 Nautilus ZIMs not having anything other than home in the old title pointer list's A/ namespace, but it wasn't expressed very precisely. I was trying to update it for v1 ZIMs yesterday. These still have the same issue in a different form. Although you already know this, I'll put it here for the sake of precision in this issue, having checked the openZIM spec:

titleOrdered/v1 should be a list of pointers to the subset of dirEntries that we used to call "articles", but which, at least for Nautilus, it would be better to call "loadable resources". The corresponding dirEntries are supposed to have a title field, but where this is the same as the url, it can be empty and the backend will populate title from the url field.
titleOrdered/v0 is currently the same as titlePointerPos as defined in the ZIM header. It is kept for backwards compatibility. It contains listings for every entry in the ZIM in all namespaces (including M/ and X/), ordered by <namespace>/<title-or-url>. We shouldn't touch this.
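To make the difference between the two listings concrete, here is a simplified in-memory model. It is not how libzim builds them: `DirEntry`, its fields, and the sample paths and namespaces are made up for illustration, and plain Python string ordering stands in for whatever ordering libzim actually applies.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    namespace: str
    url: str
    title: str = ""             # may be empty when identical to the url
    front_article: bool = False

    @property
    def display_title(self) -> str:
        # An empty title falls back to the url, as described above.
        return self.title or self.url


def title_ordered_v0(entries):
    """All entries, in every namespace, ordered by <namespace>/<title-or-url>."""
    return sorted(entries, key=lambda e: f"{e.namespace}/{e.display_title}")


def title_ordered_v1(entries):
    """Only the entries flagged as front articles, ordered by title."""
    return sorted((e for e in entries if e.front_article), key=lambda e: e.display_title)


entries = [
    DirEntry("C", "home.html", "Home", front_article=True),
    DirEntry("C", "files/report.pdf"),
    DirEntry("M", "Title"),
]
print([e.url for e in title_ordered_v0(entries)])  # every entry
print([e.url for e in title_ordered_v1(entries)])  # front articles only
```

In this model, a content file such as `files/report.pdf` is reachable through v0 but absent from v1 until it is flagged, which is the situation described for current Nautilus ZIMs.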

The important thing, IMHO, is that the dirEntries that are considered loadable resources (which might be the same as what you call FRONT_ARTICLEs), should be discoverable from titleOrdered/v1 so they can be presented to the user as part of a title search or as a list of titles when the dynamic UI can't be shown.

Regarding the usability issues you raise, I think the random button should just access the content in the same way as if a user had clicked on an entry on the home page, if possible -- yes, it might not be possible without further coding for videos, given that their entries open as a JS application in Nautilus ZIMs, but clients that can't run JS from the ZIM would prefer to be able to download the content rather than put it in a player that won't run. An alternative is for the reader software to show a dialogue box if random has served non-HTML content. It could simply ask "Do you want to download this file?" The user then has a chance to cancel. But this is a matter for the reader software.
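A reader-side decision along these lines might look like the sketch below. It is purely hypothetical: `random_action`, its action names, and the `can_run_js` switch are invented here and do not correspond to any existing Kiwix reader API.

```python
def random_action(mimetype: str, can_run_js: bool = True) -> str:
    """Return what a reader could do with an entry served by "random".

    Hypothetical helper: the action names are illustrative only.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    base = mimetype.split(";", 1)[0].strip().lower()
    if base in ("text/html", "application/xhtml+xml"):
        return "render"
    if base.startswith(("audio/", "video/")) and can_run_js:
        return "open_player"
    # Anything else (ZIP, EPUB, PDF in some contexts...) gets a prompt
    # such as "Do you want to download this file?" that can be cancelled.
    return "confirm_download"
```

A client that cannot run JS from the ZIM would call it with `can_run_js=False`, so media entries fall through to the download prompt instead of a player that won't run.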

rgaudin commented 1 year ago

I agree with all that ; hence my question above.

Just to be clear, the only leverage we have at scraper level is to set the FRONT_ARTICLE hint on an entry. Then the libzim creator decides what to do with it.

Currently, setting this triggers it to appear in Random and Suggestion so yeah, we have to make decisions for reasons that are out of bounds: anticipate how the reader will react and decide accordingly.

Semantically, all nautilus files could be considered FRONT_ARTICLE though.

kelson42 commented 1 month ago

It seems the solution approach chosen (from #73) is:

we'll have an HTML ZIM entry for each entry (fixes #54 with clear URL, suggestion and full text search)
- with the entry's details
- download link
- preview of content if possible (PDF, ePUB, video, audio, Images)
… which indeed would solve all the problems reported at first :)
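As a rough idea of what such a per-entry HTML page could contain, here is a minimal sketch using only the standard library. The function names, the markup, and the choice of preview element per content type are assumptions made for illustration; they are not taken from #73. ePUB preview is left out because it would need a JS reader.

```python
from html import escape


def preview_markup(path: str, mimetype: str) -> str:
    """Return an inline preview element, or "" when none is possible."""
    src = escape(path, quote=True)
    if mimetype.startswith("image/"):
        return f'<img src="{src}" alt="">'
    if mimetype.startswith("video/"):
        return f'<video src="{src}" controls></video>'
    if mimetype.startswith("audio/"):
        return f'<audio src="{src}" controls></audio>'
    if mimetype == "application/pdf":
        return f'<iframe src="{src}"></iframe>'
    return ""  # e.g. ZIP: details and download link only


def entry_page(title: str, description: str, path: str, mimetype: str) -> str:
    """Build a minimal HTML page for one collection entry."""
    href = escape(path, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body><h1>{escape(title)}</h1>\n"
        f"<p>{escape(description)}</p>\n"
        f"{preview_markup(path, mimetype)}\n"
        f'<p><a href="{href}" download>Download</a></p>\n'
        "</body></html>\n"
    )
```

Because each page carries the title and description as real text, it gives the suggestion and full-text search features something to index, and its path is the shareable URL.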

openzim / nautilus

No URL to access directly a content #54