openzim / kolibri

Convert a Kolibri channel into ZIM file(s)
GNU General Public License v3.0

URLs should be meaningful #31

Closed: kelson42 closed this issue 6 months ago

kelson42 commented 3 years ago

URLs are currently cryptic strings (probably hashes). This is not user-friendly. They should be based on the page title (a slug?), and the risk of collisions should be managed.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

benoit74 commented 11 months ago

The current proposal in #68 consists of computing a slug based on the node title. E.g., for a PDF page titled "Caperucita Roja", the last part of the page URL will be "/caperucita-roja".
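For illustration, a slug function along these lines could look like the sketch below. It is a hypothetical helper, not the code from #68: it keeps Unicode letters and digits (so accented or non-Latin titles are not emptied) and collapses everything else into single hyphens.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a node title into a URL path segment (hypothetical helper)."""
    # NFKC folds compatibility characters (e.g. full-width letters) into
    # their canonical form; casefold() is a more aggressive lower().
    text = unicodedata.normalize("NFKC", title).casefold()
    # \W matches anything that is not a Unicode letter, digit or underscore,
    # so punctuation and whitespace runs collapse into a single hyphen.
    slug = re.sub(r"\W+", "-", text)
    return slug.strip("-")


print(slugify("Caperucita Roja"))  # caperucita-roja
print(slugify("Ça va?"))           # ça-va
```

A title made only of punctuation would produce an empty slug, so some fallback (the node ID, for instance) would still be needed.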

Question for @kelson42 @Popolechien: what should we do in case of duplicate titles?

While probably not recommended, it is entirely possible in Kolibri, especially in different folders. Note that folders are not included in the URLs, to avoid overly long URLs and to simplify the code. But even with folders in the URL, we could still have two resources with the same name in Kolibri.

The currently proposed code appends a suffix once a conflict is detected:
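A minimal sketch of this "suffix on conflict" scheme, assuming slugs are already computed and processed in whatever order the scraper walks the tree (the function name is hypothetical; this is not the code from #68). The first page keeps the bare slug and later duplicates get "-1", "-2", and so on. It also shows how deleting a page shifts the suffixes of later duplicates.

```python
def assign_unique_slugs(slugs: list[str]) -> list[str]:
    """Keep the first occurrence of a slug as-is and append -1, -2, ...
    to later duplicates, in processing order (hypothetical helper)."""
    used: set[str] = set()
    result: list[str] = []
    for slug in slugs:
        candidate = slug
        counter = 0
        # Bump the counter until the candidate is free; this also avoids
        # clashing with a page whose own title slug already ends in "-1".
        while candidate in used:
            counter += 1
            candidate = f"{slug}-{counter}"
        used.add(candidate)
        result.append(candidate)
    return result


# Version N: three pages titled "Caperucita Roja".
v_n = assign_unique_slugs(["caperucita-roja"] * 3)
print(v_n)  # ['caperucita-roja', 'caperucita-roja-1', 'caperucita-roja-2']

# Version N+1: the second page was deleted, so the former third page is now
# processed second and its URL silently becomes "caperucita-roja-1".
v_n1 = assign_unique_slugs(["caperucita-roja"] * 2)
print(v_n1)  # ['caperucita-roja', 'caperucita-roja-1']
```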

The drawback of this approach is that page ordering is mostly random, and a page's suffix depends on which pages were processed before it. As a result, URLs are not stable across ZIM versions: e.g., if between versions N and N+1 the second page titled "Caperucita Roja" above is deleted, the third one will become the second one and its URL will change from "/caperucita-roja-2" to "/caperucita-roja-1".

The advantage, of course, is that the code is simple.

Is it acceptable for you?

Another approach could be to add a prefix or suffix based on the Kolibri node ID to all URLs (node IDs look random and should be consistent across ZIM versions, so we can expect that, if we take enough characters, two pages with the same title will get different prefixes/suffixes). For instance, we could add the first 4 characters as a suffix to all URLs (e.g. "/caperucita-roja-2e7d") or as a prefix (e.g. "/2e7d-caperucita-roja"). However, I'm really not sure that these node IDs persist across Kolibri channel updates, especially when sushi-chef is used.
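A sketch of the "always suffix" variant, assuming node IDs are 32-character hex strings (a UUID without dashes); the helper name and the example IDs are made up. Because the suffix is added unconditionally, a page's URL does not depend on which other pages exist in the same ZIM. A clash is still possible if two pages share both the title and the 4-character prefix, so it is surfaced as an error rather than silently resolved.

```python
def build_urls(nodes: list[tuple[str, str]], id_chars: int = 4) -> dict[str, str]:
    """Map node_id -> "<slug>-<first id_chars of node_id>" (hypothetical helper)."""
    urls: dict[str, str] = {}
    seen: dict[str, str] = {}  # url -> node_id that already claimed it
    for slug, node_id in nodes:
        url = f"{slug}-{node_id[:id_chars]}"
        if url in seen:
            # Same slug AND same ID prefix: unlikely, but must not be hidden.
            raise ValueError(f"URL collision on {url}: {seen[url]} vs {node_id}")
        seen[url] = node_id
        urls[node_id] = url
    return urls


FIRST = "2e7d6f3a9c1b4e5d8f7a6b5c4d3e2f1a"   # made-up node IDs
SECOND = "91ab0c2d3e4f5a6b7c8d9e0f1a2b3c4d"
nodes = [("caperucita-roja", FIRST), ("caperucita-roja", SECOND)]

urls = build_urls(nodes)
print(urls[FIRST])   # caperucita-roja-2e7d
print(urls[SECOND])  # caperucita-roja-91ab

# Deleting the first page in a later version leaves the second URL unchanged.
assert build_urls(nodes[1:])[SECOND] == urls[SECOND]
```

The prefix variant only differs in the f-string; the stability argument is the same either way, provided the node IDs themselves are stable.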

My recommendation would, of course, be to use the current approach ^^ The risk of a title conflict is low, the probability that someone bookmarks a resource is limited, and the probability that a resource URL changes is even lower, so, all combined, this seems very reasonable. Not to mention that the alternative might not be much more robust.

rgaudin commented 11 months ago

Please look into whether there is any identifying data that persists across versions of a Kolibri channel. If not, there's nothing to discuss.

That aside, bookmarking (I'm not talking about version-persistent bookmarks here) is a popular feature, especially in lab environments where teachers keep and share bookmarks. It is particularly useful for educational resources, especially those with deep trees like Khan Academy. Just imagine how painful it would be to conduct a session without direct links.

Popolechien commented 11 months ago

I honestly have no opinion, but @rgaudin seems to be making a good point.

benoit74 commented 11 months ago

In the DB, we do not have any identifying data aside from the node ID.

But as usual, I was wrong. Node IDs are generated by ricecooker as version 5 UUIDs, i.e. they are derived from a namespace and a name, and are hence constant across versions of the same channel.
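To illustrate why version 5 UUIDs are stable: `uuid5` hashes (with SHA-1) a namespace UUID together with a name, so identical inputs always produce the same ID. The namespace and name below are made up; this thread does not say what ricecooker actually feeds into it.

```python
import uuid

# Hypothetical inputs: NOT the namespace/name that ricecooker really uses.
channel_namespace = uuid.uuid5(uuid.NAMESPACE_DNS, "example-channel.org")
node_name = "stories/caperucita-roja"

first_run = uuid.uuid5(channel_namespace, node_name)
second_run = uuid.uuid5(channel_namespace, node_name)

assert first_run == second_run   # same inputs -> same ID, on every run
assert first_run.version == 5
# A different name (or namespace) gives a different ID.
assert uuid.uuid5(channel_namespace, "stories/other-story") != first_run

print(first_run.hex[:4])  # a 4-character prefix usable as a URL suffix
```

The stability therefore only holds as long as the namespace and name inputs stay the same between channel updates.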

benoit74 commented 11 months ago

Bump @kelson42: we need your point of view (or just tell us that you don't care). This is an important design decision, and we can discuss it live if needed.

benoit74 commented 11 months ago

@kelson42, in https://github.com/openzim/kolibri/pull/68, you said that

EPUB filename should be meaningful. For example qaxalefi-obboleewwan-isaa.epub at https://dev.library.kiwix.org/viewer#africanstorybook.org_mul_all_newui_2023-10/static/qaxalefi-obboleewwan-isaa

And I said it was not yet implemented for EPUB, but I was wrong: qaxalefi-obboleewwan-isaa is already a meaningful name based on the book title. Why is it not meaningful to you? What do you propose to make it more meaningful?

kelson42 commented 11 months ago

Adding nodeid (only 4 letters) only if needed and at the end seems to be the correct approach to me.

rgaudin commented 11 months ago

Quoting kelson42: "Adding nodeid (only 4 letters) only if needed and at the end seems to be the correct approach to me."

It's not. If you only add it “if needed”, then the URLs are no longer permalinks: if the next version introduces a duplicate, one of the two pages (and you don't control which) will get the suffix.
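A small sketch of that failure mode, assuming the "only if needed" rule means the first page processed keeps the bare slug and later duplicates get the 4-character node ID suffix (the helper name and node IDs are made up). If a new duplicate happens to be processed first in version N+1, the existing page's URL changes.

```python
def urls_suffix_if_needed(nodes: list[tuple[str, str]]) -> dict[str, str]:
    """Append 4 chars of the node ID only when the slug is already taken
    by an earlier node in processing order (hypothetical helper)."""
    seen: set[str] = set()
    urls: dict[str, str] = {}
    for slug, node_id in nodes:
        urls[node_id] = f"{slug}-{node_id[:4]}" if slug in seen else slug
        seen.add(slug)
    return urls


OLD = "a1b2c3d4e5f60718293a4b5c6d7e8f90"  # page present since version N
NEW = "5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b"  # duplicate title added in N+1

v_n = urls_suffix_if_needed([("caperucita-roja", OLD)])
# Processing order is not controlled: here the new page comes first.
v_n1 = urls_suffix_if_needed([("caperucita-roja", NEW), ("caperucita-roja", OLD)])

print(v_n[OLD])   # caperucita-roja
print(v_n1[OLD])  # caperucita-roja-a1b2  (an existing bookmark breaks)
```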

kelson42 commented 11 months ago

Quoting rgaudin: "It's not. If you only add it “if needed” then it's not permalinks anymore: if next version introduces a duplicate, one of them (and you don't control which) will get it"

Not sure this is worth it... Anyway, I'm OK with it as long as it is always at the end.