kiwix / kiwix-apple

Kiwix for iOS & macOS
https://apple.kiwix.org
GNU Lesser General Public License v3.0
506 stars 69 forks source link

Supporting entries ending in `/` #793

Closed rgaudin closed 5 months ago

rgaudin commented 5 months ago

There is nothing in the ZIM spec that prevents using entries ending in / and thus we ended up in #779 with a ZIM (quartierschinois) that has a /-ending main page.

Nothing prevents URLs to end in slash neither.

One popular convention though is that the path is composed of segments separated by / and the last part is the document. As such, a /-ending path can be considered equivalent to the same path without that trailing slash. This convention is still implemented in some web servers, browsers, and libraries.

A swift URL has a path property that we use all over the code base to extract the ZIM entry path.

In the swift version we use, URL.path removes that trailing slash, breaking lookup of ZIM entries with /-ending path.

In later version of swift (macOS 13+), Apple added an URL.path() that keeps the trailing slash and works as intended with those entries.

What should we do?

  1. Consider /-ending ZIM entries incorrect and fix ZIM files to not use them?
  2. Fix by using URL.path() and bump our support from macOS12+ to macOS13+ and iOS15+ to iOS16+ ?
  3. Fix for all versions by normalizing path ourselves (URL.hasDirectoryPath tells us if URL ends in /) ?

I'm in favor of the latter, until we drop those versions support at the end of the year.

kelson42 commented 5 months ago

Fix 1 if possible in good conditions, not because this is wrong, but because this can be a source of trouble without bringing clear advantage.

AND

Fix 2 because Kiwix should work properly anyway and macOS15 will be released within a few month (so we can just bump the version of Swift... but maybe better in 3.5.0.

benoit74 commented 5 months ago

Fix 1 is not easy in good conditions in warc2zim. It would means removing the trailing / in all entries which have one (quite possible) but it means we have a risk of conflict, probably rare but mostly impossible to solve. It is perfectly legit for a website to have two different resources at two different URLs differing only by a trailing / (e.g. www.example.com/myresource and www.example.com/myresource/).

Since ZIM entries are just a "string", mostly mapped to a URL, and URLs ending with a slash are legit, I don't get why we would impose more constraints that necessary on ZIM entries.

benoit74 commented 5 months ago

Plus removing a trailing slash in the path of a ZIM entry corresponding to an HTML document means that all relative URLs in the document have to be adapted because we are not anymore in a "folder" but an a "file".

kelson42 commented 5 months ago

@BPerlakiH The we have the point 2 left!

benoit74 commented 5 months ago

Why not point 3 for 3.4, and point 2 for 3.5? (I mean, the code of point 3 seems a bit custom but rather plain simple to write)

rgaudin commented 5 months ago

Why not point 3 for 3.4, and point 2 for 3.5? (I mean, the code of point 3 seems a bit custom but rather plain simple to write)

šŸ˜‡

kelson42 commented 5 months ago

Why not point 3 for 3.4, and point 2 for 3.5? (I mean, the code of point 3 seems a bit custom but rather plain simple to write)

šŸ˜‡

Sounds a bit like a hack to me, which will be later rolled back. this is why I prefer #2 straight.