kiwix / kiwix-tools

Command line Kiwix tools: kiwix-serve, kiwix-manage, ...
https://download.kiwix.org/release/kiwix-tools/
GNU General Public License v3.0
408 stars 79 forks source link

Regression with URL encoding: Zimit ZIM files are not readable anymore #594

Closed kelson42 closed 1 year ago

kelson42 commented 1 year ago

It seems that latest dev version of kiwix-serve/libkiwix is not able to load the Zimit created main page anymore because of an URL encoding problem.

Here is what I get: image

But if I do the same with kiwix-serve 3.4.0, I get (and it's fine): image

This has been done with the zimfile https://mirror.download.kiwix.org/zim/.hidden/dev/mes-quartiers-chinois_fr_all_2023-01.zim which has been created today with the HEAD version of warc2zim and zimit (new webac player).

Like we can see, it looks like the / caractère is URL encoded but it should not. The regression has probably been introduced with one of the latest PR related to URL encoding... but this is unclear what is exactly the root cause of the problem (maybe the value in wrongly encoded in the ZIM already?!)

kelson42 commented 1 year ago

This is a general regression (either on kiwix-serve or warc2zim) but none of the Zimit/SW based ZIM files seem currently readable anymore, see for example https://dev.library.kiwix.org/viewer#lowtechmagazine.com_en_all_2023-01/A%2Findex.html. TOP PRIO.

veloman-yunkan commented 1 year ago

TOP PRIO.

Will go for it ASAP

veloman-yunkan commented 1 year ago

The regression has probably been introduced with one of the latest PR related to URL encoding...

Yes, the culprit is kiwix/libkiwix#866

Like we can see, it looks like the / caractère is URL encoded but it should not.

This side-effect of kiwix/libkiwix#866 was known (it was even reflected in the unit-tests, see https://github.com/kiwix/libkiwix/pull/866/files#diff-ea8f381a9e851235fd9cb48c7263b697cb272a3e057464bf3069c343a9f95a30L1205-R1216) but its consequences were not understood and tested well enough. I didn't realize that URI-encoding the / symbol in the path component can break a web-page because computation of relative URLs is affected.

veloman-yunkan commented 1 year ago

I didn't realize that URI-encoding the / symbol in the path component can break a web-page because computation of relative URLs is affected.

That's another example of sloppiness on my side, as the excellent analysis in the description of kiwix/libkiwix#775 explicitly warns against URI-encoding the path separator (/) character:

We must not url encode / as it is interpreted by the browser to build absolute path from relative path (css style, images,...). By encoding / (A%2FFoo instead of A/Foo) we can get the entry from kiwix-serve but the styling is broken (relative link img.png will be img.png instead of A/img.png)