Closed kelson42 closed 1 year ago
This is a general regression (either on kiwix-serve or warc2zim) but none of the Zimit/SW based ZIM files seem currently readable anymore, see for example https://dev.library.kiwix.org/viewer#lowtechmagazine.com_en_all_2023-01/A%2Findex.html. TOP PRIO.
TOP PRIO.
Will go for it ASAP
The regression has probably been introduced with one of the latest PR related to URL encoding...
Yes, the culprit is kiwix/libkiwix#866
Like we can see, it looks like the
/
caractère is URL encoded but it should not.
This side-effect of kiwix/libkiwix#866 was known (it was even reflected in the unit-tests, see https://github.com/kiwix/libkiwix/pull/866/files#diff-ea8f381a9e851235fd9cb48c7263b697cb272a3e057464bf3069c343a9f95a30L1205-R1216) but its consequences were not understood and tested well enough. I didn't realize that URI-encoding the /
symbol in the path component can break a web-page because computation of relative URLs is affected.
I didn't realize that URI-encoding the
/
symbol in the path component can break a web-page because computation of relative URLs is affected.
That's another example of sloppiness on my side, as the excellent analysis in the description of kiwix/libkiwix#775 explicitly warns against URI-encoding the path separator (/
) character:
We must not url encode
/
as it is interpreted by the browser to build absolute path from relative path (css style, images,...). By encoding/
(A%2FFoo
instead ofA/Foo
) we can get the entry from kiwix-serve but the styling is broken (relative linkimg.png
will beimg.png
instead ofA/img.png
)
It seems that latest dev version of kiwix-serve/libkiwix is not able to load the Zimit created main page anymore because of an URL encoding problem.
Here is what I get:
But if I do the same with kiwix-serve 3.4.0, I get (and it's fine):
This has been done with the zimfile https://mirror.download.kiwix.org/zim/.hidden/dev/mes-quartiers-chinois_fr_all_2023-01.zim which has been created today with the HEAD version of warc2zim and zimit (new webac player).
Like we can see, it looks like the
/
caractère is URL encoded but it should not. The regression has probably been introduced with one of the latest PR related to URL encoding... but this is unclear what is exactly the root cause of the problem (maybe the value in wrongly encoded in the ZIM already?!)