Closed by nijazm 1 year ago
I would wait until we support the latest libzim in node-libzim and mwoffliner before investigating this. I strongly suspect this has already been fixed somehow in libzim. It clearly depends on https://github.com/openzim/mwoffliner/issues/1576
This issue is present (though in a slightly different way) with the new iframe-based viewer too: the fulltext search URL is `http://localhost:8181/search?content=wikipedia_en_all_maxi_2021-12&pattern=&`, where the ampersand symbol in `pattern=&` is not URL-encoded.
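For illustration, a minimal sketch of how such a URL could be built so that the pattern is URL-encoded. The function name `buildSearchUrl` is hypothetical and is not taken from the kiwix-serve front-end; only the host, book name, and query parameters come from the URL above.

```javascript
// Hypothetical helper, not the actual kiwix-serve viewer code.
function buildSearchUrl(root, book, pattern) {
  // encodeURIComponent escapes reserved characters such as "&" (to "%26"),
  // so the pattern cannot be mistaken for a query-string separator.
  return `${root}/search?content=${encodeURIComponent(book)}&pattern=${encodeURIComponent(pattern)}`;
}

console.log(buildSearchUrl("http://localhost:8181", "wikipedia_en_all_maxi_2021-12", "&"));
// http://localhost:8181/search?content=wikipedia_en_all_maxi_2021-12&pattern=%26
```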
@veloman-yunkan OK, so at least we can fix that one.
BTW, the issue described in my previous comment is under Firefox 107.0. Debugging shows strange/counter-intuitive things happening, like the browser implicitly converting/decoding URLs ~during assignment to `innerHTML` attribute of DOM elements~ (this actually turns out to be an inherent property of the `href` attribute; see the comments below). I am not sure that web browsers based on a different web engine have the same behaviour, which may explain the issue as observed by OP.
@nijazm What is your browser?
So it rather turned out to be automatic decoding of any URL-encoded characters in the value of the `href` attribute of the `<a>` HTML element.
Proof on a minimal example:
```html
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <a href="javascript:alert('ABCD%26EFGH')">Click me!</a>
  </body>
</html>
```
When the link is clicked, the message box displays "ABCD&EFGH" (instead of "ABCD%26EFGH").
A more convincing version:
```html
<!DOCTYPE html>
<html>
  <head>
    <script>
      function foo() { alert('ABCD%26EFGH'); }
    </script>
  </head>
  <body>
    <a href="javascript:alert('ABCD%26EFGH')">Click me!</a>
    <a href="javascript:foo()">Click me too!</a>
  </body>
</html>
```
The first hyperlink, containing inline JavaScript in the `href` attribute, displays URL-decoded text. The second hyperlink displays the intended text as is.
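The difference between the two links is consistent with how `javascript:` URLs are processed: as far as I understand the HTML standard, the part after `javascript:` is percent-decoded before it is evaluated as a script, while code inside a `<script>` element is not. A rough simulation of that step follows (the function `scriptSourceFromHref` is made up for this illustration; real browsers do this internally):

```javascript
// Hypothetical stand-in for what the browser does with a javascript: URL.
function scriptSourceFromHref(href) {
  const prefix = "javascript:";
  if (!href.startsWith(prefix)) throw new Error("not a javascript: URL");
  // The URL body is percent-decoded before being run as script source.
  return decodeURIComponent(href.slice(prefix.length));
}

console.log(scriptSourceFromHref("javascript:alert('ABCD%26EFGH')"));
// alert('ABCD&EFGH')

// Escaping the "%" itself as "%25" keeps the literal text intact after decoding.
console.log(scriptSourceFromHref("javascript:alert('ABCD%2526EFGH')"));
// alert('ABCD%26EFGH')
```

So an inline handler that needs a literal `%26` would have to escape the percent sign itself, or move the code into a function as in the second link.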
A somewhat related question on stackoverflow: https://stackoverflow.com/questions/33721510/why-use-url-encoding-instead-of-html-encoding-for-the-href-attribute
The same issues occur on the latest versions of Chrome and Firefox on Windows 11.
@nijazm This should now be fixed on master, but it looks like you are using the previous release of `kiwix-serve`. Is that correct? What is the output of `kiwix-serve --version` on your side?
To be checked with the latest nightly: https://download.kiwix.org/nightly/
I just tested today's version of kiwix desktop and kiwix tools on Windows 11. Now it just shows the fulltext search autocomplete result for the `&` symbol, and when I click on it the app says `No results were found for "&"`. This happens both in kiwix desktop and in kiwix serve (web browsers). The search box shows `containing '&'`. When I copy the URL I found in the Network tab of the inspector, that is, when I open http://localhost:8181/suggest?content=wikipedia_en_all_maxi_2021-12&term=%26 this JSON is shown:
```json
[
  {
    "value" : "& ",
    "label" : "containing '&'...",
    "kind" : "pattern"
  }
]
```
When I change the URL to use a literal `&` instead of the percent code, so that it is http://localhost:8181/suggest?content=wikipedia_en_all_maxi_2021-12&term=& then this is the JSON response:
```json
[
  {
    "value" : " ",
    "label" : "containing ''...",
    "kind" : "pattern"
  }
]
```
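The two responses match how a query string is parsed: a literal `&` separates parameters, so `term=&` carries an empty term, while `%26` decodes to the ampersand character. A small sketch using the standard `URL` API (this only shows client-side parsing, and assumes kiwix-serve splits the query string in the same way):

```javascript
const base = "http://localhost:8181/suggest?content=wikipedia_en_all_maxi_2021-12";

// Percent-encoded ampersand: the term is the "&" character.
const encoded = new URL(base + "&term=%26");
console.log(JSON.stringify(encoded.searchParams.get("term"))); // "&"

// Literal ampersand: it ends the "term" parameter, which is left empty.
const literal = new URL(base + "&term=&");
console.log(JSON.stringify(literal.searchParams.get("term"))); // ""
```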
Now that's a different problem. Most likely, the ampersand symbol is treated as punctuation and is simply discarded during the creation of the title index as well as when running suggestion search on it.
Ideally, while building the title index we should handle article names consisting of a single symbol or word in a special way, letting those terms go into the title index as is despite any rules that drop punctuation and stopwords. Also we will have to enhance the suggestion search so that it accounts for such an addition to the title index.
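To make the idea concrete, here is a toy sketch. It is not libzim's indexing code: the tokenizer, the stopword list, and the function names are all invented for illustration. It drops punctuation and stopwords from a title, except that a title consisting of a single symbol or word is kept verbatim:

```javascript
// Toy stopword list, purely illustrative.
const STOPWORDS = new Set(["a", "an", "and", "of", "the"]);

// Split on anything that is not a letter or digit, then drop stopwords.
function tokenize(title) {
  return title
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

// Special case: a single symbol or single word goes into the index as is.
function titleIndexTerms(title) {
  const raw = title.trim();
  if (raw.length > 0 && !/\s/.test(raw)) return [raw];
  return tokenize(raw);
}

console.log(JSON.stringify(titleIndexTerms("The History of Rome"))); // ["history","rome"]
console.log(JSON.stringify(titleIndexTerms("&")));                   // ["&"]
console.log(JSON.stringify(titleIndexTerms("The")));                 // ["The"]
```

The suggestion search would need the matching change: a query such as `&` has to be looked up verbatim instead of being tokenized into nothing.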
@kelson42 @mgautierfr What do you think? Is this issue worth the effort required to fix it?
@veloman-yunkan I'm slightly lost. I would really appreciate a new ticket with a clear reproduction case.
When I enter the symbol `&` in the search box of the English Wikipedia ZIM served by kiwix-serve in my web browser and click on the autocomplete result, `&` automatically appears in the search bar and it goes to fulltext search, so the URL becomes something like this: http://localhost:8181/search?content=wikipedia_en_all_maxi_2021-12&pattern=%26amp%3B
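Decoding the `pattern` value suggests that the search text was HTML-escaped before being URL-encoded. A quick check (the escaping step shown is a guess at what happens, not code from kiwix-serve):

```javascript
// What the server receives as the pattern.
const receivedPattern = decodeURIComponent("%26amp%3B");
console.log(receivedPattern); // &amp;

// Hypothetical reproduction: HTML-escape "&" first, then URL-encode the result.
const htmlEscaped = "&".replace(/&/g, "&amp;");
const reEncoded = encodeURIComponent(htmlEscaped);
console.log(reEncoded); // %26amp%3B
```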
However, when I directly enter the URL http://localhost:8181/wikipedia_en_all_maxi_2021-12/A/& it redirects properly to http://localhost:8181/wikipedia_en_all_maxi_2021-12/A/Ampersand
I have not tested other symbols, but that reminds me of similar errors encountered on some sites, where entering the symbol `"` also leads to errors, causing some additional characters to appear. Whether it is an encoding problem or something else, I don't know. A similar error also happens in the kiwix-desktop app: there is no autocomplete result for `&`, only fulltext search.