Jaifroid closed this issue 5 months ago
They definitely have difficulties running a website 😉 https://github.com/openzim/zimit/issues/171
@Jaifroid the starting URL is already https: https://developer.mozilla.org
Yes, this is strange. So, the C/A/index.html of this ZIM (referenced by W/mainPage) is below, and as you can see it contains a script which sets window.mainUrl to a secure page, https://developer.mozilla.org/en-US/, which is what the reader attempts to load. But the corresponding page, C/A/developer.mozilla.org/en-US/, is not in the ZIM. Instead, there is a header, C/H/developer.mozilla.org/en-US/, which contains the above 301: moved permanently WARC record. That is a record of a visit to an insecure page (http:) being redirected by the server to a secure page and port.
Would allowing https://developer.mozilla.org:443/en-US/ (with the port) as in scope potentially fix the issue? I realize it would be a workaround...
<html>
<head>
<meta charset="utf-8"/>
<script>
window.mainUrl = "https://developer.mozilla.org/en-US/";
</script>
<script src="./load.js"></script>
</head>
<body>
<div id="loading" style="width: 100%; text-align: center">
<svg width="38" height="38" viewBox="0 0 38 38" xmlns="http://www.w3.org/2000/svg" stroke="#fff">
<g fill="none" fill-rule="evenodd">
<g transform="translate(1 1)" stroke-width="2">
<path d="M36 18c0-9.94-8.06-18-18-18" stroke="gray">
<animateTransform
attributeName="transform"
type="rotate"
from="0 18 18"
to="360 18 18"
dur="1s"
repeatCount="indefinite"/>
</path>
</g>
</g>
</svg>
</div>
<div id="error"></div>
</body>
</html>
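To make the failed lookup concrete, here is a minimal sketch in Python. It assumes the path layout inferred from the two entries named in this thread (content under `C/A/<host><path>`, headers under `C/H/<host><path>`); the helper `zim_paths` and the `entries` set are hypothetical and only model the situation, they are not reader or Zimit code.

```python
from urllib.parse import urlsplit

def zim_paths(url: str) -> tuple[str, str]:
    """Return the (content, header) entry paths assumed for a URL.

    Assumption: the scheme is dropped and host + path are appended to the
    C/A/ and C/H/ prefixes, as the paths quoted in this thread suggest.
    Query strings are ignored in this sketch.
    """
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    return f"C/A/{key}", f"C/H/{key}"

# Hypothetical subset of the entries in the September ZIM, per the thread:
# the landing page stub and the 301 header exist, the content page does not.
entries = {"C/A/index.html", "C/H/developer.mozilla.org/en-US/"}

content_path, header_path = zim_paths("https://developer.mozilla.org/en-US/")
has_content = content_path in entries   # the reader cannot load the page
has_header = header_path in entries     # only the redirect record is present
```

Under these assumptions `has_content` is false while `has_header` is true, which matches the symptom described: `window.mainUrl` points at a page whose only trace in the ZIM is the redirect header.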
Would allowing https://developer.mozilla.org:443/en-US/ (with the port) as in scope potentially fix the issue? I realize it would be a workaround...
That's what I think but I need to test the regexp locally.
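For local testing, a scope pattern with an optional `:443` could look roughly like the sketch below. It is written with Python's `re` module purely for illustration; the pattern, the name `SCOPE_RE`, and the helper `in_scope` are assumptions, not the actual scope option or regex syntax used by the scraper.

```python
import re

# Hypothetical scope pattern: the host with or without the default HTTPS
# port, followed by the /en-US/ prefix. (?::443)? makes the port optional.
SCOPE_RE = re.compile(r"^https://developer\.mozilla\.org(?::443)?/en-US/")

def in_scope(url: str) -> bool:
    """Return True if the URL falls under the assumed scope pattern."""
    return SCOPE_RE.match(url) is not None

# Quick checks against the URLs discussed in this thread.
checks = {
    "https://developer.mozilla.org/en-US/": in_scope("https://developer.mozilla.org/en-US/"),
    "https://developer.mozilla.org:443/en-US/": in_scope("https://developer.mozilla.org:443/en-US/"),
    "http://developer.mozilla.org/en-US/": in_scope("http://developer.mozilla.org/en-US/"),
}
```

Only the two `https` spellings match; the insecure `http:` URL and any other port stay out of scope.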
It's a satisfactory solution given the website is misbehaving using unusual methods.
@Jaifroid Is this ticket still valid with latest dev version? https://dev.library.kiwix.org/#lang=&q=Mdn
Ah no, this was fixed a long time ago... Both for Zimit1 and for Zimit2.
The last good scrape of this useful and popular site was in February (developer.mozilla.org_en_all_2023-02.zim). There is a more recent scrape in the dev download server from September (developer.mozilla.org_en_all_2023-09.zim) but there is an error loading the landing page of this ZIM.
I've looked into the reason for this, and it appears to be because the landing page is an insecure URL (http://developer.mozilla.org/en-US/, note the http: protocol). This gets redirected via a 301 WARC header (C/H/developer.mozilla.org/en-US/, see header content below) to a secure port, https://developer.mozilla.org:443/en-US/ (note the :443 port). However, the corresponding page has not been scraped, no doubt because it is out of scope due to the port.
The scrape could possibly be fixed by ensuring the starting URL is https:, though that may be a simplistic assumption.
Pinging @rgaudin because I believe this ZIM is one of the Zimit test-case ZIMs.
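The port mismatch comes down to two spellings of the same resource: 443 is the default port for `https`, so the URL with `:443` and the one without it are equivalent. A minimal sketch of normalizing them, assuming Python's standard `urllib.parse`; the helper name `strip_default_port` is hypothetical and not part of Zimit or any tool mentioned here.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the two schemes discussed in this issue.
DEFAULT_PORTS = {"http": 80, "https": 443}

def strip_default_port(url: str) -> str:
    """Drop the port from a URL when it is the default for its scheme.

    Sketch only: the host is lowercased by urlsplit().hostname, and
    userinfo or IPv6 brackets are not preserved.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is None or DEFAULT_PORTS.get(parts.scheme) == port:
        netloc = host                 # default or absent port: omit it
    else:
        netloc = f"{host}:{port}"     # non-default port: keep it
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

redirect_target = strip_default_port("https://developer.mozilla.org:443/en-US/")
```

With this normalization the redirect target becomes `https://developer.mozilla.org/en-US/`, the same string that a scope check on the starting host would expect.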