Open Jaifroid opened 1 year ago
See https://github.com/openzim/mwoffliner/issues/1731 on why using a newer scrape of this ZIM is not possible for historical / corpus work (in sum, newer scrapes show the current, 2023, content for each article instead of the original content).
It fails in SW mode because all of the hyperlinks in this legacy ZIM are root-relative, i.e. of the form /A/Some_page.html or /I/Some_image.jpg. In jQuery mode, our regular expression for matching ZIM links ignores any forward slash at the start of a ZIM link that it recognizes, and this works across the board. In ServiceWorker mode, however, we scrupulously respect the coding in the ZIM, so these links are interpreted as-is. The browser therefore resolves them against the root of the origin, which puts the resources outside the scope of the Service Worker, so they are never caught and processed by it.
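To make the scope problem concrete, here is a minimal sketch of how a root-relative link resolves outside a Service Worker scope, and how stripping the leading slash brings it back inside. The scope URL, the function names and the single-character namespace pattern are all assumptions for illustration; this is not the regular expression or code actually used in the app.

```typescript
// Hypothetical Service Worker scope and article URL (illustrative only).
const scope = 'https://example.org/app/archive.zim/';
const currentArticle = scope + 'A/Current_page.html';

// Resolve a link the way a browser would, relative to the current article.
function resolve(href: string): string {
    return new URL(href, currentArticle).href;
}

// A request is only seen by the Service Worker if its URL falls under the scope.
function inScope(url: string): boolean {
    return url.startsWith(scope);
}

// Assumed pattern for legacy links: a leading slash, a single-character
// namespace such as A or I, then the rest of the path.
const rootRelativeZimLink = /^\/([A-Z-])\/(.+)$/;

// Rewrite /A/Some_page.html as ../A/Some_page.html so that it resolves
// inside the ZIM's virtual directory. Other links are returned unchanged.
function toZimRelative(href: string): string {
    const m = rootRelativeZimLink.exec(href);
    return m ? `../${m[1]}/${m[2]}` : href;
}

const legacy = '/A/Some_page.html';
console.log(inScope(resolve(legacy)));                // false: escapes the scope
console.log(inScope(resolve(toZimRelative(legacy)))); // true: stays in scope
```

A rewrite like this would only be safe once the ZIM has been recognized as a legacy archive, since the pattern could otherwise match links that are meant to be root-relative.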
We need a safe way to recognize this situation and offer the user a possibility for reading such ZIMs. I can think of two ways:
I assume, as we still have jQuery mode, and this works well with such ZIMs, that 1. would be the most acceptable solution for now. But 2. might be necessary in a pure Service-Worker-mode future.
@Jaifroid This file is really old and does not really respect the ZIM specification anymore (because of the absolute links). It should not be a problem if it is not supported. Where exactly did you find this ZIM to download? It should not be part of the Kiwix catalogue!
@kelson42 It is linked for historical record purposes (I presume) from https://en.wikipedia.org/wiki/Wikipedia:Version_0.8/downloads. I think a historian would praise your foresight in keeping these early archives, going back to Wikipedia 0.5 from 2007. I would think very carefully before removing access to them (and I hope they're backed up!).
Note that the current scrape of Wikipedia 0.8 is not working as expected (I think), as I reported in https://github.com/openzim/mwoffliner/issues/1731. It is providing 2023 versions of the pages instead of the original content. That makes it all the more important that the original archives are kept (and made available) IMHO.
The ZIM in question is wikipedia_en_wp1-0.8_orig_2010-12.zim. While this is a legacy ZIM, it is available from download.kiwix.org (archive directory) as a historical corpus, and it is linked to from the online Wikipedia 0.8 home page, so maintaining the ability to read it would seem important, or, at the very least, instructing the user on how to use and display this ZIM correctly in the reader.
Currently it only displays properly in jQuery mode on both Firefox and Chromium, and in fact it is only possible to navigate in the ZIM at all (other than by searching for an article) in jQuery mode. Screenshot below shows typical display of a page in SW mode on the left (all CSS broken, all images broken), and jQuery mode on the right (all images and CSS display correctly). This is in the Firefox extension. The only "problem" in jQuery mode is that the active content warning is displayed (which should be fixed).
In SW mode, clicking any link in an article shows "404 Not Found".
Kiwix Desktop displays content from this ZIM correctly, and navigation functions fine.