openzim / nautilus

Turns a collection of documents into a browsable ZIM file
GNU General Public License v3.0
19 stars 14 forks source link

Possible race condition in zimwriterfs.js included in nautilus-based ZIMs #40

Closed Jaifroid closed 1 year ago

Jaifroid commented 1 year ago

OK, this isn't caused by inline JavaScript, but by a race condition (I'm guessing) with the loading of the favicon, which is used too early by a test in zimwriterfs. In Chromium browsers running from localhost or from an extension, it appears the favicon is not loaded by the time the test is run. See screenshot (you can also see a dump I did of document.head.innerHTML, and you can see that the favicon is not in the document's source code yet).

image

This in turn causes nautilus.js to error out:

image

However, if I wait till the document is loaded fully before running the favicon test, I get a correct result in console.log:

document.getElementById("favicon").getAttribute("href").indexOf("/I/") !== -1;
true

Originally posted by @Jaifroid in https://github.com/openzim/nautilus/issues/34#issuecomment-1254512772

Jaifroid commented 1 year ago

Moving <script src="../-/zimwriterfs.js"></script> out of the header and to the bottom of document.body could possibly fix the race condition. If not, there are other techniques to delay its loading, e.g. adding the defer attribute, or use document.addEventListener ("load", FUNCTION);

Jaifroid commented 1 year ago

I did a quick-and-dirty test, and adding 'defer' to tha zimwriterfs.js script block doesn't work. I'll try to test moving it.

rgaudin commented 1 year ago

Might be simpler to just updated to new libzim which doesn't have namespace and thus should not require this hack at all.

Jaifroid commented 1 year ago

Might be simpler to just updated to new libzim which doesn't have namespace and thus should not require this hack at all.

Perhaps... I've tested moving it to the bottom of the body block, and that doesn't resolve the race condition either. In these contexts, it seems, the favicon takes quite a lot of time to load in Chromium browsers (after all, it does have to be extracted from the ZIM). Only other hacky solution would be to add some longish delay, or think of another way to detect we're in a ZIM. Your solution would be better, if we can do away with this dependency on the favicon completely.

rgaudin commented 1 year ago

Yes, we had a similar hack for youtube and ted scrapers (this is related to ogvjs for playing videos) and got rid of it when switching to no-ns libzim. There just wasn't a need for nautilus… yet

Jaifroid commented 1 year ago

Just for info, what works (by way of a very dirty patch) is to delete nautilus.js, init.js and zimwriterfs.js from the document.head, and then to re-attach them to the document head after a timeout of 1 second. Obviously, as it's a race condition, 1 second might not be enough on slow systems, and it's not a "solution". The soution would be to change to Type 1 ZIMs, as you suggest, @rgaudin . Nevertheless I've patched the Kiwix JS PWA on a temporary basis for backwards compatibility, until Type 1 ZIM support can be added here. I don't think this patch would be acceptable in the Kiwix JS browsr extension (for good reasons).

Jaifroid commented 1 year ago

I've finally determined that this bug is caused by interaction with a routine in the Kiwix JS Windows reader that rewrites some fields in stylesheets in order to prepare for extraction from the ZIM. This shouldn't run in Service Worker mode, but some part of this processing gets turned on to allow the user to extract a fully self-sufficient HTML version of an article if they wish. Favicons are included in this processing.

In short, it's a bug in the reader, exposed by the dependency of zimwriterfs.js on the favicon. I think this investigation was useful. I'll close this in favour of an issue to switch the scraper to Type 1 and remove dependency on favicon.