Open Jaifroid opened 9 months ago
https://github.com/kiwix/kiwix-js/commit/c528c94924086f61a8def8a352bed0bde78943d4 addresses the first part of this issue (it adds a test for 'zimit' in the Scraper name).
@Jaifroid The recommended way of doing this is to rely on the _sw ZIM tag. AFAIK, Zimit2 should not need anything special at the reader level. @benoit74 I wonder why this is not explicit in the warc2zim documentation.
Thanks, @kelson42. I agree; I just can't use that method yet, because all the zimit2 ZIMs produced so far have '_sw:yes' in their tags. Until that's fixed, as requested by rgaudin, I have to use the current method.
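For reference, a check based on the _sw tag could look roughly like the sketch below. It assumes the ZIM Tags metadata is a semicolon-separated list in which private tags are written as `_name:value` (e.g. `_sw:yes`); `parseZimTags` and `declaresServiceWorker` are hypothetical helper names, not kiwix-js functions. As noted in the comment above, this check on its own cannot yet tell zimit2 ZIMs apart while they still carry '_sw:yes'.

```typescript
// Hypothetical helper: parse a ZIM "Tags" metadata string into a map.
// Assumes tags are separated by ';' and that private tags use the form
// "_name:value"; plain tags (e.g. "lowtech") map to an empty string.
function parseZimTags(tags: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const raw of tags.split(";")) {
    const tag = raw.trim();
    if (tag === "") continue; // skip empty entries such as a trailing ';'
    const sep = tag.indexOf(":");
    if (sep === -1) {
      result.set(tag, "");
    } else {
      result.set(tag.slice(0, sep), tag.slice(sep + 1));
    }
  }
  return result;
}

// True when the ZIM declares that it needs a Service Worker ("_sw:yes").
function declaresServiceWorker(tags: string): boolean {
  return parseZimTags(tags).get("_sw") === "yes";
}
```

For example, `declaresServiceWorker("_ftindex:yes;_sw:yes")` returns `true`, while `declaresServiceWorker("_ftindex:yes;_category:other;lowtech")` returns `false`.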
The reader does have one specific requirement: it must detect links and PDFs that cannot be opened in the webview or iframe because of sandboxing or CSP. Kiwix Serve has already been patched via libkiwix, and other readers that use libkiwix will inherit the patch. The issue is that Wombat aggressively rewrites such links, so they can't be detected without either temporarily disabling Wombat or using other workarounds. I've patched both KJS readers.
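As an illustration of the kind of detection involved, here is a minimal sketch that decides whether a link should be handed off to a new tab or window rather than opened inside the sandboxed iframe. The rules (PDF by file extension, different origin) and the name `needsExternalOpen` are illustrative assumptions, not the actual kiwix-js logic. It operates on the link's original URL, which is exactly what Wombat's rewriting hides, hence the need to disable Wombat temporarily or work around it.

```typescript
// Hypothetical sketch: should a link clicked inside the article iframe be
// opened outside it, because the sandbox or CSP would block it?
// The rules below are illustrative assumptions only.
function needsExternalOpen(href: string, articleBase: string): boolean {
  let url: URL;
  try {
    // Resolve relative links against the article's own URL.
    url = new URL(href, articleBase);
  } catch {
    return false; // unparseable: leave it to the default handler
  }
  // Assumption: PDFs cannot be displayed inside the sandboxed iframe.
  if (url.pathname.toLowerCase().endsWith(".pdf")) return true;
  // Assumption: links to another origin are not served from the ZIM.
  return url.origin !== new URL(articleBase).origin;
}
```

With `https://example.org/A/page.html` as the base, `files/doc.PDF` and `https://other.org/x` both return `true`, while `other.html` returns `false`.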
Both changes have been made:
Scraper: warc2zim 2.0.0-dev2 + zimit 2.0.0-dev1 + Browsertrix crawler 0.12.4
Tags: _ftindex:yes;_category:other;lowtech
Not all test ZIMs have been rebuilt with this latest code change yet, but at least you have a few to test.
@benoit74 Excellent, thanks!
As suggested here, we could look for both 'warc2zim' AND 'zimit' in the Scraper metadata (we currently look only for 'warc2zim', but the Scraper string is not yet guaranteed to be stable). Then, if '_sw:yes' is not in the tags, it's zimit2; if it is, the ZIM relies on a Service Worker, meaning it's zimit classic.
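The heuristic proposed above can be sketched as follows. It follows the proposal literally (Scraper must mention both strings, then the `_sw:yes` tag decides); the case-insensitive matching and the names `classifyZim` and `ZimKind` are assumptions for illustration. The example call uses the Scraper and Tags metadata quoted earlier in this thread for the rebuilt zimit2 test ZIMs.

```typescript
type ZimKind = "zimit2" | "zimitClassic" | "other";

// Hypothetical classifier implementing the proposed heuristic:
// 1. the Scraper metadata must mention both "warc2zim" and "zimit"
//    (matched case-insensitively, an assumption);
// 2. a "_sw:yes" tag means a Service Worker is required (zimit classic),
//    and its absence means zimit2.
function classifyZim(scraper: string, tags: string): ZimKind {
  const s = scraper.toLowerCase();
  const isZimitScraper = s.indexOf("warc2zim") !== -1 && s.indexOf("zimit") !== -1;
  if (!isZimitScraper) return "other";
  const hasSwTag = tags.split(";").some((t) => t.trim() === "_sw:yes");
  return hasSwTag ? "zimitClassic" : "zimit2";
}

// Metadata of a rebuilt zimit2 test ZIM, as quoted in this thread.
console.log(classifyZim(
  "warc2zim 2.0.0-dev2 + zimit 2.0.0-dev1 + Browsertrix crawler 0.12.4",
  "_ftindex:yes;_category:other;lowtech",
)); // prints zimit2
```

Note that, taken literally, this returns "other" for any classic ZIM whose Scraper string does not mention 'zimit'; whether that happens depends on how older ZIMs were labelled.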
We currently rely on finding 'warc-headers' in the declared MIME type. But it's possible (if currently unlikely) that such headers could be reintroduced if they are needed in future versions of zimit2, so it would be good to have other options as outlined above.
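A minimal sketch of the current MIME-type check, plus one hypothetical way to combine it with the tag signal so that a future zimit2 ZIM which reintroduces WARC headers (but has no '_sw:yes' tag) is not misdetected. The combination rule, the function names, and the exact MIME string used in the example are assumptions; the check only looks for the substring 'warc-headers'. It also assumes classic ZIMs always carry '_sw:yes'.

```typescript
// Current method: does any declared MIME type mention "warc-headers"?
function hasWarcHeadersMime(mimeTypes: readonly string[]): boolean {
  return mimeTypes.some((m) => m.indexOf("warc-headers") !== -1);
}

// Hypothetical combined check: treat the ZIM as zimit classic only when
// the MIME-type signal and the "_sw:yes" tag agree.
function isZimitClassic(mimeTypes: readonly string[], tags: string): boolean {
  const hasSwTag = tags.split(";").some((t) => t.trim() === "_sw:yes");
  return hasWarcHeadersMime(mimeTypes) && hasSwTag;
}
```

For instance, `isZimitClassic(["text/html", "application/warc-headers"], "_ftindex:yes;_sw:yes")` returns `true`, but the same MIME list with the tags `_ftindex:yes;_category:other;lowtech` returns `false`.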