Closed rgaudin closed 1 year ago
Might be related to #604
I have no problem with firefox, chrome and chromium.
@Popolechien reported it; I can reproduce on the latest Chrome.
Sorry, the link is incorrect; you have to use the viewer. I need to open a ticket about that as well.
Indeed. Removing the sandbox attribute solves the problem. This is probably introduced by https://github.com/kiwix/libkiwix/pull/906 (issue https://github.com/kiwix/kiwix-tools/issues/604)
The code I give in that issue works around this specific problem with the pdf viewer in Chrome/Edge. It is not recognised by the browser as same origin, hence PDFs must be opened in a new window or tab. https://github.com/kiwix/kiwix-tools/issues/604#issuecomment-1470013026
That's quite a consequence in terms of UI. If it's not possible to render it in the same iframe, maybe we should use an in-zim pdf.js but that's not as convenient
Personally I think this is a Chromium bug, because the in-browser PDF viewer should clearly be same-origin when loading a PDF from the same origin. Or maybe it's a feature because PDFs can have active content that can contact external servers (?).
I didn't find any other workaround in Kiwix JS than to make a click on a PDF open a new window/tab. Having said that, the new window uses the same built-in PDF viewer, so it's not too big a deal. EPUBs have to be downloaded anyway because no browser provides a built-in viewer for them.
It's a balance between the security of the iframe (in terms of not leaking info out: a particular concern with Zimit archives) and (in)convenience... Otherwise, there's really nothing to stop a script in the iframe navigating elsewhere.
NB sandboxing the iframe doesn't stop a determined malicious attacker, but it stops accidental redirects, accidental (or well intentioned) attempts by scripts to break out of iframes, and accidental contacting of external sites for font files, images and (potentially) scripts.
See https://github.com/whatwg/html/issues/3958, which gives some information.
The PDF viewer is implemented as a plugin in Chrome and it is deactivated in sandboxed iframes.
Please also add allow-downloads. Some people may have a Firefox setup where "Settings > General > Files and Applications > Applications > Portable Document Format (PDF)" is set to "Save File" instead of "Open in Firefox". I heard there was a recent proposal to use ZIMs for serving .apk files, and this would be necessary to support their use case.
The sandbox attribute needs to stay to stop the Wiktionary zim from breaking. What's wrong with pdf.js? It would only make the experience more consistent across platforms.
I guess that a fix addressing #912 but not this ticket doesn't make much sense. On the other hand, I think that a fix to this issue should also automatically fix #912.
This is what we went for in Kiwix JS (for the sandbox attribute of iframe, or can be served as part of CSP response header):
allow-same-origin allow-scripts allow-modals allow-forms allow-popups allow-downloads
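A minimal sketch of how such a token list could be assembled and checked before it is written to the iframe or to a CSP header. The helper name buildSandboxValue is hypothetical and is not part of Kiwix JS or libkiwix; the only rule it enforces is the one discussed in this thread, namely that no token permitting top-level navigation is included.

```javascript
// Hypothetical helper: builds a sandbox value from a list of tokens and
// rejects any token that would let framed content navigate the top-level page.
const FORBIDDEN_PREFIX = 'allow-top-navigation';

function buildSandboxValue(tokens) {
  const unique = new Set();
  for (const token of tokens) {
    if (token.startsWith(FORBIDDEN_PREFIX)) {
      throw new Error('Refusing sandbox token: ' + token);
    }
    unique.add(token);
  }
  return Array.from(unique).join(' ');
}

// The token list quoted above
const sandboxValue = buildSandboxValue([
  'allow-same-origin', 'allow-scripts', 'allow-modals',
  'allow-forms', 'allow-popups', 'allow-downloads'
]);
```

The resulting string can be used either as the value of the iframe's sandbox attribute or after the sandbox keyword in a Content-Security-Policy response header.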
@veloman-yunkan Is this sandbox attribute actually in force at library.kiwix.org yet? Because if you go to https://library.kiwix.org/viewer#wiktionary_en_all_nopic_2023-02 , you clearly see that a top-level navigation occurs and the iframe is destroyed..., and if I inspect the iframe just before the offending script runs that breaks out of it, I see what's in the screenshot below.
Now, if the sandbox isn't implemented at library.kiwix.org, it means that the issues with the PDF viewer are not directly down to #906, unless I'm missing something (or my browser cache is VERY persistent, despite clearing it)...
(Though I do think #906 will block PDFs in Chrome, because I already had to patch that in Kiwix JS.) Confused 😖...
And, btw, that referrerpolicy should probably be none, rather than same-origin.
@Jaifroid https://library.kiwix.org runs the latest release of kiwix-serve. #906 has not been released yet. Its side effects are currently demonstrated at https://dev.library.kiwix.org (to which this ticket refers).
@veloman-yunkan Thank you for clearing that up!
If it's not possible to render it in the same iframe, maybe we should use an in-zim pdf.js but that's not as convenient
@rgaudin Another option is to embed pdf.js in kiwix-serve. However, I wonder what kind of inconvenience you referred to and whether it applies to the proposed solution too.
Personally, I'd say there's no real inconvenience in the simplest solution of opening the PDF in a new tab or window. In many ways, it's better UX, because the user can keep that tab to read later and carry on browsing in the iframe. It also parallels the experience with EPUBs, which download separately rather than opening in the iframe. But I get that people have different opinions about such things!
Bundling pdf.js inside a ZIM means displaying PDFs ourselves, via pdf.js. It means creating a host webpage which won't feel exactly like the in-browser pdf.js. Also, for those with another PDF reader configured, it means rendering in pdf.js then potentially downloading (pdf.js allows this) to trigger the default PDF reader. It's far from being as comfortable as the normal situation.
As for including pdf.js in kiwix-serve, I don't see how that would help… Do you want to add a kiwix-serve specific API to use it? Do you want to intercept application/pdf responses and render them as text/html with pdf.js?
Then you have the SW discussion all over again (reader-ZIM dependency that's not part of the spec)
But I get that people have different opinions about such things!
Exactly.
We can look at this from both angles though:
The trick is that kiwix-serve is both a first-class reader and a Web party so boundaries are blurred.
I don't care much about the outcome of the tab thing but I am worried that we may be starting to drop support for regular web features. @kelson42 @Popolechien what do you think?
For anyone coming to this longish thread at this point, the executive summary is as follows:
There is a vulnerability in Kiwix Serve and Kiwix JS related to the use of iframes to display articles: scripts in these articles can accidentally (or on purpose) navigate to remote sites and leak user data to remote servers, or they can break out of the iframe and destroy the app's controls;
No. The vulnerability is that we display a "falsely offline" version of a website which can still phone home and leak user data to remote servers. This is how the web works and, short of blocking all connections to servers, it can always happen.
Before the iframe, kiwix-serve was simply displaying the content in the top frame. "Falsely offline" websites were obviously able to phone home. And as we were inserting the top bar into the content of the page, it was even simpler for the website to break the app's controls.
The iframe is an opportunity to add some kind of protection against accidentally "falsely offline" websites, which was impossible before. The change to an iframe is not the source of the leak.
However, we have traded the website css breaking the css of the app controls for links with target=_top breaking the app controls.
There is no more user data leak than before.
You can use the "no viewer/iframe" URL /content/zim_name/path/to/article to see that a website can phone home all the time.
Blocking these requests was never a goal of kiwix-serve. That may change, but if it becomes a goal, the solution is probably more in a service worker (to control all connections going out of the displayed website) than in the iframe.
This is not the behaviour the creator of the ZIM may have intended, so there is a balance to be struck between security and respect for the intentions of the ZIM creator / trusted web resource.
There is a balance between adding a new security level and not interfering with the website content (in chrome browser).
Blocking every link with target=_top (with a sandboxed iframe) may not be a good idea either.
A website may be internally composed of several iframes. One iframe may display a menu with links that change the top iframe's content. Allowing target=_top links breaks the (viewer) UI, but at least the website is usable.
And this is difficult to patch, as players other than kiwix-serve/kiwix-js don't use an iframe, so target=_top is totally valid there.
It would be up to the viewer itself to dynamically patch (or intercept) links with target=_top and replace them with target=framename.
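The patching idea can be sketched as follows. This is only an illustration under stated assumptions: the function name retargetTopLinks is hypothetical, and the code assumes the viewer can reach the framed document (same origin) and that the iframe has a name to target. It does not cover scripts that assign window.top.location directly.

```javascript
// Sketch only: accepts any objects exposing getAttribute/setAttribute, such as
// the elements returned by iframeDocument.querySelectorAll('a[target], form[target]').
function retargetTopLinks(elements, frameName) {
  let patched = 0;
  for (const el of elements) {
    const target = (el.getAttribute('target') || '').toLowerCase();
    if (target === '_top') {
      // Point the link at the viewer's iframe instead of the top-level window
      el.setAttribute('target', frameName);
      patched++;
    }
  }
  return patched; // number of links that were rewritten
}
```

In a browser this might be called after each article load, for example as retargetTopLinks(iframe.contentDocument.querySelectorAll('a[target]'), iframe.name). Links created later by scripts would need to be handled separately.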
However, if the target=_top is not really needed, we can consider it a bug in the scrapers, and they should remove it.
@mgautierfr in https://github.com/kiwix/kiwix-tools/issues/604#issuecomment-1460035994 I suggested that instead of using the sandbox attribute on the iframe, Kiwix Serve can serve all content with a CSP sandbox response header. I agree with you that this would be conceptually more elegant. However, we would still have this issue with PDF content not being rendered in Chromium browsers in the iframe, and also the issue with external links being blocked. Like in Kiwix Android, these will still have to be intercepted and opened in an external window. Adding the sandbox attribute in the response header or adding it in the iframe are exactly equivalent in terms of functionality (I've tested this in the PWA).
EDIT: It might be possible to serve only HTML with a CSP sandbox response header, but not serve PDFs with this header. It would need testing as to whether this would allow them to render in the iframe.
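A rough sketch of that idea, for illustration only: choose the response headers by content type, so that HTML is sandboxed and PDFs are not. The function name securityHeadersFor and the set of MIME types treated as HTML are assumptions; the sandbox tokens are the ones listed earlier in this thread. Whether this lets PDFs render in the iframe still has to be verified per browser.

```javascript
// Sandbox directive built from the token list quoted earlier in this thread.
const SANDBOX_CSP = 'sandbox allow-scripts allow-same-origin allow-modals ' +
  'allow-popups allow-forms allow-downloads;';

// Hypothetical helper: returns the extra response headers for a given Content-Type.
function securityHeadersFor(mimeType) {
  // Strip parameters such as "; charset=utf-8" before comparing
  const type = String(mimeType || '').split(';')[0].trim().toLowerCase();
  const headers = {};
  if (type === 'text/html' || type === 'application/xhtml+xml') {
    headers['Content-Security-Policy'] = SANDBOX_CSP;
  }
  // PDFs and other non-HTML resources get no sandbox header in this sketch
  return headers;
}
```

With this arrangement the iframe element itself carries no sandbox attribute; the restriction travels with each HTML response instead.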
No. The vulnerability
There are multiple versions of the bug. There is more information on Slack.
website css breaking the css of the app controls
Shadow DOM was designed for this problem.
a balance between adding a new security level and not interfering with the website content (in chrome browser)
And between fixing the website content (in Wiktionary)
A website may be internally composed of several iframes
Have we seen how archive.org does it? archive.org edits the HTML of all webpages to insert its header with the controls. It rewrites links, and this is to ensure multimedia paths work and external links are intercepted. With that strategy, we can either use the CSP HTTP header like you suggested, or we can inject a <meta> tag. archive.org does not edit the data of multimedia such as images, so PDFs should be rendered properly if we switch to that strategy.
@danielzgtg I used to use only a <meta> tag CSP in Kiwix JS PWA, but unfortunately you can't use the sandbox directive in <meta http-equiv...>. See Content-Security-Policy/sandbox:
This directive is not supported in the <meta> element or by the Content-Security-Policy-Report-Only header field.
And without sandbox we can't stop top-level navigation. The other disadvantage of CSPs via <meta> is that you have to alter the HTML to inject the tag, whereas adding a CSP response header or iframe sandbox does not require altering the HTML in any way.
Rewriting links isn't necessary if you intercept the user's click on the iframe and inspect the target.
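As a sketch of what "inspect the target" could mean in practice, the decision can be isolated in a small function. The name classifyClick and the three outcome labels are hypothetical, and detecting PDFs by file extension is an assumption made for brevity (a real implementation might check the content type instead).

```javascript
// Hypothetical classifier for a link clicked inside the article iframe.
// href: the link's resolved absolute URL; appOrigin: the origin serving the ZIM content.
function classifyClick(href, appOrigin) {
  let url;
  try {
    url = new URL(href);
  } catch (e) {
    return 'ignore'; // not a parseable absolute URL
  }
  if (url.origin !== appOrigin) return 'open-externally';
  if (/\.pdf$/i.test(url.pathname)) return 'open-in-new-tab';
  return 'load-in-iframe';
}
```

In a click listener attached to the iframe's document, the href of event.target.closest('a') would be passed in, and event.preventDefault() called for every outcome other than 'load-in-iframe'.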
There are multiple versions of the bug. There is more information on Slack.
Please put it somewhere on github (here or on https://github.com/kiwix/kiwix-tools/issues/604), as we need a trace of the discussion. Slack is not public.
Shadow DOM was designed for this problem.
Give me more
Have we seen how archive.org does it? archive.org edits the HTML of all webpages to insert its header with the controls. It rewrites links, and this is to ensure multimedia paths work and external links are intercepted. With that strategy, we can either use the CSP HTTP header like you suggested, or we can inject a <meta> tag. archive.org does not edit the data of multimedia such as images, so PDFs should be rendered properly if we switch to that strategy.
We used to modify the content in kiwix-serve, but we changed that specifically so as not to modify it. The idea is to interfere as little as possible with the content in the ZIM file, as this content should be "just working". (If not, fix the scraper.)
There are multiple versions of the bug
Please put it somewhere on github
The 3 versions I had in mind are:
1. window.top.location. CLOSED FIXED by adding <iframe sandbox="[everything except allow-top-navigation]">.
2. <meta> wasn't being inserted if <head> is missing (possibly also affects svg or xslt if someone goes and tests) due to the regex not considering that. CLOSED FIXED by repeating the <meta> in the <head> outside the [...]
3. window.top [...]. CLOSED WONTFIX. It would be too hard, and the workaround is https://github.com/kiwix/kiwix-js/issues/974
Shadow DOM was designed for this problem
Give me more
There is a good shorter-than-1-page demonstration at https://javascript.info/shadow-dom#encapsulation . We could do that for the controls instead of the <iframe> and still keep our Kiwix style intact and isolated from the page. But that would require a minor modification to the page, which you said you don't want.
There is a good shorter-than-1-page demonstration at javascript.info/shadow-dom#encapsulation .
A new world opens up to me. Thanks for the link. It would have been nice to know about that a few years ago.
On Shadow DOM, remember we need to support a wide range of browsers on possibly quite old devices, given where Kiwix Serve needs to be deployed, and although shadow DOM is now quite widely supported, that wasn't the case a few years ago, and it's still the case that support is patchy on mobile device browsers, for example. For maximum compatibility, I think we're still stuck with iframes. See https://caniuse.com/shadowdomv1 .
wide range of browsers on possibly quite old devices, [...] on mobile device browsers
Old mobile devices aren't a problem. Chrome is evergreen/auto-updating for Android 5+. My Firefox Nightly is Android 5+. It's evergreen too, and the only non-evergreen Firefox is Firefox ESR, but by its version number, it supports Shadow DOM too. Opera Mini is not a concern because it requires internet access to Opera servers. The only concern might be Android Browser / Android Webview. That is tied to the OS and used when the phone does not have Google Play Services.
we need to support a wide range of browsers
IE11 is the only problematic browser left. If that's the case we can include a shadow DOM polyfill. A quick Google search led me to https://github.com/webcomponents/polyfills/tree/master/packages/shadydom and https://github.com/tuespetre/shadow-dom .
@Jaifroid Can you commit a .browserslistrc file somewhere? I tried looking for it in kiwix-js, kiwix-js-windows, libkiwix, and kiwix-tools. kiwix-js only describes it in informal text and the versions written there seem to be higher than what you mean by "quite old devices". Both developers and automated tools can understand the https://github.com/browserslist/browserslist standard. Having that file would bring everyone onto the same page about which browsers need to be supported.
Old mobile devices aren't a problem. Chrome is evergreen/auto-updating for Android 5+
This assumes Internet access, the lack of which is precisely why we deploy Kiwix.
Coming back to the main purpose of this issue, I think I have a solution that allows us to sandbox ZIM articles and does not interfere with loading PDFs into the iframe. It is the one I outlined above, i.e. that we serve articles from the ZIM with a CSP sandbox response header, but not PDFs, and we remove the sandbox attribute from the iframe. I have a proof-of-concept here using the Kiwix JS browser extension's test PWA:
Please ensure this is showing version 3.7.3 (if it's not, wait around 10 seconds for a notification that 3.7.3 is ready to load). This version correctly blocks the attempt at top-level navigation by the February English Wiktionary ZIM, but allows PDFs to load in the iframe. These are the settings used (some incidental things are specific to Kiwix JS):
// Set Content-Security-Policy to sandbox the content (prevent XSS attacks from malicious ZIMs)
headers.set('Content-Security-Policy', "default-src 'self' data: blob: about: chrome-extension: https://moz-extension.kiwix.org " +
    "https://kiwix.github.io 'unsafe-inline' 'unsafe-eval'; sandbox allow-scripts allow-same-origin allow-modals allow-popups " +
    "allow-forms allow-downloads;");
headers.set('Referrer-Policy', 'no-referrer');
<iframe id="articleContent" class="articleIFrame" src="article.html" referrerpolicy="no-referrer"></iframe>
<meta http-equiv="Content-Security-Policy" content="default-src 'self' data: https://download.kiwix.org
https://master.download.kiwix.org https://moz-extension.kiwix.org https://kiwix.github.io 'unsafe-inline' 'unsafe-eval';
frame-src 'self' moz-extension: chrome-extension:; object-src 'none';">
I hope this is helpful in resolving this long-winded issue and also kiwix-tools#604 / #906.
Thank you @Jaifroid for trying to wrap up. I have a hard time getting a clear picture myself, but it seems to me to be the best path. Does anybody have a better alternative in mind?
Since he did #906, maybe @veloman-yunkan could kindly try a dev implementation of the above for Kiwix Serve, removing the Kiwix JS specific code, and setting the CSP/sandbox response header in the server code (not via a Service Worker). My test implementation (done with Kiwix JS) doesn't test the case of Zimit archives, because the Kiwix JS browser extension doesn't run those yet, so it would be important to test those thoroughly. The question of opening external links in a new tab/window may also need to be handled if Kiwix Serve doesn't already have a way of doing this, but it's not difficult and there is some suggested code in https://github.com/kiwix/kiwix-tools/issues/604#issuecomment-1470013026 (NB the part of that code dealing with PDFs should be removed).
@Jaifroid I don't understand the mechanics of your solution but I implemented it in kiwix-serve (though without any host URLs like https://moz-extension.kiwix.org included in the default-src list). PDFs are still blocked in chromium. While trying to test your PWA at https://kiwix.github.io/kiwix-js/www/index.html, the displayed version is 3.7.2 and no notification about 3.7.3 being ready shows up.
I don't understand the mechanics of your solution
@Jaifroid By above I meant that I don't see why the described setup should enable the chrome PDF extension (that is not same-origin) being loaded in an iframe that only allows same-origin content.
@veloman-yunkan The test implementation was erased yesterday by a push to master of a completed PR. However, I've just restored it, so if you visit that address again, and wait 10s or so, it should offer you v3.7.3 (see screenshot). To load, exit the browser and then re-visit that page. It should then show 3.7.3 and allow you to test.
Regarding setting the headers in Kiwix Serve, how are you doing this? And are you setting response headers only for HTML? I suspect you need to set different headers (or none) for a PDF, but I don't know that for sure without testing empirically.
I don't understand the mechanics of your solution
@Jaifroid By above I meant that I don't see why the described setup should enable the chrome PDF extension (that is not same-origin) being loaded in an iframe that only allows same-origin content.
Because you will no longer be setting a sandbox on the iframe. You will be setting it only in the HTTP response headers.
Here is top-level navigation (English Wiktionary) being blocked by 3.7.3, and below that is a PDF (Greek Gutenberg) showing in the iframe in Edge (Chromium) also on 3.7.3:
@veloman-yunkan The test implementation was erased yesterday by a push to master of a completed PR. However, I've just restored it, so if you visit that address again, and wait 10s or so, it should offer you v3.7.3 (~see screenshot~). To load, exit the browser and then re-visit that page. It should then show 3.7.3 and allow you to test.
@Jaifroid Thanks. The 3.7.3 PWA doesn't load PDFs in the iframe either (when using chromium).
Regarding setting the headers in Kiwix Serve, how are you doing this? And are you setting response headers only for HTML? I suspect you need to set different headers (or none) for a PDF, but I don't know that for sure without testing empirically.
Let's dig into this after you confirm that the PWA works for you using this ZIM file: https://master.download.kiwix.org/zim/.hidden/dev/lilote_fr_test_2023-01.zim
@Jaifroid Thanks. The 3.7.3 PWA doesn't load PDFs in the iframe either (when using chromium).
Works for me in Edge Chromium (see screenshots posted above)... I'll try your ZIM.
Working fine with that ZIM. Are you sure the app is showing 3.7.3 in top-left corner?
Also working in Chrome 110 (previous screenshots were Edge Chromium 111):
Doesn't work with Chromium Version 90.0.4430.212 (installed on Ubuntu 20.04 using these instructions)
OK, I'll see if I can reproduce and debug with Chrome 90 on Windows.
Thanks for investigating this in detail, but it is quite possible that a bug in Chrome was fixed between versions 90 and 110.
Yes, I've just checked on Browser Stack, and I confirm that the bug is present in Chrome 90, but it was fixed in Chrome 91 (pictured below) and later. Chrome 90 doesn't recognize the PDF loaded as being of origin 'self'.
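If a viewer wanted to fall back to opening PDFs in a new tab only on affected browsers, the version boundary reported above could be checked roughly as follows. This is a sketch: both function names are hypothetical, user-agent sniffing is brittle, and only Chrome 90 (affected) and Chrome 91 and later (fixed) were actually tested in this thread, so treating every version up to 90 as affected is an assumption.

```javascript
// Returns the Chrome/Chromium major version from a user-agent string, or null.
function chromiumMajorVersion(userAgent) {
  const match = /(?:Chrome|Chromium)\/(\d+)\./.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

// Assumption based on the test reported above: with a CSP sandbox response header,
// PDFs render in the iframe from Chrome 91 onwards but are blocked in Chrome 90.
function needsPdfNewTabFallback(userAgent) {
  const major = chromiumMajorVersion(userAgent);
  return major !== null && major <= 90;
}
```

In a browser, navigator.userAgent would be passed in. Non-Chromium browsers return null from the first function and therefore never trigger the fallback.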
Yes, it's pretty recent, but it's actually quite hard to stay on Chrome 90, because Google updates it almost as soon as you install it. Of course that wouldn't be the case with Chromium on Linux.
I haven't spent a long time trying to find out if there's a workaround for Chrome 90, but I did try specifying the site (https://kiwix.github.io) as an allowed frame-src and even tried deleting the meta http-equiv CSP, but the bug persists. The error in DevTools just confirms the block.
Bottom line, unless someone has a specific patch for Chrome <=90's behaviour, it seems to me we have these choices:
In Kiwix JS we decided to adopt 3 (actually version 2 + 3), as it is the most universal solution, even if it doesn't mimic precisely what the original web site developers may have intended. Users are forced to open EPUBs in a separate app, so opening PDFs in a separate tab is not so much of a sacrifice IMHO.
When opening a PDF in a ZIM in the kiwix-serve viewer, on Chrome, there is an unrecoverable error message. This looks like a regression.
Go to https://dev.library.kiwix.org/viewer#lilote_fr_fo_2023-03/xay-va-%C3%A0-la-p%C3%AAche then click the Lire l'histoire button.