JayPanoz closed this issue 3 years ago.
There is guidance to this effect under the security considerations section of the Content Documents specification:
Reading Systems need to behave as if a unique domain were allocated to each Content Document, as browser-based security relies heavily on document URLs and domains. Adopting this approach will isolate documents from each other and from other Internet domains, thereby limiting access to external URLs, cookies, DOM storage, etc.
and
If a Reading System allows persistent data to be stored, that data needs to be treated as sensitive. Scripts might save persistent data through cookies and DOM storage, but Reading Systems might block such attempts. Reading Systems that do allow data to be stored have to ensure that it is not made available to other unrelated documents (e.g., ones that could have been spoofed). In particular, checking for a matching document identifier (or similar metadata) is not a valid method to control access to persistent data.
http://www.idpf.org/epub/31/spec/epub-contentdocs.html#sec-scripted-content-security
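To make the "unique domain per Content Document" guidance concrete, here is a minimal in-memory model, assuming a hypothetical Reading System that keys persistent storage by origin the way browsers key localStorage. The OriginPartitionedStorage class and the origin strings (http://localhost:8080, https://pub-a.reader.invalid, https://pub-b.reader.invalid) are hypothetical illustrations, not any reading system's API. The model isolates per publication, as requested later in this thread; the spec's per-Content-Document wording is stricter still.

```typescript
// Hypothetical model of origin-keyed persistent storage. Browsers give each
// origin its own localStorage area, so whatever the Reading System uses as
// the origin decides which documents can see each other's data.
class OriginPartitionedStorage {
  // One key/value area per serialized origin.
  private areas = new Map<string, Map<string, string>>();

  setItem(origin: string, key: string, value: string): void {
    let area = this.areas.get(origin);
    if (area === undefined) {
      area = new Map<string, string>();
      this.areas.set(origin, area);
    }
    area.set(key, value);
  }

  getItem(origin: string, key: string): string | null {
    return this.areas.get(origin)?.get(key) ?? null;
  }
}

const storage = new OriginPartitionedStorage();

// Case 1: every publication is served from one Reading System origin.
const readingSystemOrigin = "http://localhost:8080";
storage.setItem(readingSystemOrigin, "width", "40em"); // script in EPUB A
const seenByB = storage.getItem(readingSystemOrigin, "width"); // script in EPUB B

// Case 2: each publication gets its own (hypothetical) origin.
storage.setItem("https://pub-a.reader.invalid", "width", "40em");
const seenByIsolatedB = storage.getItem("https://pub-b.reader.invalid", "width");

console.log(seenByB); // 40em (EPUB B reads EPUB A's value)
console.log(seenByIsolatedB); // null (EPUB B cannot see EPUB A's value)
```

Note that the isolation here comes entirely from the origin key; per the second quote, matching on a document identifier instead would not be a valid access control.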
Well, it looks like guidance might not be enough: the same issue was reported on the Readium repo in August: https://github.com/readium/readium-js-viewer/issues/559 (iframe not sandboxed).
Now, I would understand if going further than that were considered “out of scope”.
Due to various technical constraints that depend on the target platform (e.g. a cloud / web-browser-based reader vs. a native app web view), Readium's core "engine" is not always able to implement fully watertight sandboxing (in Readium's case, HTML resources are displayed inside iframes). Content Documents are served from different origins, and not always over HTTP (e.g. custom URL protocols in Chrome extensions, Electron or Cordova). In some cases, the reading system can inject "behaviours" such as media overlays playback, highlights/annotations, etc. into EPUB content only when both the app and the content are in the same domain. As for localStorage, there's also the inverse of the "everybody can see my data" problem: in some cases the domain of the content URLs varies from one reading session to the next (e.g. a random HTTP port number), so a user's recorded data does not persist (e.g. EPUBs containing scripts that track some sort of activity progress, or that memorise user preferences).
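The random-port problem follows from how an origin is defined: it is the combination of scheme, host and port, so changing only the port yields a different origin and therefore a separate localStorage area. A minimal sketch using the standard WHATWG URL class (available in browsers and Node.js); the URLs, paths and port numbers are made up for illustration.

```typescript
// Serialize the origin (scheme, host, port) of a URL with the WHATWG URL
// parser that browsers and Node.js share.
function originOf(url: string): string {
  return new URL(url).origin;
}

// Same local HTTP server, restarted on a different random port between
// reading sessions.
const session1 = originOf("http://127.0.0.1:51234/pub/chapter1.xhtml");
const session2 = originOf("http://127.0.0.1:60321/pub/chapter1.xhtml");

console.log(session1); // http://127.0.0.1:51234
console.log(session2); // http://127.0.0.1:60321
console.log(session1 === session2); // false: data stored in session 1 is not visible in session 2

// Conversely, two publications on the same server and port share one origin,
// because the path is not part of the origin.
const bookA = originOf("http://127.0.0.1:51234/pubA/chapter1.xhtml");
const bookB = originOf("http://127.0.0.1:51234/pubB/chapter1.xhtml");
console.log(bookA === bookB); // true
```

Both failure modes discussed in this thread (data that does not persist, and data that leaks between publications) come from the same rule: the path never separates storage, only scheme, host and port do.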
cc @rkwright
The issue was discussed in a meeting on 2021-02-11
My comment above dates from a few years ago. I wrote a more up-to-date analysis for Thorium (iframe, sandboxing, origin, etc.): https://github.com/edrlab/thorium-reader/issues/1375
The issue was discussed in a meeting on 2021-02-18
OK so this might be a security issue to some extent.
As far as I know, there’s nothing about “origin” in the EPUB spec.
Why is this an issue?
Because of localStorage. See https://html.spec.whatwg.org/multipage/webstorage.html#the-localstorage-attribute and https://html.spec.whatwg.org/multipage/browsers.html#concept-origin.
In other words, having the Reading System as the origin is valid, which means you can retrieve every item stored in the RS, not only the local storage area for one EPUB file.
Currently, some Reading Systems appear to let you get items that were set in other EPUB files. See the following screenshot: width was set in one file, and we called getItem using JavaScript in another file.
Here, we actually retrieve the whole storage using a loop: every item set in different files before the script runs can be accessed.
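The loop from the screenshot is not reproduced here, but enumerating a storage area with the standard Web Storage members (length, key() and getItem()) looks roughly like the sketch below. StorageLike, MemoryStorage and the item keys are hypothetical; in a browser you would pass window.localStorage to dumpStorage instead of the in-memory stand-in.

```typescript
// The subset of the Web Storage interface the loop needs; in a browser,
// window.localStorage satisfies it.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
}

// Collect every key/value pair visible to the current origin. Any script in
// any Content Document served from that origin can do this.
function dumpStorage(storage: StorageLike): Record<string, string> {
  const all: Record<string, string> = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key === null) continue;
    const value = storage.getItem(key);
    if (value !== null) all[key] = value;
  }
  return all;
}

// Minimal in-memory stand-in so the sketch runs outside a browser.
class MemoryStorage implements StorageLike {
  private items = new Map<string, string>();
  get length(): number {
    return this.items.size;
  }
  key(index: number): string | null {
    return Array.from(this.items.keys())[index] ?? null;
  }
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const shared = new MemoryStorage();
shared.setItem("width", "40em"); // set by a script in EPUB file A
shared.setItem("quizScore", "7"); // set by a script in EPUB file B
console.log(dumpStorage(shared)); // { width: '40em', quizScore: '7' }
```

Nothing in the loop checks which file wrote an item: once two EPUB files share an origin, a script in either one sees everything.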
I must admit I would be much more comfortable if the origin were each EPUB file rather than the whole RS.
If someone stored sensitive data in localStorage at some point, you could theoretically access it from another file, and this would be valid per spec.