w3c / epub-specs

Shared workspace for EPUB 3 specifications.

[TAG] Personal/sensitive information collected by reading systems #1957

Closed rhiaro closed 2 years ago

rhiaro commented 2 years ago

Having reviewed the responses to the security & privacy questionnaire, we would appreciate some stronger language and a more proactive stance around mitigating potential harms. We appreciate that implementation details of reading systems may be out of scope, but the Security and Privacy Considerations sections of the specifications can nonetheless be used to encourage good behaviour, and raise awareness of things that implementors might not otherwise have considered.

For example, the response mentions:

> Some reading systems track every user reading session, including the time of day, the duration, how many pages were read, what book, the user’s IP address if a web-based reader, etc.

But then says:

> The EPUB specifications do not mention personal information.

and

> the fact that you are reading a particular book can itself be sensitive information. The specification offers no guidance to reading systems on this matter.

This is an opportunity for the specs to mention PII and sensitive information, with regard to discouraging collection and/or transmission over the network in the first place, or to notify readers of what data is being collected and why (and, ideally, offer them a chance to opt out).

iherman commented 2 years ago

The issue was discussed in a meeting on 2022-01-14

### 1. TAG's Privacy and Security comments.

_See github issue [epub-specs#1957](https://github.com/w3c/epub-specs/issues/1957)._

_See github issue [epub-specs#1959](https://github.com/w3c/epub-specs/issues/1959)._

**Dave Cramer:** that was the main concern of PING of TAG, that we have not said much about the threat model involved in epub, about handling PII. … our goal is to address these. … this first issue is about PII, could we do something to discourage collection of PII, can we recommend RS are clear with customers about what is being collected? … wendyreid wrote a summary of the issues for us, and some of our possible responses.

> *Dave Cramer:* See [First draft by Wendy](https://github.com/w3c/epub-specs/wiki/Privacy-and-Security-for-EPUB3).

**Dave Cramer:** the industry has seemed to settle on policy of user agreements, but the general public is probably not aware of how much information is being collected in this process.

**Ivan Herman:** just to make it clear from the w3c point of view is that we need to provide a section that documents problems and guidelines for what authors and RS should do to address the problems. … there is no requirement to _change_ the spec.

> *Avneesh Singh:* +1 Ivan.

**Wendy Reid:** i can give an overview. … the way I have written this is that because we are talking about privacy and security, there are two parts to each of content authors and rs. … for both security and privacy, i wanted to lay out our objectives. … preserve confidentiality, content integrity, transparency. … threat modelling for content: falsification of creator information, remote resources, etc. … recommendations (mostly around privacy aspect): protect users from threats, avoid collection of data, "content processors" should be careful. … for RS threats and recommendations are also set out, with threats like rs spoofing.

**Dan Lazin:** from disclosure point of view, Apple now requires that apps have a user data collection "nutrition label" in the app store. … not all rs are apps (i.e. some are stand alone devices), but this means that there are already some higher level requirements in many cases. … but also, the most common RSes come pre-installed on device, not via app store.

**Wendy Reid:** other considerations like CCPA also come into play here, but these recommendations apply to.

**Dan Lazin:** we could model our response based on apple's list of things they do with user data.

**Rick Johnson:** the more specific we get with the current state of things, the more revs of this we will have to do. … so we should err on the side of common sense, and maybe reference other places that deal with this sort of thing, but not get too specific.

**Brady Duga:** we should focus on epub, even though rs do lots of things that don't specifically have to do with epub format. … odd for an epub format spec to try to tell rs what to do with other formats, or as a UA generally.

> *Wendy Reid:* See [PING Target Privacy Threat Model](https://w3cping.github.io/privacy-threat-model/).

**Brady Duga:** also, the rs privacy policy doesn't apply to the publisher - e.g. if a publisher includes a tracking pixel, rs can't control that.

**Dave Cramer:** the industry has gotten in trouble before - e.g. ADE sending unencrypted user information back to Adobe. … i looked up the policies of a few major epub retailers. … e.g. Apple says they anonymize everything, but Kobo doesn't. … so I'm a little less concerned with how other specs handle privacy, because there are specific user expectations about privacy when it comes to books.

> *Dave Cramer:* See [Privacy section of the HTML webstorage section](https://html.spec.whatwg.org/multipage/webstorage.html#privacy).

**Ivan Herman:** we have 2 specs, content and rs. And wendyreid separated the threat model into these 2 parts. … in the rs part, we already say things about origin and other security related policies, so we aren't completely silent. … indeed there are areas where spec is silent, but the general expectations that a user should have over privacy are probably in scope.

**Wendy Reid:** one thing that separates us from general web browser is that there are book related affordances that RS are expected to do for user, but some of these rely on collection of user data. … but users don't think this way, they just expect that these features are there. … e.g. collection of data for annotations, cloud syncing. … so we would be doing our due diligence by providing some guidance to implementers. … the reality is that these recommendations aren't normative anyway, but we are being good global citizens by doing so.

**Rick Johnson:** as a reminder, the epub marketplace is also the dominant format for education, and the customer is not the user. It's the institution. … so we need to be careful when we say things that affect that use case.

**GeorgeK:** there are also rs that track the individual student and how much time they spend reading, progress, etc. and report back. … and many times teachers and parents can see that info. … i wonder if one of our suggestions would be to have the privacy policy available in a rs, e.g. in the help section, or if there is anything in the content that is phoning home, then it would be the publisher who informs people about that.

**Tzviya Siegman:** we need to keep in mind that we tend to think of epubs as separate from the web, but there are a lot of websites that do similar things. We're not that different. … but the "nutrition label" might solve the problem, by clarifying the user's position without scaring them. … better than a user agreement, where user knows that they just have to click to agree or else the app won't work.

**Ivan Herman:** UX people have come up with a vocab that describes the a11y issues that might be present in a given book, can we do something similar? … but most of the privacy issues are on the rs side rather than the content side, so it may not be that helpful. … since wendyreid has already started, I think this text should become part of the spec. … so next step should be to open a PR to incorporate it. … re. applications disclosing privacy features in general, maybe we should incorporate the labels that we want RS to provide.

**Wendy Reid:** I considered including specific examples of data collection behaviours, or how to communicate to users when these things happen. … but I thought that being specific might lead people to think that the examples constitute a closed list, when the recommendations are more like principles.

**Dan Lazin:** do we know what the TAG wants? or can we ask? … most specs don't touch this. Really we just want to satisfy them. … can they give us an example of what other specs have done in response to similar concerns. Rather than producing something on our own and risking it not being what they want.

> *Ivan Herman:* Examples for privacy section in the DID spec: [https://www.w3.org/TR/did-core/#privacy-considerations](https://www.w3.org/TR/did-core/#privacy-considerations) and for security section: [https://www.w3.org/TR/did-core/#security-considerations](https://www.w3.org/TR/did-core/#security-considerations).

> *Wendy Reid:* [https://www.w3.org/TR/audiobooks/#security-privacy](https://www.w3.org/TR/audiobooks/#security-privacy) <-- quite minimal.

> *Wendy Reid:* [https://www.w3.org/TR/pub-manifest/#security-privacy](https://www.w3.org/TR/pub-manifest/#security-privacy) <-- more informative.

> *Tzviya Siegman:* Web Authentication has several sections on privacy: [https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-authenticator](https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-authenticator), [https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-client](https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-client), [https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-rp](https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-rp).

**Tzviya Siegman:** privacy is increasingly important now. We're not just doing this to check off the privacy review box.

**Ivan Herman:** the DID spec recently addressed similar privacy concerns. … i.e. what implementors should be aware of when they try to implement, what privacy pitfalls are they likely to encounter, etc. … we also went through a similar process with audiobooks and pub manifest. … i don't quite agree that we're just doing this to satisfy TAG.

**Dave Cramer:** I can reply in the issues with a link to the document that we already have?

**Ivan Herman:** we'll let them know once we have a PR.

**Hadrien Gardeur:** we have very very different rs, and for some of them, the fact that rs need to be distributed already applies some requirements (e.g. apps distributed via app store). … similar thing will happen with Play store. … but no analogous process on the web - maybe just a privacy section of the page. … we could have best practices section about what to disclose to users, but not sure we can go much further than that.

**Ivan Herman:** i wonder whether there are things specific to security that we need to call out. … we know most of rs have been quite averse to using scripts, some don't allow it at all. … mostly due to security concerns. … so having a fairly good idea of why rs shouldn't allow scripts might be helpful. … maybe say that content authors should really consider whether they need to include scripts in their content.

**Wendy Reid:** i think the best we can do is identify some common threats that arise because of the way the spec is written, and the way content is likely to be written. … in terms of recommendations, we could recommend virus checking as part of ingest, checking origin or links. … security is tricky because we can make recommendations, but it will ultimately come down to the authors.

**Dave Cramer:** one of the big problems with security is that Hachette might write the script, but then Google executes it, knowing nothing about it.

**Ivan Herman:** you could say that a content creator on the web writes scripts, and then the browser has to execute it. … but for ebooks, once you put something in content, those books won't be automatically updated. … so a malicious script could stay out there for a very long time, whereas on a website, the content can be updated. … the fact that the book becomes its own entity is a difference between book and website - might be worth pointing out, as a reason not to include scripts. … e.g. old versions of js libraries incorporated in ebooks.

**GeorgeK:** what a user is reading is certainly private. Governments knowing what people are reading is something we should point out. … whether or not people are using assistive tech is also sensitive, we should call that out.

**Rick Johnson:** perhaps we don't bifurcate by content and rs, but rather content creation and distribution channel.

**Charles LaPierre:** the other thing is that the content creator could have done everything right, and then someone in the middle injects malicious code into the epub and repacks it. … right now we don't have anything to do with signing ebooks, etc. … so it really falls onto whoever is ingesting this at the end to make sure the content is safe.

**Hadrien Gardeur:** the top reason why rs don't like js in content is that it can mess with what rs does. … rs will most likely always inject js to get desired result, which js in content can mess up.

**Brady Duga:** we don't do js because of security, and because it's a pain. When a rs implements something in the webview they are limited in the resources they have available. But a browser operates at a higher level. … in terms of serving content, when Hachette writes a book with a script, it's different whether that script is run via rs or via browser. When run via rs, the origin is Google. … very different security and privacy model.

**Ivan Herman:** about signatures on epubs, there could be. Would it be a good thing to say in spec that we suggest that publishers make use of signatures? … Of course, all the intermediates that modify content during ingestion will be against that, but there are all sorts of technology out there for providing signatures.

**Zheng Xu (徐征):** i was talking in mini-app CG yesterday, and their zips have a signature. … maybe we can incorporate something similar.

**Hadrien Gardeur:** we have signatures.xml in OPF, but how to use it was not defined.

**Ivan Herman:** agree that some infrastructure is already there, but probably too early to try to standardize anything. … but we can recommend that industry start to look into using it, which we might try to standardize in a future revision of epub.

> *Charles LaPierre:* +1 to looking into signing of the EPUB maybe a Community Group task to investigate?

**Hadrien Gardeur:** i think it's more likely to happen if a major player in the industry starts requiring signatures - it's a chicken and egg situation.

**Zheng Xu (徐征):** for epub nobody really understands how to use signature. … once use cases are clearer, then we can start to standardize.

> *Bill Kasdorf:* should we be incubating signatures in the CG?

> *Zheng Xu (徐征):* Bill_Kasdorf_: yes I feel like so.

**Dave Cramer:** sounds like we have stuff to do. … polish wendyreid's text and get it in PR. … notify TAG re. same. … research re. use cases of signing epubs. … (alternatives to what we already have with XML signatures?). … and hopefully we can help wendyreid with PR.

**Wendy Reid:** i think mgarrish is working on the PR piece, but we can help if he needs us to.

**Dave Cramer:** okay, thanks everyone, we'll see you next week!

---

iherman commented 2 years ago

The issue was discussed in a meeting on 2022-04-08

List of resolutions:

- Resolution #1: Close remaining privacy and security issues.

### 1. Close Privacy & Security Issues.

**Dave Cramer:** the TAG has reappeared, making a couple of comments. I am making a PR to mention that when using web APIs which have the most dramatic privacy and security implications (geolocation, push notifications), you should get user consent.

_See github issue [epub-specs#1959](https://github.com/w3c/epub-specs/issues/1959)._

**Dave Cramer:** we have several issues where there was never much discussion in the issue (#1959 for example). … I think the PR i mentioned earlier would serve to close this issue. … agree/disagree?

**Ivan Herman:** we had a lot of discussion with PING, good discussions, after which we made extensive additions to answer the issues they raised. … and we contacted them several times to get their acknowledgement. So at this point we consider these issues closed. … they have the right to reopen issues if they like. … Amy from TAG has closed the issue of epub review on the TAG repo, so that is an indication of how they feel.

**Gregorio Pellegrino:** so is this passed? it is okay?

_See github issue [epub-specs#1872](https://github.com/w3c/epub-specs/issues/1872)._

**Ivan Herman:** yes, it is okay.

**Dave Cramer:** risk of exposure and fingerprintability. … this was raised before we clarified the threat model, can we close this now?

_See github issue [epub-specs#1873](https://github.com/w3c/epub-specs/issues/1873)._

**Dave Cramer:** obfuscation, which we've discussed extensively, followed by updates to the spec docs.

_See github issue [epub-specs#1875](https://github.com/w3c/epub-specs/issues/1875)._

_See github issue [epub-specs#1876](https://github.com/w3c/epub-specs/issues/1876)._

**Dave Cramer:** interactivity, which we've addressed as best we can given that it's ambiguous. … self-contained packages, this is a case where it's appropriate to close because epub is clear that it is largely self-contained, subject to exceptions enumerated in the spec. Not dramatically impacting privacy.

_See github issue [epub-specs#1957](https://github.com/w3c/epub-specs/issues/1957)._

**Dave Cramer:** we enumerated the threat model, which deals with #1957.

_See github issue [epub-specs#1958](https://github.com/w3c/epub-specs/issues/1958)._

**Dave Cramer:** permission prompts, we're dealing with this, strengthened text.

_See github issue [epub-specs#1959](https://github.com/w3c/epub-specs/issues/1959)._

> **Proposed resolution: Close remaining privacy and security issues.** *(Wendy Reid)*

**Dave Cramer:** broad user expectations issues, which is covered by the other changes we've made.

> *Ivan Herman:* +1.

> *Matthew Chan:* +1.

> *Shinya Takami (高見真也):* +1.

> *Bill Kasdorf:* +1.

> *Dave Cramer:* +7.

> *Wendy Reid:* +1.

> *Matt Garrish:* +1.

> *Murata Makoto:* +1.

> *Dan Lazin:* +1.

> *Charles LaPierre:* +1.

> *Ben Schroeter:* +1.

> *Masakazu Kitahara:* +1.

> ***Resolution #1: Close remaining privacy and security issues.***

> *Ivan Herman:* clap, clap.

**Dave Cramer:** I think the spec is now much more informative/clear about some of these issues, so thanks everyone.

> *GeorgeK:* +1.