jyasskin opened this issue 4 years ago
I've added a comment https://github.com/WICG/crash-reporting/issues/1#issuecomment-571289525 explaining why I strongly disagree with the premise of Pete's issue. I quite agree that the Threat Model should discuss this sort of API, but based on everything in the model so far, the clear conclusion is that this is not a threat.
Based on @snyderp's https://github.com/WICG/crash-reporting/issues/1#issuecomment-571300343, it sounds like the harm would be something along the lines of "unwanted use of client resources". I do think that's a real harm, but:
1) I don't see how it fits into the high-level privacy threats described in RFC 6973. If it doesn't, is it really a "privacy" harm?
2) What can we say about how to trade off this harm vs. API benefits? Something about the amount of resources should go into the decision, but the reporting APIs mentioned above seem like they'll use much smaller amounts of resources than, say, images, most JavaScript frameworks, or advertisements.

Assuming that server-chosen images are here to stay, I'm having a hard time inventing a coherent target for the threat model that excludes the requests made by the proposed reporting APIs.
I agree that there are huge problems with "unwanted use of client resources" — the current prototypical example is sites or ads that mine bitcoin in your browser. So I'm all in favor of a statement of principles that sees value in reducing client resource usage.
From that point of view, an API like https://github.com/w3c/IntersectionObserver, which saved a measurable percentage of total web browser battery use, is a great win. But I'll note that none of the PING discussion of that API has touched on this aspect; indeed, the discussion has been quite hostile to that point of view when I've raised it.
This reinforces the conclusion that the threat we're discussing is not actually one of "privacy".
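To make the resource-savings point concrete, here's a minimal sketch of the lazy-loading pattern IntersectionObserver enables (the `data-src` convention and the 0.5 threshold are illustrative choices of mine, not part of the API). Before the observer existed, pages did this by re-measuring element geometry inside scroll handlers on every frame; with it, the browser coalesces that work and wakes the page only when visibility actually changes.

```ts
// Lazy-load images without a scroll handler: the browser tells us when an
// element becomes (half) visible, instead of the page polling every frame.
const observer = new IntersectionObserver(
  (entries) => {
    for (const entry of entries) {
      if (entry.isIntersecting) {
        // Swap in the real image once the placeholder is on screen.
        (entry.target as HTMLImageElement).src =
          entry.target.getAttribute("data-src") ?? "";
        observer.unobserve(entry.target); // one-shot: stop watching it
      }
    }
  },
  { threshold: 0.5 } // fire when half the element is visible
);

document.querySelectorAll("img[data-src]").forEach((el) => observer.observe(el));
```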
There are two related, but distinct, concerns. The "unwanted use of client resources" issue is tangential.
@jyasskin RFC 6973 shouldn't have the final say on such things, but this falls clearly in "6.2. User Participation" (among others). A user visits a website to achieve a user goal. None of that is related to "I want to help the site debug its application".
Ah great, glad to get back to the core issue: "other parties can learn new things about me they can't currently learn."
What is it that these APIs allow learning about you?
I think we all agree that they let people who write code learn about how that code fares in the real world, and I guess we can just disagree about whether a user's post-OOM experience is "I couldn't accomplish my goal today, but I would still like to be able to accomplish it tomorrow." But until we relate the API to information about the user, I don't see the privacy angle.
Setting aside the other issues being discussed above and the parallel WICG issue, honest question: have you read sections 6.1 and 6.2 of RFC 6973? They do a good job of explaining (part of) the concern here.
Do you disagree that this proposal is contrary to section 6.1 (and 6.2, among others), or do you disagree that these are useful floors for thinking about privacy?
I like that RFC a lot! My reading of §6.1 in this context is:
> Data minimization can be effectuated in a number of different ways, including by limiting collection, use, disclosure, retention, identifiability, sensitivity, and access to personal data.
The kind of data we're talking about, like the "Did this crash come from an OOM?" bit, is not "personal data". The RFC definition (in §3.2) says that personal data is "Any information relating to an individual who can be identified, directly or indirectly." The reporting here doesn't provide a way to tie the report to an individual. And indeed §6.1 says
> However, the most direct application of data minimization to protocol design is limiting identifiability. Reducing the identifiability of data by using pseudonyms or no identifiers at all helps to weaken the link between an individual and his or her communications.
That's exactly what the monitoring API does, by design.
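For concreteness, here's roughly what one of these reports looks like on the wire, going by the WICG crash-reporting draft (the endpoint URL and the exact values below are made up):

```ts
// The server opts in with a response header such as:
//   Reporting-Endpoints: default="https://example.com/reports"
// and later receives a POST whose JSON body contains reports shaped like:
const exampleReport = {
  type: "crash",
  age: 42,                       // ms between the crash and the report
  url: "https://example.com/",   // the document that crashed
  user_agent: "Mozilla/5.0 ...", // the same UA string any request carries
  body: { reason: "oom" },       // the one new bit: why it crashed
};
// Note what's absent: no user ID, no stack trace, no memory contents --
// nothing that lets the server tie the report to an individual.
```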
I think you skipped over the main point! The very first item in 6.1 is "Data minimization refers to collecting, using, disclosing, and storing the minimal data necessary to perform a task". This text only makes sense if we're discussing tasks the user wants to accomplish, not tasks other parties want to accomplish (reading 6.1 alongside 6.2, which emphasizes user consent, control, and information, makes this even plainer). To understand "task" in this text as "any given task" would render it meaningless (e.g. "we're using the minimal amount of data needed to cross-site track" is not a meaningful privacy protection). Sending crash reports to parties unknown to the user is not related to the task the user intended to perform, and so does not meet the privacy principles in that RFC.
Whether or not the spec sends minimal information for a task unrelated to the goal the user is trying to perform is not relevant (at least to the concepts of privacy described in that RFC).
If the claim is "debugging the site is a task all users intend to perform", that seems… extremely unlikely, and worth explicitly asking about.
The user wants to perform a task. And indeed they tried to do so. And failed!
So the OOM failure is extremely clear evidence that the developer did not already have enough data to enable the task the user just tried to do.
You're conflating things. If I want to drive on a road, and I can't because the road is full of potholes, it's not a sign that I want to help fill potholes; it's a sign that the people maintaining the road are doing a bad job. Likewise, if I visit a site and the site is busted, it's not a sign that I want to help fix the site; it's a sign that the site builders have not finished their task. The way to distinguish the two is to ask.
Let's flesh out the analogy.
You want to drive on a road. The road is full of potholes. For most people in the world, those potholes cause their car to bump up and down, and it's fine. Your car has the precise resonance frequency so that the potholes cause it to fall apart. (Every road has such a car.)
The people who make the road have heard rumors of some cars having problems, so they want to set up a camera that watches for where cars fall apart, so they know what to fix. They want to take reasonable steps to protect privacy: the camera is built so that it cannot record license plates or driver or even car color, just where a car fell apart.
Nobody is asking you to help fill potholes, just to let the road owner look for where they do damage.
You are advocating for a switch in the glove compartment that says "Make my car visible to pothole damage monitors." That is a way to ensure that most potholes remain in place and driving is worse for everyone.
Overall, I'm trying to identify principles that this document can explain, so that API developers can apply those principles to new APIs without the PING's involvement. It's absolutely true that RFC 6973 is not the final word on all privacy principles, and we can add new principles in this document if we think it's missing some.
In RFC 6973, section 6 is about ways to mitigate privacy harms. It doesn't claim that designers need to apply it in cases where there isn't a privacy harm. However, there might be some implicit privacy harms we could extract from section 6 that the authors didn't realize needed to be explicitly listed in section 5.
So, let's look at two things that @snyderp mentioned to see if we need to add a new principle to the threat model's high-level threats section:
I think that information about my system is always potentially information about me. Every time you pick up a smidge of information, you know that much more which could help you recognize me even when I take steps to conceal other identifiable characteristics. Learning about a person's environment always helps you track them.
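To put rough numbers on that "smidge" (standard fingerprinting arithmetic, my own illustration rather than anything in these specs): a signal shared by a fraction p of the population reveals log2(1/p) bits of identifying information, and bits from independent signals add up, so even low-entropy details compound quickly.

```ts
// How many identifying bits does a signal reveal, if a fraction p of
// users share it? (Bits from independent signals are additive.)
const bitsRevealed = (p: number): number => Math.log2(1 / p);

console.log(bitsRevealed(1 / 2)); // 1 bit: one boolean fact about my system
console.log(bitsRevealed(1 / 8)); // 3 bits: a rarer environment detail
console.log(Math.log2(8e9));      // ~32.9 bits single out one of 8 billion people
```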
@michaelkleber I think this analogy might have taken on a life of its own. But if your suggestion is that pervasive car monitoring is privacy-preserving, ya dun goofed. Better to leave road/site maintainers with the responsibility for debugging their stuff, and let others volunteer info if they want to.
@jyasskin I second @tomlowenthal's comment, and I don't think focusing on the "Crash Reporting API" is the best basis to bang out PING privacy principles, but the short of it is that the API shares information about the user's experience and environment (which is unavoidably about the user) w/o user consent, knowledge or expectation, and that's a problem.
The larger issue about whether "all data is fair game for sites to collect unless it's immediately, one-hop useful for identifying the user" is compatible with user-respecting, privacy-by-default system design seems better to hash out in its own issue / in a PING call / etc.
There was some ambiguity in what I wrote about a user's environment, so I want to distinguish a couple of different kinds of facts a server might learn, to see if we can pinpoint where we disagree:
I see these as having different privacy implications:
There may be some implicit assumptions that (3) is unachievable, but it's at least achievable by routing the request through Tor.
@snyderp reported https://github.com/WICG/crash-reporting/issues/1, https://github.com/WICG/deprecation-reporting/issues/1, and https://github.com/WICG/intervention-reporting/issues/1 saying that sending debugging information to websites is a privacy harm. Whether or not that's a consensus position, this document should discuss it.