w3ctag / privacy-principles

https://w3ctag.github.io/privacy-principles/

Motivation and/or threat model for data minimization #370

Closed yoavweiss closed 8 months ago

yoavweiss commented 10 months ago

One question that came up when discussing data minimization and potential mitigations for exposing ancillary data is: "why should we minimize data exposure?"

Knowing what threat we're protecting against is generally useful when thinking through mitigations for it.

I think it'd be useful to clearly state why we need to minimize data exposure, and to explain that different types of data may have different threat models.

That would help WGs think through ancillary data exposure and potential mitigations.

dmarti commented 10 months ago

We cover the issue of sensitive information in 2.4.

System designers should not assume that particular information is or is not sensitive. Whether information is considered sensitive can vary depending on a person's circumstances and the context of an interaction, and it can change over time.

It's not really possible to tag the data itself as sensitive/non-sensitive. It depends on context ("occupation" might be non-sensitive for a bank manager on a job listing site, but sensitive on a fanfic site) and the ability of parties to make sensitive inferences from apparently non-sensitive data points.

yoavweiss commented 9 months ago

I'm not suggesting we should explicitly tag data, but we should have some threat models that we're trying to defend against. Otherwise, we will not be able to make reasonable trade-offs, or we will make the wrong ones.

jyasskin commented 9 months ago

In general, we minimize data transfer because any unnecessary data is a risk. But I realized that the current principle doesn't actually talk about what data is necessary, so I've sent https://github.com/w3ctag/privacy-principles/pull/382 to try to improve that. Once it gives a target for how much to minimize, does that help with this issue?

Beyond that, I think the first paragraph after the principles is probably the best we can do:

Data minimization limits the risks of data being disclosed or misused. It also helps user agents and other actors more meaningfully explain the decisions their users need to make. For more information, see Data Minimization in Web APIs.

It's not a precise threat model because data and data sensitivity are so varied.

If we were to try to describe a framework for identifying sensitive data, it would go in https://w3ctag.github.io/privacy-principles/#hl-sensitive-information, but as you can see in that section, the group hasn't supported identifying particular data as less-sensitive than other data.

jyasskin commented 9 months ago

I think this is ready to close, now that #382 is in: https://w3ctag.github.io/privacy-principles/#data-minimization.

npdoty commented 8 months ago

+1 to jyasskin on the reasoning for minimization, and I believe that's the consensus of the task force (13 December 2023).

Just to clarify, there are still important potential differences in sensitivity; we just note that sensitivity is not inherent in particular data types and often varies by person or context. That's part of the justification for general data minimization.