w3ctag / privacy-principles

https://w3ctag.github.io/privacy-principles/

Sharing identity and reading history aren't purposes #229

Closed · jyasskin closed 1 year ago

jyasskin commented 1 year ago

@michaelkleber points out that https://w3ctag.github.io/privacy-principles/#opt-in-out says

> In specific cases, people should be able to consent to more sensitive purposes, such as having their identity recognised across contexts or their reading history shared with a company.

But there are many purposes for which one might recognize someone across contexts or share reading history with a company, and not all such purposes are "more sensitive". The document should pick out some particular purposes that need consent here.

darobin commented 1 year ago

I think that there may be easier fixes here:

  1. should be able to [consent](https://w3ctag.github.io/privacy-principles/#dfn-opt-in) to more sensitive **processing**, such as yada yada
  2. should be able to [consent](https://w3ctag.github.io/privacy-principles/#dfn-opt-in) to more sensitive [purposes](https://w3ctag.github.io/privacy-principles/#dfn-purpose), such as **those that require** having their yada yada

I don't think anyone likes it when we list purposes ranked by sensitivity?

jyasskin commented 1 year ago

The text

> The burden of proof on ensuring that informed consent has been obtained needs to be very high in this case.

is also an issue: this isn't a court case that has burdens of proof, and the possible purposes vary in how much the person ought to be informed before letting them consent. E.g. the purpose of "showing you pages you read recently" probably needs very little informing, while the purpose of "identifying people who are too woke to deserve a bank account" needs more.

michaelkleber commented 1 year ago

Suppose I read news articles on a site that only lets me read 5 per month without subscribing, and they remember which ones I've already read to enforce this cap. Certainly a company is logging some history of what I've read, and there is some processing taking place. But I don't think there is a particularly elevated sensitivity, a very high burden of proof, or anything going on that relates to opt-in vs opt-out.
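
(A minimal sketch of that metering logic, with hypothetical names, just to make the data involved concrete: the only state the site needs is the set of article IDs read in the current month, which it can discard wholesale when the month rolls over.)

```typescript
// Hypothetical sketch of a metered paywall. The only stored state is
// the set of article IDs read in the current month.
const FREE_ARTICLES_PER_MONTH = 5;

interface MeterState {
  month: string;            // e.g. "2024-03"
  readArticleIds: string[]; // articles already counted this month
}

function currentMonth(): string {
  return new Date().toISOString().slice(0, 7);
}

// Returns true if the reader may view the article, updating the meter.
function mayRead(state: MeterState, articleId: string): boolean {
  if (state.month !== currentMonth()) {
    // New month: the old reading record is discarded entirely.
    state.month = currentMonth();
    state.readArticleIds = [];
  }
  if (state.readArticleIds.includes(articleId)) {
    return true; // re-reading an already-counted article is free
  }
  if (state.readArticleIds.length >= FREE_ARTICLES_PER_MONTH) {
    return false; // cap reached: prompt to subscribe
  }
  state.readArticleIds.push(articleId);
  return true;
}
```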

As Jeffrey said in the title of this Issue, the elevated sensitivity seems to be more about the purpose than about the data. And the first graf of this section is indeed about purposes, while the third graf has pivoted away from that.

jyasskin commented 1 year ago

Today's discussion pointed to changing the text to something like

> In specific cases, people should be able to consent to data sharing that would otherwise be restricted, such as having their identity or reading history shared across contexts.

darobin commented 1 year ago

@michaelkleber I don't think that that's a correct data protection assessment. Data processing is treated as sensitive because of what it enables, not because of what is actually done. Say I collected the detailed biometrics of billions of people to make a beautiful work of art. Surely my purpose is innocuous, maybe even virtuous — but I'm still sitting on an insanely dangerous pile of data.

A site that keeps full history is keeping sensitive data. If it's only using it for paywall purposes, it should find a way to minimise that collection, get rid of it after a month, etc.

The purpose matters with respect to the person's preferences. Maybe you don't want to give me your DNA because you think my art sucks. But a purpose cannot be sensitive if the data collection isn't sensitive to start with. Much of the time, the agent cannot assess purposes but can assess some degree of risk.

lknik commented 1 year ago

> @michaelkleber I don't think that that's a correct data protection assessment. Data processing is treated as sensitive because of what it enables, not because of what is actually done. Say I collected the detailed biometrics of billions of people to make a beautiful work of art. Surely my purpose is innocuous, maybe even virtuous — but I'm still sitting on an insanely dangerous pile of data.

If we want to ground the discussion in some widely accepted norms, we could cite the GDPR's definition of 'processing': "'processing' means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction".

What you refer to above (what it enables) is the purpose.

darobin commented 1 year ago

@lknik Yes, I understand the distinction, and we do have processing defined. Processing is sensitive based on what it enables: yes, that is purposes, but not necessarily the stated purposes at the time of processing. A DPIA isn't based on "I'm making a piece of art" but rather on "I'm collecting massive amounts of biometrics".

I also forgot to address this in the previous comment, @jyasskin: the mention of burden of proof here isn't about court cases. When you collect more sensitive data, common data protection practice is to carry out assessments (e.g. a DPIA, or Data Protection Impact Assessment) in which you establish (for yourself as part of your threat modelling, but also potentially for compliance where applicable) what the risks are and whether you have sufficiently mitigated them. These would likely mention purpose (that's how you establish that your processing is proportional), but the focus is on risk.

Whenever you collect data about people, you are taking risks for them. The more data, the more detailed, and the more cross-context, the more dangerous. The danger isn't from what you're planning to do with it or from what you're doing now; it's from what a bad actor could do with it (after a security breach, but also because the political situation of your country turns sour, you get acquired by an evil billionaire, a regulator forces you to sell those assets, etc.; the point of impact assessment is that shit does in fact happen, not if but when).

michaelkleber commented 1 year ago

> A site that keeps full history is keeping sensitive data. If it's only using it for paywall purposes, it should find a way to minimise that collection, get rid of it after a month, etc.

I completely agree! But this is precisely my point: something like "only keep the count of unique readers" or "get rid of it after a month" is still processing of that very same sensitive data, but for a highly privacy-beneficial purpose.
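
(To make that minimised version concrete, a hypothetical sketch, not anything the document prescribes: the site keeps only salted hashes of reader IDs, reduces them to per-article counts at month end, and discards everything else.)

```typescript
// Hypothetical sketch: count unique readers per article without
// retaining a browsable reading history. Raw reader IDs are never stored.
import { createHash, randomBytes } from "node:crypto";

// A fresh salt each month; once rotated, old hashes are unlinkable.
let monthlySalt = randomBytes(16).toString("hex");

// articleId -> salted reader hashes seen this month
const seen = new Map<string, Set<string>>();

function recordRead(articleId: string, readerId: string): void {
  const hash = createHash("sha256")
    .update(monthlySalt + readerId)
    .digest("hex");
  if (!seen.has(articleId)) seen.set(articleId, new Set());
  seen.get(articleId)!.add(hash);
}

function uniqueReaders(articleId: string): number {
  return seen.get(articleId)?.size ?? 0;
}

// Month end: keep only the aggregate counts, drop the hashes and salt.
function rotateMonth(): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [articleId, readers] of seen) {
    counts.set(articleId, readers.size);
  }
  seen.clear();
  monthlySalt = randomBytes(16).toString("hex");
  return counts;
}
```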

lknik commented 1 year ago

> > A site that keeps full history is keeping sensitive data. If it's only using it for paywall purposes, it should find a way to minimise that collection, get rid of it after a month, etc.
>
> I completely agree! But this is precisely my point: something like "only keep the count of unique readers" or "get rid of it after a month" is still processing of that very same sensitive data, but for a highly privacy-beneficial purpose.

It also sounds great! I hope the principles/TAG have the necessary remit to advise about it, as it potentially 1) concerns the back-end and 2) may be subject to legal constraints.

And indeed, both keeping track of data and removing it count as processing.

jyasskin commented 1 year ago

I think #242 fixed this. Please re-open this if we got it wrong somehow.