patcg / docs-and-reports

Repository for documents and reports generated by this community group

Principle: user opt-out should be undetectable #49

Closed · martinthomson closed this 6 months ago

martinthomson commented 1 year ago

This avoids problems where sites might retaliate against or marginalize users who decide not to participate.

We've talked about this lots already, but I don't know if we wrote it down.

michaelkleber commented 12 months ago

We've gotten some pushback about this. Do we have a position on whether it's sufficient to reveal a "this API is not available" kind of signal as long as there are lots of other reasons for an API's unavailability, aside from the user opting out?

This seems analogous to e.g. probes for device capabilities. "No you can't have gyroscope sensor access, either because the user said No or because there just is no gyroscope."

(Oops, should this discussion be here or on #50?)
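To illustrate the shape of the probe in question, here is a minimal sketch in TypeScript. The `adMeasurement` name and `contributeToAggregate` signature are invented for illustration, not a real or proposed API; the point is only that feature detection yields a bare present/absent answer with no reason attached.

```ts
// Sketch only: "adMeasurement" is a made-up stand-in for whichever
// measurement surface is under discussion; it is not a real navigator member.
type AdMeasurement = {
  contributeToAggregate(bucket: bigint, value: number): void;
};

function maybeContribute(bucket: bigint, value: number): void {
  // The only thing a probe can observe is "present or absent". Absence could
  // mean an unsupported browser, a missing permissions policy, a staged
  // rollout, or a user opt-out -- nothing in the probe says which.
  const api = (navigator as Navigator & { adMeasurement?: AdMeasurement })
    .adMeasurement;
  if (api === undefined) {
    return; // degrade silently; do not treat absence as a user signal
  }
  api.contributeToAggregate(bucket, value);
}
```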

dmarti commented 11 months ago

Even partly detectable opt-outs (where there might be another reason the API is missing) would likely mean that more users get hassled.

johnwilander commented 11 months ago

> We've gotten some pushback about this.

What is the pushback? I'd like to understand the trade-off.

> Do we have a position on whether it's sufficient to reveal a "this API is not available" kind of signal as long as there are lots of other reasons for an API's unavailability, aside from the user opting out?

> This seems analogous to e.g. probes for device capabilities. "No you can't have gyroscope sensor access, either because the user said No or because there just is no gyroscope."

I don't think it's analogous. As we've talked about in PATCG meetings, these features are not for users but for site owners and advertisers. Hence, the incentives to push or harass users about their choices are very different from the case where a function intended for the user is turned off.

michaelkleber commented 11 months ago

This question or something similar to it has come up in a few different contexts. Maybe aggregate measurement is sufficiently different from those other use cases, in a way that makes this principle OK here even though it doesn't fully generalize. Anyway, I'll try to list the analogues we've run into, and I welcome discussion.

  1. Incognito / Private Browsing mode. When a person chooses to visit a site in this mode, there is of course a risk that the site will say "Nope, sorry, you may not visit in incognito mode, please use regular browsing mode instead." Chrome wanted to block 3p cookies in incognito, but we worried that this would make the incognito choice detectable. After internal debate we did decide to default incognito to 3p-cookie blocking, but part of the decision was "There are enough other reasons 3p cookies might not be available that this wouldn't really enable incognito detection."

  2. Ramp-up debugging. During the period where Chrome is rolling out the new Privacy Sandbox APIs for on-device ad selection, the new APIs might not yet be available everywhere, for a long list of reasons: only half of people are eligible, a site needs to set a permissions policy, a user needs to have seen some notification, etc. For users who have opted out, we have sometimes treated the opt-out like the many other reasons an API might not be available, so that feature detection simply fails. This could risk retaliation if it were likely that API absence was caused by a user action, but if there are lots of other reasons as well, this may be OK. People trying to use the API would benefit from being able to tell the difference between "The API is inexplicably broken x% of the time" and "The API is deliberately unavailable y% of the time and inexplicably broken z% of the time", where x = y + z.

  3. Avoiding expensive operations. We have cases where an API's existence triggers adding some HTTP header, which means some additional network bytes that are useless. In this case I agree with the principle as written, and I think we should just send the header even for opted-out users. But I'm worried about future cases where the thing we need to do to maintain the fiction of the API being available might be more expensive than tens of bytes on the wire. For example, if the party calling the aggregation API wants to aggregate some expensive-to-compute value, they could roll a d20 and only compute it 5% of the time. Both the API caller and the user might be happier with a browser-supported API extension that lets the caller ask "Please tell me if I should compute this expensive value with p=0.05", and which was very likely to tell them not to compute the value if the reporting API was turned off. (Maybe there is a DP version of this which satisfies "should be undetectable"? A sketch of what that might look like appears at the end of this comment.)

None of these is a slam-dunk case; feel free to tell me to go home. But I'm a little wary of over-generalizing this undetectability principle to cases where it imposes other costs on the user.
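To make case 3 concrete, here is one hedged sketch (in TypeScript) of the coin-flip extension described there. `shouldComputeStrict` and `shouldComputeWeaker` stand in for a hypothetical browser-provided hook, not a real API surface, and the randomized-response variant is just one guess at what a "DP version" might look like.

```ts
// Sketch only: hypothetical browser-side hooks answering "should the caller
// bother computing this expensive value?" at sampling rate p.

// Strictly undetectable version: answer "yes" with probability p regardless
// of the user's setting. If the user is opted out, the browser later drops
// the contribution silently; the caller does some wasted work but learns
// nothing from the answer itself.
function shouldComputeStrict(p: number): boolean {
  return Math.random() < p;
}

// Weaker, randomized-response-style variant: opted-out users are told "yes"
// at a reduced rate q < p. This saves some wasted work, but opt-out becomes
// statistically detectable in aggregate -- exactly the trade-off above.
// (optedIn is the user's private setting, known only to the browser.)
function shouldComputeWeaker(p: number, optedIn: boolean): boolean {
  const q = p / 10; // arbitrary choice for illustration
  return Math.random() < (optedIn ? p : q);
}
```

Whether the weaker variant is acceptable is basically what this thread is debating: any gap between p and q is a signal that a sufficiently motivated site can measure over enough visits.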

martinthomson commented 11 months ago

No principle that is worth anything is 100% perfect, but I'd like to keep this one, for the reasons Michael states and more. People should not be discriminated against for choosing to exercise their right to opt out.

martinthomson commented 11 months ago

Oh, I should have mentioned also... Whether we make something truly undetectable, or merely make opt-out indistinguishable from other normal and common situations, is something we can probably negotiate based on circumstance. As you say, there are cases in existing Privacy Sandbox designs where you potentially get only that latter, weaker protection, but that would still mostly be OK.

dmarti commented 11 months ago

Undetectable, not just indistinguishable, would be safer -- many sites will likely be willing to impose extra work or data collection both on opted-out users and on a small fraction of users who have not opted out but get detected as possibly opted out. (If a data point is worth doing expensive operations to process, then for some sites it's also going to be worth risking driving away a fraction of users in order to get it.)

michaelkleber commented 11 months ago

OK great, thanks folks. Martin, your suggested addition to #50 addresses this nicely. Don, certainly undetectable is the better option, but as usual we sometimes need to make trade-offs.

dmarti commented 11 months ago

@michaelkleber What's the trade-off, though? From the user's point of view, if undetectable is required, I get (1) optionality on whether to use the system and (2) better UX with fewer hassles -- it's a win-win.

michaelkleber commented 11 months ago

The point of https://github.com/patcg/docs-and-reports/issues/49#issuecomment-1799789987 was to show several cases where there would be trade-offs for making a user setting undetectable.

dmarti commented 11 months ago

For a user, there's no benefit to a probabilistically detectable opt-out over a completely undetectable one.

It's not worth having users randomly interrupted at unpredictable times just to make some debugging sessions easier for some developers.

(and this will probably go both ways -- some sites will pop up "user tracking keeps this site free, log in or turn it on" and other sites will say "big corporations are surveilling you, turn it off" -- meanwhile most users would prefer to avoid the drama from either direction)