privacycg / nav-tracking-mitigations

Navigation-based Tracking Mitigations
https://privacycg.github.io/nav-tracking-mitigations/

Is "wbraid" tracking? #11

Open benjaminsavage opened 2 years ago

benjaminsavage commented 2 years ago

Taken from this link.

"Q: How does this new parameter work? A: Similar to GCLID, this new parameter is dropped in a 1st party cookie set by A Google Tag (including gTag.js, gtm.js, and analytics.js linked to an ads account) when a user lands on a page after clicking on an ad. This parameter helps attribute conversions back to ad campaigns but does not uniquely identify the user."

dmarti commented 2 years ago

For practical purposes, yes. From the point of view of the user agent, it would have to be treated as a tracking identifier.

"wbraid uses aggregation techniques to ensure ATT compliance"

In order to handle wbraid (or gbraid) as something other than a tracking identifier, the user (and their user agent acting in their interest) would have to learn the math behind the "aggregation techniques" and enough of the developer terms and conditions to understand what is being claimed by "ATT compliance."
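To make "treated as a tracking identifier" concrete, here is a minimal sketch of one possible mitigation: removing listed query parameters from a navigation URL before the destination sees them. The blocklist contents and the function name `strip_tracking_params` are assumptions for illustration, not any browser's actual list or API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blocklist for illustration only; not taken from any shipping browser.
TRACKING_PARAMS = {"gclid", "wbraid", "gbraid"}


def strip_tracking_params(url: str) -> str:
    """Return the URL with every blocklisted query parameter removed."""
    parts = urlsplit(url)
    # Keep each (name, value) pair whose name is not on the blocklist.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_tracking_params("https://shop.example/landing?sku=42&wbraid=AbC123"))
```

A name-based list like this only works for parameters that are already known. It says nothing about whether the value behind the name is actually used to identify a user.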

jyasskin commented 2 years ago

Along the lines of #10, I think we should be careful to distinguish things that are/aren't "actually" tracking from things the user agent is "expected to believe" are/aren't tracking. I think we don't yet know enough about all the plausible mitigations to be confident about how wbraid would "have to be treated", so we probably can't have a useful conversation on that front yet. However, I did hear some interesting disagreement on the "actually" front at last week's meeting.

The current definition limits "tracking" to things that "identify that a user on one site is the same person as a user on another site." I believe that definition says wbraid is not tracking.

However, @johnwilander said it is "Important to not just talk about linking identity across two sites. If one site learns something new about its user that happened on another website, that should be included in the definition." I think wbraid would be tracking under that definition, as would a lot of the private-ads APIs that have been discussed. So it's worth continuing to discuss what we want to include in "tracking".

If this distinction turns out to be too persistently contentious, we could also define two different terms, like "identity tracking" vs "information tracking".

dmarti commented 2 years ago

The wbraid identifier is an interesting example because the impact on the user depends on actions ("aggregation techniques") taken by the link destination site after the browser has to decide whether or not to apply mitigations. So wbraid would be "identified navigational tracking," as suggested in #10, whether or not the full stack of link source+user agent+link destination is able to track the user.

The idea of "identified navigational tracking" is also useful in the event that the two sites are tracking in some situations but not in others. For example, a destination site might discard or post-process the identifier if it detects that the user is in a jurisdiction with some kinds of regulations, and persist the identifier if the user is in another jurisdiction.

benjaminsavage commented 2 years ago

@jyasskin - would you be willing to share some implementation details about "wbraid" to justify this claim that it is "ATT compliant"?

From an outsider's perspective, "wbraid" is a high-entropy identifier that appears to be different for every user. As far as I know, there exists no public documentation of exactly how this is computed and used, and even if such documentation were to exist, can a 3rd party validate that the current implementation matches that documentation?

As such, the user-agent has nothing to go on to assure itself that this cannot be used to track unique users aside from a promise by Google: "trust us, it isn't used that way."

In practice, how would we productionize a user-agent based mitigation system based upon public commitments like this? That doesn't seem scalable. Would there be some kind of at-scale review and audit system to vet every param from every ad-tech vendor that they claim uses "aggregation techniques" and is "ATT compliant"? This seems unrealistic to me.

Perhaps I'm mistaken. Maybe "wbraid" is documented somewhere, and an independent 3rd party can vet aspects of it without using advanced cryptographic techniques. I'd love to learn more.

jyasskin commented 2 years ago

As someone who works on Chrome, I don't know the direct answer to Ben's question, but what I know is explained in this help center article.

It’s a very interesting question about how browsers should engage with high-entropy values inside URLs. It's difficult to distinguish an encrypted and nonced identifier from, say, a CSRF token, or a value that has noise added to make it differentially private. I haven't seen a scalable proposal for dealing with values like this, but I hope this CG can come up with one.
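As a rough illustration of why this is hard, the sketch below flags query parameters whose values look high-entropy, using empirical per-character Shannon entropy times length as a crude estimate. The function names, the 64-bit threshold, and the example values are all assumptions made up for this sketch; the `wbraid` value in particular is invented and not a real one.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def estimated_bits(value: str) -> float:
    """Crude estimate: empirical Shannon entropy per character times length."""
    if not value:
        return 0.0
    n = len(value)
    counts = Counter(value)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n


def high_entropy_params(url: str, threshold_bits: float = 64.0) -> list[str]:
    """Names of query parameters whose values meet the (arbitrary) threshold."""
    query = urlsplit(url).query
    return [
        name
        for name, value in parse_qsl(query)
        if estimated_bits(value) >= threshold_bits
    ]


if __name__ == "__main__":
    # Both values below are made up for illustration.
    url = (
        "https://shop.example/cart?page=2"
        "&csrf=9f8a7c6b5d4e3f2a1b0c9d8e7f6a5b4c"
        "&wbraid=Cj8KCQiA3fX1aZ9qLmT7vRw2YbN5kHdE"
    )
    print(high_entropy_params(url))
```

With these inputs the heuristic flags the CSRF-style token and the identifier-style value alike, and ignores `page`. A check based only on how random a value looks cannot tell the two apart, so it cannot tell the user agent what the value is for.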

jwrosewell commented 2 years ago

@jyasskin would adding metadata to "wbraid" to explain where it was collected and for what purposes help?

I've outlined a solution to provide these features here.

jyasskin commented 2 years ago

@jwrosewell I think that's more a question for the CG as a whole than me personally.