privacycg / proposals

New proposals in the Privacy Community Group
https://privacycg.github.io

Bounce Tracking Protection #6

Open johnwilander opened 4 years ago

johnwilander commented 4 years ago

In the spirit of a community group, we’d like to share some of our Intelligent Tracking Prevention (ITP) research and see if cooperation can get us all to better tracking prevention for a problem we call bounce tracking.

Safari’s Old Cookie Policy

The original Safari default cookie policy, circa 2003, was this: Cookies may not be set in a third-party context unless the domain already has a cookie set in a first-party context. This effectively meant you had to “seed” your cookie jar as first party.

Bounce Tracking

When working on what became ITP, our research found that trackers were bypassing the third-party cookie policy through a pattern we call "bounce tracking" or "redirect tracking." Here's how it works:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it redirects the top level to tracker.example using window.location or by hijacking all links on the page.
  4. tracker.example is now first party and sets a cookie—it seeds its cookie jar.
  5. tracker.example redirects back to the original page URL or to the intended link destination.
  6. The tracker.example cookie can now be read back in third-party contexts.
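To make the steps above concrete, here is a toy model of the old "seed your cookie jar as first party" policy and how a bounce gets past it. This is only a sketch: the `CookieJar` class, the `publisher.example` domain, and the cookie values are made up for illustration and do not reflect any browser's actual implementation.

```python
class CookieJar:
    """Toy model of the circa-2003 policy: a third-party write is allowed
    only if the domain already has a cookie (hypothetical class)."""

    def __init__(self):
        self.cookies = {}  # domain -> cookie value

    def set_cookie(self, domain, value, top_level_domain):
        first_party = domain == top_level_domain
        # Third-party writes succeed only if the jar is already seeded.
        if first_party or domain in self.cookies:
            self.cookies[domain] = value
            return True
        return False

    def read_cookie(self, domain):
        return self.cookies.get(domain)


jar = CookieJar()

# Steps 1-2: tracker.example runs as a third party on publisher.example,
# finds no cookie, and a third-party write is refused.
assert jar.read_cookie("tracker.example") is None
assert not jar.set_cookie("tracker.example", "id-123", "publisher.example")

# Steps 3-4: the top level is redirected to tracker.example, which is now
# first party and can seed its cookie jar.
assert jar.set_cookie("tracker.example", "id-123", "tracker.example")

# Steps 5-6: back on publisher.example the cookie is readable, and later
# third-party writes are allowed because the jar is seeded.
assert jar.read_cookie("tracker.example") == "id-123"
assert jar.set_cookie("tracker.example", "id-456", "publisher.example")
```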

Modern tracking prevention features generally block both reading and writing cookies in third-party contexts for domains believed to be trackers. However, it's easy to modify bounce tracking to circumvent such tracking prevention. Step 5 simply needs to pass the cookie value in a URL parameter, and step 6 can stash it in first-party storage on the landing page.

Bounce tracking is also hard to defend against since at the time of the request, the browser doesn’t know if it’ll be redirected.

Safari’s Current Defense Against Bounce Tracking

ITP defends against bounce tracking by periodically purging storage for classified domains that the user doesn’t interact with. Doing navigational redirection is one of the conditions that can get a domain classified by ITP so being a “pure bounce tracker” that never shows up in a third-party context does not suffice to avoid classification. The remaining issue is potential bounce tracking by sites that do not get their storage purged, for instance due to the fact that the user is logged in to the site and uses it.
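As a rough sketch of the purge rule described above (not ITP's actual code), the logic could look like the following. The `SiteRecord` type, the field names, and the 30-day window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SiteRecord:
    domain: str
    classified: bool                      # classified as a tracker
    last_interaction: Optional[datetime]  # None if the user never interacted
    has_storage: bool = True


def purge_classified(records: List[SiteRecord], now: datetime,
                     window: timedelta = timedelta(days=30)) -> List[str]:
    """Clear storage for classified domains with no recent user interaction.

    The 30-day default is an assumed value, not taken from ITP.
    """
    purged = []
    for record in records:
        interacted_recently = (
            record.last_interaction is not None
            and now - record.last_interaction <= window
        )
        if record.classified and not interacted_recently and record.has_storage:
            record.has_storage = False
            purged.append(record.domain)
    return purged
```

Note that a classified site the user keeps interacting with is never purged under this rule, which is exactly the remaining gap mentioned above.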

Can Privacy CG Find a Comprehensive Defense?

We believe other browsers with tracking prevention have no defense against bounce tracking (please correct if this is inaccurate) and it seems likely that bounce tracking is in active use. Because we've described bounce tracking publicly before, we don't consider the details in this issue to be a new privacy vulnerability disclosure. But we'd like the Privacy CG to define some kind of defense.

Here are a few ideas to get us started:

pes10k commented 4 years ago

Some context, we did some measurement of this ~1 yr ago.

https://brave.com/redirection-based-tracking/

I'd be very interested in other numbers folks might have, especially as it might help us understand how this risk compares against other risks on the platform. (not that we shouldn't trace down every leak in the platform, but interested in the highest-marginal-benefit ranking)

jackfrankland commented 4 years ago

To highlight a similar mechanism for completeness, sorry if it's documented elsewhere and not considered bounce tracking:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it injects a tracker.example iframe on the publisher's page.
  4. User clicks on content in the iframe (intentionally or via click-jacking).
  5. Using window.open, a new tab/window is opened for tracker.example.
  6. tracker.example window is now first party and can read or write cookies.
  7. tracker.example window accesses a function on tracker.example iframe, via window.opener, to pass an identifier.
  8. tracker.example window closes itself, and was only open for a short amount of time.
  9. Identifier can be passed to initial third-party script via postMessage and stored in first-party storage for continued tracking on the site.

johnwilander commented 4 years ago

To highlight a similar mechanism for completeness, sorry if it's documented elsewhere and not considered bounce tracking:

  1. The content publisher's page embeds a third-party script from tracker.example.
  2. The third-party script tries to read third-party cookies for tracker.example.
  3. If it can't, it injects a tracker.example iframe on the publisher's page.
  4. User clicks on content in the iframe (intentionally or via click-jacking).
  5. Using window.open, a new tab/window is opened for tracker.example.
  6. tracker.example window is now first party and can read or write cookies.
  7. tracker.example window accesses a function on tracker.example iframe, via window.opener, to pass an identifier.
  8. tracker.example window closes itself, and was only open for a short amount of time.
  9. Identifier can be passed to initial third-party script via postMessage and stored in first-party storage for continued tracking on the site.

Interesting. Have you seen this in the wild for tracking purposes? I wouldn't call it bounce tracking, rather insta-popup tracking or brief popup tracking. Maybe file as individual issue? I'd love to hash it out.

othermaciej commented 4 years ago

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

othermaciej commented 4 years ago

Some context, we did some measurement of this ~1 yr ago.

https://brave.com/redirection-based-tracking/

I'd be very interested in other numbers folks might have, especially as it might help us understand how this risk compares against other risks on the platform.

This is cool data!

I had a hard time figuring out how prevalent bounce tracking is from that post, or in particular the subset we are calling bounce tracking. Bounce tracking is a subset of first-party redirect tracking (which also includes things like URL shorteners, or sites that send all outgoing links through a redirect) which in turn is a subset of redirect tracking (which can include redirects in third-party context). But I couldn't even figure out how prevalent redirect tracking in general is. Help appreciated!

As mentioned by John, we know that at least one significant tracking firm was providing bounce tracking for Safari users across a number of sites, before ITP rendered it ineffective. We don't know current prevalence though.

pes10k commented 4 years ago

@othermaciej thanks! We used the measurements to estimate how often users would hit bounce trackers, using a bunch of random walks of the web, weighted by initial website popularity. Not perfect of course, but a useful first cut.

Our initial plan was to use the above to identify bounce tracking domains (e.g. domains that have storage read or written to, but were only intermediate in some bounce). E.g.

  1. click link on popular site
  2. an arbitrary number of other eTLD+1s that the user is redirected through (301 or otherwise), where something storage-related happens
  3. finally land on another eTLD+1 that's distinct from all the above.

the "2"s in the above are what got counted in the research behind the blog post, and what we started building lists of. The short term plan was to just block storage on these domains, even in 1p context, until there was a user gesture.
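A minimal sketch of that counting rule, assuming each hop in a navigation has already been reduced to its eTLD+1 (which in practice needs the Public Suffix List) and annotated with whether storage was touched. The `Hop` type and the function name are hypothetical, not from our crawler.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Hop:
    site: str              # eTLD+1, assumed to be computed elsewhere
    touched_storage: bool  # storage was read or written during this hop


def bounce_candidates(chain: List[Hop]) -> Set[str]:
    """Return intermediate sites (the "2"s) that touched storage.

    chain[0] is the site where the link was clicked and chain[-1] is the
    final landing site; only hops strictly in between are considered.
    """
    if len(chain) < 3:
        return set()  # no intermediate hop, so nothing to flag
    endpoints = {chain[0].site, chain[-1].site}
    return {
        hop.site
        for hop in chain[1:-1]
        if hop.touched_storage and hop.site not in endpoints
    }


chain = [
    Hop("news.example", False),
    Hop("tracker.example", True),
    Hop("shortener.example", False),
    Hop("shop.example", False),
]
print(bounce_candidates(chain))
```

Here only `tracker.example` is flagged: `shortener.example` is an intermediate hop but did nothing storage-related.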

We had some more ideas too that were promising, but have (so far) been triaged down, since we didn't think this was where the best "privacy bang for the buck was", but the partial list includes:

(all "white boarding stage stuff", but where our mind was at at the time)

empijei commented 4 years ago

Thanks for publishing this proposal!

I would like to point out that adding on-device based models could potentially harm privacy and security of users so I think that the crawlers-based approaches that you suggest should be preferred.

johnwilander commented 4 years ago

Thanks for publishing this proposal!

I would like to point out that adding on-device based models could potentially harm privacy and security of users so I think that the crawlers-based approaches that you suggest should be preferred.

Hi! We are well aware of that research. However, it's the observability of global state that's problematic, not the global state itself. And when looking at the observability you have to weigh it against just letting a tracking vector exist. Often you'll find that removing the tracking vector does more for privacy than avoiding the observability of global state. Ideally, you have neither.

For browsers without comprehensive tracking prevention on by default, cookies, web storage, and HTTP cache are readily available global state that's observable cross-site. I.e. that's where you have to start if observability of global state is something you want to defend against.

kushal commented 4 years ago

The remaining issue is potential bounce tracking by sites that do not get their storage purged, for instance due to the fact that the user is logged in to the site and uses it.

Distinguishing legitimate federated sign-on scenarios and legitimate analytics and affiliate scenarios from permissionless bounce tracking seems quite hard. As it is, the "user interaction" signal currently in ITP seems likely to have both false positives and false negatives with the consequence of making it harder for users to stay logged in to authentication providers they care about.

As @othermaciej points out

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

empijei commented 4 years ago

However, it's the observability of global state that's problematic, not the global state itself.

Observing the side effects that you mentioned in the proposal would be trivial, that's why I proposed to go for the shared state across all users. Since we are trying to address the problem let's try to tackle it once and for good. I feel like moving the tracking capability to a different vector would be less beneficial than removing it completely.

(I think that if there is a local model that changes navigation behavior it will probably always be observable)

IMO tracking users based on a list shared across millions of them would likely be more complicated.

For browsers without comprehensive tracking prevention on by default, cookies, web storage, and HTTP cache are readily available global state that's observable cross-site. I.e. that's where you have to start if observability of global state is something you want to defend against.

I agree something has to be done but not at the risk of harming security. This is not a situation in which the worst case is an ineffective mitigation like it would be for a poorly configured CSP, this is a case similar to the XSS auditor, which introduced XSLeaks in almost every website in existence for a debatable advantage over XSS.

Overall, I think the crawler solution would be more solid. I'm not against this proposal, just here vouching for one of the options you put out 😸

othermaciej commented 4 years ago

Crawler-based classification has holes too; there may be trackers that are not detectable from the network position of the crawlers but are detectable for a given user (due to geo-specific redirects or even trackers identifying the IP block of the crawlers).

Going back to classification: in this case, let's consider the proposal with a single bit of "classified as a bounce tracker" that puts a site into SameSite=Strict jail. This is pretty hard to abuse. It's detectable only during an actual attempt at bounce tracking, and combining the bits into a usable unique ID requires bouncing through many (32ish) distinct domains every time serially, which is likely to be a prohibitive performance cost. And that bouncing will itself cause all of them to be identified as bounce trackers, so IDs of this form will self-destruct after only a few uses at most. Capping the length of redirect chains is also likely to be web compatible, at a level lower than what would be needed to pull this off.

Let me know if there's a mistake in this analysis.
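The arithmetic behind the "32ish" figure can be checked with a short sketch. It assumes each domain leaks exactly one bit (classified or not); the function names and the cap of 10 redirects are illustrative assumptions, not proposed values.

```python
def domains_needed(population: int) -> int:
    """Minimum number of one-bit domains needed to give every user in
    `population` a distinct ID, i.e. ceil(log2(population))."""
    if population < 1:
        raise ValueError("population must be at least 1")
    return (population - 1).bit_length()


def chain_allowed(chain_length: int, cap: int) -> bool:
    """True if a redirect chain of this length fits under the cap."""
    return chain_length <= cap


# Roughly four billion users need 32 one-bit domains.
bits = domains_needed(2 ** 32)
assert bits == 32

# With a hypothetical cap of 10 redirects per navigation, a serial bounce
# through all 32 domains would be cut off.
assert not chain_allowed(bits, cap=10)
```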

That said, a combo of crawling and client-side detection may be the right balance.

hober commented 4 years ago

@snyderp, would it be fair to conclude from your comments on this issue that Brave would like to see this proposal taken up by the CG?

pes10k commented 4 years ago

I'm not sure I understand what the specific proposal is, but I'm very in favor of PrivacyCG working on this problem! :)

johnwilander commented 4 years ago

I'm not sure I understand what the specific proposal is, but I'm very in favor of PrivacyCG working on this problem! :)

We wanted to share this at an earlier stage than a crisp proposal to see if we can have this community group work its way through the issue and land in a shared solution. So the proposal is to take on the work of defining the vulnerability and enumerating defenses we believe work, possibly settling on a single one.

The outcome could be standards language which conveys that user agents may put restrictions A, B, and C in place to defend against attack X.

pes10k commented 4 years ago

Ah, then if the proposal is "let's put our brains together and figure out a good, standards-focused, cross-browser solution to this problem" then Brave is 100% on board

(didn't mean to be a pedant, just didn't know if the proposal was substantive or procedural)

othermaciej commented 4 years ago

This may need to be written up as an explainer about the problem and the solution space before we start drafting in spec-level detail

hober commented 4 years ago

Would you like time on the next call to talk about this proposal?

dennisvdheijden commented 4 years ago

I'd hope it's not a broad measure like a periodic purge of cookies for websites without user interaction. Tools like A/B testing for improving the user interface (of one organization) already have a hard time with ITP, and I hope solutions can focus on the offenders and less on "everyone".

johnwilander commented 4 years ago

Would you like time on the next call to talk about this proposal?

If you're asking me, then yes, I can talk a bit about it on the next call.

AramZS commented 4 years ago

I agree this does sound remarkably similar to OAuth flows, which I think we would want to keep generally, though I can see an argument that removing this behavior would be a reasonable push towards proposals like the Trust Token API, though that's a long-term plan. Is the intent here to whitelist specific known-to-be-for-OAuth domains? How are people who are currently attempting to block this behavior handling OAuth?

TanviHacks commented 4 years ago

Are folks interested in having an ad-hoc meeting to discuss this? cc @johnwilander @jackfrankland @pes10k @englehardt @othermaciej @AramZS

If so, please file an issue here.

pes10k commented 4 years ago

I would be interested in attending a call on it

TanviHacks commented 4 years ago

@johnwilander are you up for leading this discussion in an ad-hoc meeting?

johnwilander commented 4 years ago

Sure, I can lead the discussion. Would love help on scribing though.

TanviHacks commented 4 years ago

Thanks @johnwilander! We'll start working on scheduling this.

TanviHacks commented 4 years ago

Ah, actually I should take scheduling over to a new issue. Filed https://github.com/privacycg/meetings/issues/5.

TanviHacks commented 4 years ago

Reminder - ad-hoc meeting on this topic Thursday, April 23rd 10am PDT per https://github.com/privacycg/meetings/issues/5#issuecomment-612254081

hariombalhara commented 4 years ago

It can impact redirect A/B tests as well. The case would be two domains owned by one company that is A/B testing a complete design overhaul on a different domain, e.g. example.com -> newuiexample.com

If a visitor is chosen to have a redirect of this particular test, then after 1 month of inactivity, cookies would be purged from example.com and then he might not see the redirects.

hober commented 4 years ago

@johnwilander, could you summarize how the call went & let the folks on this issue know what your next steps will be? Here are the minutes from the call.

hober commented 4 years ago

@johnwilander, could you summarize how the call went & let the folks on this issue know what your next steps will be? Here are the minutes from the call.

John?

johnwilander commented 4 years ago

Thanks for the reminder!

My summary of the virtual f2f call:

Next steps:

dveditz commented 3 years ago

it seems that sites voluntarily adding a same-site redirect to get access to their cookies in a bounce tracking scenario was never considered in the threat model.

The primary goal of SameSite cookies is to stop CSRF attacks. Same-site redirects (even "same page" redirects) to regain cookie visibility when using Strict cookies were a recommendation we discussed that explicitly safe landing pages could use so they had workable cookies while leaving the vast majority of the site protected at the Strict level. (The other main approach would be to use separate Lax identity cookies and a Strict auth token.)

SameSite cookies are not a tracking protection mechanism so you're right that we did not consider Bounce tracking. To aid usability in the threat model we did consider we explicitly support same-site redirects.
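For readers unfamiliar with the "separate Lax identity cookie and Strict auth token" approach mentioned above, here is a minimal sketch using Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later). The cookie names and values are placeholders, and the extra `Secure`/`HttpOnly` attributes are my own assumption about a sensible setup, not part of the recommendation.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Identity cookie: sent on top-level cross-site navigations, so a landing
# page can still recognize the user.
cookies["identity"] = "user-42"
cookies["identity"]["samesite"] = "Lax"
cookies["identity"]["path"] = "/"

# Auth token: only sent on same-site requests, which is the CSRF defense.
cookies["auth_token"] = "secret-token"
cookies["auth_token"]["samesite"] = "Strict"
cookies["auth_token"]["secure"] = True
cookies["auth_token"]["httponly"] = True
cookies["auth_token"]["path"] = "/"

# One Set-Cookie header value per cookie.
headers = [morsel.OutputString() for morsel in cookies.values()]
for value in headers:
    print("Set-Cookie:", value)
```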

samuelgoto commented 3 years ago

We should definitely defend against this kind of pop-up based tracking if it's being exploited. (Unfortunately, it sounds rather similar to pop-up based OAuth flows.)

Yep, we ran into this too. We've been calling this the classification problem (we are not very good with names :)): it is hard for browsers to distinguish between OAuth flows and non-OAuth flows because OAuth/OpenID has been built on top of low-level primitives.

https://github.com/samuelgoto/WebID#the-classification-problem

On first thought, partitioning the API space between auth and non-auth seemed sufficient, but solving the classification problem with high-level APIs is also insufficient if their implementations are not sufficiently tied to sign-in (for example, with IDP-controlled UIs below) because an attacker/abuser may use them for other non-OAuth purposes, and in doing so take you back to your original problem of bounce tracking.

The best guess so far is that, to address the classification problem, one has to (a) partition the API space (e.g. high-level / intent-specific APIs) but also (b) make the implementation of the API meaningless to use cases outside of the one intended (e.g. implement high-level / intent-specific UIs, for example here) so that a bounce tracker can't use it.

This implies that the browser takes a much larger role in signing-in, mediating most of the exchange (which has its consequences), but seems at first sight to sufficiently address the classification problem.

So, to go back to your original point, I think WebID can, possibly, help bounce tracking by addressing the classification problem (i.e. allowing a browser to distinguish oauth vs non-oauth use cases) and in doing so allowing it to use different policies. Seems insufficient, but perhaps a constructive/meaningful step forward.

johnwilander commented 3 years ago

There is now a bounce tracking proposal called SWAN (bounce tracking as in trying to track, not preventing it). Details here: https://github.com/SWAN-community/swan/blob/main/data-flows.md

We should revisit the bounce tracking protection topic. Ping @pes10k.

pes10k commented 3 years ago

Sounds good, we're just about to announce something related to this too (mostly list based, for v1, with some more interesting follow ups expected shortly after), so revisiting sounds great

jwrosewell commented 3 years ago

@johnwilander @pes10k Thanks for adding SWAN to the agenda for the F2F.

As a member of the SWAN.community I can explain the approach, including how economics, law and engineering disciplines have come together to produce a solution which gives people meaningful control and choice over privacy.

jwrosewell commented 3 years ago

For those that would like to find out more about SWAN.community's approach to privacy ahead of the May F2F we have drop in sessions throughout April and early May with some of the lawyers that worked on it.

See https://event.webinarjam.com/register/10/plqm1hw

johnwilander commented 3 years ago

@johnwilander @pes10k Thanks for adding SWAN to the agenda for the F2F.

This agenda item is not to discuss SWAN which is not (yet) a work item or proposal in this community group. This agenda item is to discuss protection against bounce tracking.

jwrosewell commented 3 years ago

@johnwilander Understood. If the group would like to know more about the approach to privacy and choice advocated by SWAN.community then I'd be happy to explain more and thereby inform the discussion on bounce tracking.

In relation to the F2F agenda on bounce tracking I'm interested to learn about the harms and the protection required.

TheMaskMaker commented 3 years ago

It is very clear that this cannot be discussed without also discussing SWAN, since it seems aimed to kill SWAN.

jwrosewell commented 3 years ago

SWAN establishes a legal basis for data processing among controllers, has mechanisms for audit, is aligned to solving the problems of this group, and creates a level playing field concerning competition irrespective of organization size or the other services that an entity operates. I certainly hope no one is working against these objectives or goals at the W3C. I would be extremely concerned if they were.

SWAN.community would welcome the opportunity at next week's face to face to explain SWAN to this group and demonstrate how the work complies with laws and regulators' stated requirements. That is a matter for the chairs of this group who control the agenda. The chairs could coordinate with me to ensure representatives from SWAN.community are available when bounce tracking protection is discussed to ensure these stakeholders' views are represented in the discussion.

michael-oneill commented 3 years ago

Only courts can establish a legal basis for data protection.

Technology could help controllers acquire subjects' consent, which could be a claimed legal basis if it meets the strict validity requirements. In some circumstances a claim for the public or legitimate interest basis could be supported by technology enabling subjects' right-to-object.

My understanding of SWAN is that it uses redirection to bypass third-party cookie blocking. The claim is that the resultant data processing is covered by the legitimate interest basis, with the data protection role of browsers replaced by legal contracts between the third-party entities.

This is well beyond the capacity of this group to decide, our focus should only be the technology.

As @johnwilander has said, bounce tracking emerged several years ago and was clearly designed to avoid Safari's original protections against third-party tracking. A New Method Bypassing Safari's Third-Party Cookies Blocking:

There are also multiple ways that third-party script can record first-party state, and enable its correlation by third-party servers to build a cross-domain tracking vector, as pointed out above by @jackfrankland and others.

Mitigating this probably requires more control over first-party state.

One feature of the SWAN proposal that could be helpful in this is the ability to communicate first-party acquired user consent state to third-parties, removing the need to bombard users with repeated consent panels or storage access prompts. To avoid the tracking risks this state has to be restricted to low entropy values, first-party located and only browser triggered, but with the ability to communicate it site-specifically to embedees via a request header.

Maybe we could discuss this in the F2F.

TheMaskMaker commented 3 years ago

Only courts can establish a legal basis for data protection.

If this group only focused on technology, then it sounds like you are saying we should drop safari's proposals to enable these "safeguards" as they merely force a legal issue we have no right to decide. But I am confused because you then say that swan and not safari is in the wrong for bringing up a legal issue? That seems contradictory to me.

Why were Safari's original 'safeguards' deemed necessary? Wasn't that determination out of scope and inappropriate?

I think it is improper form to make the claim that SWAN is bypassing Safari, or Safari is Bypassing swan. Both are true, they disagree about what privacy means, and I agree that we should 100% discuss this at the FtF.

johnwilander commented 3 years ago

@TheMaskMaker, would you be willing to share who you are and your affiliation? It's good to understand who's represented here and whose viewpoints are being shared.

TheMaskMaker commented 3 years ago

@johnwilander Of course, in fact if I haven't already managed it, it would help me greatly if you could advise me how to link my W3 account to this working group; I thought I already did but the interface has been giving me no peace! I picked the name 'mask maker' back in my younger days on account of the hobby of making costume masks, and now I wish I had just made a different account.

On a separate note, I have a bit of an awkward next comment; in order to explain a concern I have I need to mention Apple/Safari in relation to a big privacy threat and I want to be clear it does not reflect on my opinions of you at all, as I can tell you are working hard like the rest of us to improve privacy.

If anything maybe you can give me more insight there.

TheMaskMaker commented 3 years ago

@michael-oneill Let me give you an example of one of my concerns over this bounce-tracking prevention proposal:

The United States government's Federal Trade Commission's webcast on 'dark patterns' (deceptive tactics) in the web called out Apple's Safari's ability to track users through integrations with ios that do not include the use of cookies or bouncing to my knowledge as a threat. If we really do intend to wipe out all user tracking, Apple may wish to make a declaration that it will or has ceased all AppleId, tracking profiling, adsales of user data, and customization through user profiling. And this should be just as enforced and auditable by the community as anything else we do. I do not believe that has happened.

If Safari still plans to use apple ids through ios-safari integrations then this bounce tracking does nothing to prevent user tracking, it just monopolizes for the browser and hides it from the user and web publishers. SWAN, if adopted by Apple/Safari, would expose that data, and allow the user to see it, opt out of it, even delete it.

Bounce-tracking prevention, if adopted by safari, would not. It would only prevent SWAN. Which would prevent this level of user control.

I don't see SWAN as trying to bypass privacy, I see that as trying to enforce it.

TheMaskMaker commented 3 years ago

@johnwilander I also want to be clear that I am not at this time supporting swan over your proposal either. I like some aspects of it.

In fact my ideal case is if you and James could work together.

I think the transparency and control aspects of SWAN would be a huge boon for users and keep the market open, and combined with browser safeguards, and agreements from higher powers like Apple/Google, we could achieve a privacy solution better than either of you are likely to come up with or be able to implement if the browser and adtech are at odds and both trying to cheat whatever systems we come up with.

johnwilander commented 3 years ago

This issue is about bounce tracking protection and collaboration in this community group to achieve bounce tracking protection. It's not about other kinds of tracking or tracking protections. Please refrain from discussing other things than bounce tracking protection here since doing so makes it harder to stay focused on what this proposal is about. You can file your own issues for things you'd like to discuss or propose as work items. Thanks!

darobin commented 3 years ago

There is only one mention of Apple in the transcript of the FTC's dark pattern and it is in support of Apple's removal of the IDFA. There are public statements from Safari engineers clearly indicating that history in Safari is synced E2EE and therefore off limits to Apple — as it should be.

It is overwhelmingly clear that users expect their browsers to protect their data: 89% want their browser to prevent their data from being shared (source: Eurobarometer). Safari is acting in direct support of its users and the work the Safari team is doing plays a direct role in convincing users to choose Safari and Apple products. There are many ways in which these user expectations can be violated and the protections implemented to support them can be deliberately circumvented. Bounce tracking is one. Closing that loophole is a natural step forward.

The Times definitely supports moving this forward. Avoiding this kind of circumvention aligns with our readers' expectations, creates a more trustworthy Web, is better for publishers' businesses, and opens the door to a more competitive ad market with fewer network effects in the valuation of data.

TheMaskMaker commented 3 years ago

There is only one mention of Apple in the transcript of the FTC's dark pattern and it is in support of Apple's removal of the IDFA.

[EDIT] I went back and read it to check what I remember watching, and while you are right about that particular mention being initially positive (the words "potentially inherently manipulative" are later used to describe the consent mechanisms themselves, though this is, I believe, a more general comment), they do describe it as an opt-in system that still enables user tracking. This proposal would still prevent tracking for competitors with or without consent that Apple can do with consent.

Also the pattern is described as dark most certainly as a negative in reference to Google and similar login systems. I'm glad Apple is heading in the right direction, but the point is the tracking is still there. Thus the competition concern exists. [END EDIT]

89% want their browser to prevent their data from being shared

I have read otherwise. https://iabeurope.eu/all-news/iab-europe-news/latest-research-shows-eu-citizens-understand-and-appreciate-the-ad-supported-internet/

And regardless you are talking about a proposal that gives users more control not less. They can choose to not share it. I don't understand your objection