privacycg / nav-tracking-mitigations

Navigation-based Tracking Mitigations
https://privacycg.github.io/nav-tracking-mitigations/

General Navigational Tracking mitigation, lifted from FedCM #82

Open: bvandersloot-mozilla opened this issue 2 months ago

bvandersloot-mozilla commented 2 months ago

I haven't seen this technique discussed here, but it is being relied upon in FedCM to provide protections against navigational tracking. I think it would be worth thinking about its efficacy and/or general utility.

Essentially, FedCM uses a finite set of links inside a site-level .well-known resource (subdomains are stripped to the site) to reduce the entropy of cross-site requests and so prevent tracking: https://www.w3.org/TR/2024/WD-fedcm-1-20240820/#idp-api-well-known. This is relied upon to prevent requests from carrying user identifiers outward from the current context: because the request URL must be one of the publicly listed ones, it can identify which endpoint on the requested origin is being contacted, but not which user. This is similar to navigation, where the navigation becomes a new first-party context.
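
A minimal Python sketch of the check described above, assuming the shape of the linked draft as we read it: a `web-identity` file under `/.well-known/` on the IdP's site, whose `provider_urls` member lists the permitted config URLs. The registrable-domain logic is a deliberate simplification (`naive_site` is hypothetical and just keeps the last two DNS labels; a browser would consult the Public Suffix List), and `max_identifying_bits` is only there to illustrate the entropy argument: picking one of N published URLs conveys at most log2(N) bits.

```python
import json
import math
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    # ASSUMPTION: the registrable domain is the last two DNS labels.
    # A real user agent consults the Public Suffix List instead.
    return ".".join(host.split(".")[-2:])


def well_known_url(config_url: str) -> str:
    # The list lives at the *site* level: subdomains are stripped,
    # so every subdomain of the IdP shares a single, public file.
    host = urlsplit(config_url).hostname or ""
    return f"https://{naive_site(host)}/.well-known/web-identity"


def config_url_allowed(config_url: str, well_known_body: str) -> bool:
    # Exact string match: any per-user variation in the URL (e.g. a
    # query parameter) would have to be pre-published in this finite list.
    listed = json.loads(well_known_body).get("provider_urls", [])
    return config_url in listed


def max_identifying_bits(list_size: int) -> float:
    # Choosing one of N published URLs conveys at most log2(N) bits.
    return math.log2(list_size) if list_size > 1 else 0.0


body = json.dumps({"provider_urls": ["https://accounts.idp.example/fedcm.json"]})
print(well_known_url("https://accounts.idp.example/fedcm.json"))
# prints "https://idp.example/.well-known/web-identity"
print(config_url_allowed("https://accounts.idp.example/fedcm.json", body))       # True
print(config_url_allowed("https://accounts.idp.example/fedcm.json?u=42", body))  # False
print(max_identifying_bits(4))  # 2.0
```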

In theory, inward navigation by an IdP could be restricted to a finite set of URLs specified at the site level in a .well-known file.
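
To make the hypothetical concrete, here is a sketch of what such an inward-navigation check might look like. Everything in it is invented for illustration: no such .well-known file is specified anywhere, and `ENTRY_POINTS`, `normalize`, and `inward_navigation_allowed` are hypothetical names. The key design choice is that the query and the fragment are compared exactly, since the destination page can read either one and so either could carry an identifier.

```python
from urllib.parse import urlsplit, urlunsplit

# HYPOTHETICAL: the parsed contents of a site-level file such as
# https://idp.example/.well-known/navigation-entry-points
# (file name invented for illustration only).
ENTRY_POINTS = {
    "https://idp.example/signin",
    "https://idp.example/consent",
}


def normalize(url: str) -> str:
    # Lowercase the host and default an empty path to "/", but keep the
    # query and fragment verbatim: both are visible to the landing page,
    # so both are places an identifier could be smuggled in.
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, parts.fragment)
    )


def inward_navigation_allowed(target: str, entry_points: set[str]) -> bool:
    # Allow the navigation only if it lands exactly on a published entry point.
    return normalize(target) in {normalize(u) for u in entry_points}


print(inward_navigation_allowed("https://IDP.example/signin", ENTRY_POINTS))           # True
print(inward_navigation_allowed("https://idp.example/signin?uid=8f3a", ENTRY_POINTS))  # False
print(inward_navigation_allowed("https://idp.example/signin#uid=8f3a", ENTRY_POINTS))  # False
```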

This leaves two interesting questions IMO:

  1. does such a technique offer meaningful protection against navigational tracking?
  2. would such a restriction ever be feasible?

wanderview commented 2 months ago

Maybe we can talk about this at TPAC. I'm not fully following your line of thinking from the description so far.

martinthomson commented 2 months ago

If we get time during our session after the agenda runs down, this is on the list as AOB. We'll try to maintain agenda discipline, but we might also choose to spend the time on other active work items.

johannhof commented 2 months ago

Super interesting question! I think the answer may depend on how much work and suffering you want to put into this (and make everyone else work and suffer through, too).

To recap, the problem that FedCM had to solve (AFAIK) was that it needed some way for IdPs to register API endpoints, and it needed proof that those endpoints are not personalized with a user ID. This works by requiring them to be publicly accessible and fetching them without credentials, as opposed to, e.g., the IdP registering them in its own top-level context when 1P cookies are available (another way to solve this is the "proof of knowledge" approach, where the RP and IdP are both required to submit the same endpoints).

This is cool, but it really only works for endpoints that do not receive any kind of information via navigational parameters, or for endpoints that do receive parameters but can be requested out of band by the user agent without including information about the RP.
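
The second case can be pictured as the user agent building the request itself: the URL is taken verbatim from the IdP's public configuration, only the IdP's own cookie is attached, and no Referer or Origin header is added. This is a rough sketch using Python's standard `urllib.request.Request`; the function names and the substring-based `reveals_rp` check are hypothetical and far cruder than anything a browser would actually do.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_out_of_band_request(endpoint: str, idp_cookie: str) -> Request:
    # The URL comes verbatim from the IdP's public config, and only the
    # IdP's own first-party cookie is attached. No Referer or Origin
    # header is added, so nothing in the request names the RP.
    return Request(endpoint, headers={"Cookie": idp_cookie, "Accept": "application/json"})


def reveals_rp(req: Request, rp_origin: str) -> bool:
    # Crude illustration only: does the RP's host appear anywhere in the
    # URL or in any header value of the outgoing request?
    rp_host = urlsplit(rp_origin).hostname or rp_origin
    if rp_host in req.get_full_url():
        return True
    return any(rp_host in value for _, value in req.header_items())


ok = build_out_of_band_request("https://idp.example/accounts", "session=abc")
leaky = Request("https://idp.example/accounts", headers={"Referer": "https://shop.example/cart"})
print(reveals_rp(ok, "https://shop.example"))     # False
print(reveals_rp(leaky, "https://shop.example"))  # True
```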

Unfortunately, a lot of the web works by passing unique identifiers directly from one site to another. So while you might be able to reduce your set size to "all users of high entropy parameters, both benign and tracking", you still have this very hard problem of distinguishing between those. But maybe that can be punted to the user ™️!
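
Some rough arithmetic shows why entropy reduction alone doesn't settle the question. Singling out one person among about 8 billion takes log2(8 billion) ≈ 32.9 bits, while a 16-character hex value can carry 64. The sketch below flags query parameters whose values could carry that much, using a hypothetical length-times-alphabet heuristic (`capacity_bits` is not a real detector). Note that it flags a benign OAuth `state` nonce exactly as it would flag a tracking ID, which is the distinguishing problem described above.

```python
import math
from urllib.parse import parse_qsl, urlsplit

# Roughly enough bits to single out one person among ~8 billion.
IDENTIFYING_BITS = math.log2(8_000_000_000)  # about 32.9


def capacity_bits(value: str) -> float:
    # HYPOTHETICAL heuristic: value length times log2 of the smallest
    # character class that covers the value.
    if value.isdigit():
        alphabet = 10
    elif all(c in "0123456789abcdefABCDEF" for c in value):
        alphabet = 16
    elif value.isalnum():
        alphabet = 62
    else:
        alphabet = 64
    return len(value) * math.log2(alphabet)


def high_entropy_params(url: str) -> list[str]:
    # Names of query parameters whose values could identify a single user.
    query = urlsplit(url).query
    return [name for name, value in parse_qsl(query) if capacity_bits(value) >= IDENTIFYING_BITS]


# `state` here is a benign CSRF nonce, yet it is flagged just like a user ID would be.
print(high_entropy_params("https://rp.example/cb?state=9f2c4e1a7b3d5f60&lang=en"))  # ['state']
```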

So, if I were to make an idea out of this (and I'm not saying it's a good one or one that I even propose pursuing):

Feels like a herculean effort worthy of the scale of this problem :)

bvandersloot-mozilla commented 2 months ago

@martinthomson: I wouldn't expect agenda time for a random issue filed the week before TPAC. AOB is generous, thank you! :)

@johannhof: I think those three bullets are probably the closest to a path of deployment for something like this. Considering the lack of options on the table for defense against general navigational tracking, I thought writing one down would be useful.

wanderview commented 1 month ago

I realize now we did not end up discussing this at TPAC. Sorry I forgot to bring it up.

Is the idea here that we would not enforce bounce tracking mitigations (BTM) on URLs in a well-known list? Or are you suggesting some other enforcement approach if a URL is not in this well-known list?

bvandersloot-mozilla commented 1 month ago

"Or are you suggesting some other enforcement approach if a URL is not in this well-known list?"

This one, although I'm not particularly advocating for it. I'm just saying that since FedCM is trying to mitigate navigational tracking, we should see what lessons can be learned from it, given how few answers we have for stopping navigational tracking in general.

wanderview commented 1 month ago

I think FedCM has the advantage of being designed for a specific set of use cases and a theoretically bounded set of IdPs. It may be reasonable to constrain those use cases in a way that does not make sense for redirects in general. Applying this approach in the general case seems quite constraining for the idea that "redirects are a useful primitive on the web".

I feel, though, that I may need to see a more specific proposal to really understand this.