privacycg / nav-tracking-mitigations

Navigation-based Tracking Mitigations
https://privacycg.github.io/nav-tracking-mitigations/

Are unsubscribe links with a user ID link decoration and/or tracking? #5

Open jyasskin opened 2 years ago

jyasskin commented 2 years ago

https://github.com/privacycg/nav-tracking-mitigations/pull/2/files#r704161877 asks how we should think of links like publisher.example/unsubscribe?userId=5789rhkdsaf8urfnsd, where the user ID is used to identify the account to unsubscribe, and may or may not be required. (If it's not required, the user would be able to fill out a text input field on the target site to identify the account.)

martinthomson commented 2 years ago

Does the URL include information that identifies something other than what is necessary to load the resource successfully?

In this case, the URL acts as a capability URL: it may include information that is not needed to locate the resource, but that is used to authorize access to it.

The information does convey information to the site about the user, but it only does so in order to identify a resource that is specific to that user. The URL would not function if it did not include user identification.

Taking a step back, if you are looking for a measure by which you might apply policy, then this approach of looking at "valid" examples will ultimately fail. Say you allow this on the terms above. Then, sites that wish to engage in navigational tracking could exploit that. Site B can customize all its content, providing user-specific pages, not for its own users, but for each separate user of site A. Site A then takes their user identifiers and puts them in all outbound links to site B. This might meet the definition above, but it would still be tracking.
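
To make the exploit concrete, here is a minimal sketch of what the two sites would do. Everything in it is hypothetical and chosen for illustration: the `uid` parameter name, the `site-b.example` host, the identifier `a-user-42`, and the `/pages/...` layout are not taken from any real deployment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical mapping held by site A: its own accounts -> its own user identifiers.
SITE_A_USER_IDS = {"alice": "a-user-42"}


def decorate_outbound(link: str, user_id: str, param: str = "uid") -> str:
    """Site A appends its identifier for the user to an outbound link."""
    parts = urlsplit(link)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, user_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def site_b_resolve(url: str, param: str = "uid") -> str:
    """Site B keys every page on site A's identifier, so the URL cannot
    load anything without it (the identifier is 'necessary')."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    if param not in query:
        raise LookupError("no per-user page without the identifier")
    return f"/pages/{query[param]}{parts.path}"


link = decorate_outbound("https://site-b.example/article?id=7", SITE_A_USER_IDS["alice"])
print(link)
print(site_b_resolve(link))
```

Under these assumptions the identifier passes a "needed to load the resource" test, yet every click still hands site A's identifier to site B.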

jyasskin commented 2 years ago

Yep, https://github.com/privacycg/nav-tracking-mitigations/pull/2/files#r704859591 also mentions that we can't let "necessary to load the resource successfully" mean it's not navigational tracking because that's too abusable.

BrianLefler commented 2 years ago

One observation about the unsubscribe case is that presumably this link was decorated with publisher.example's own pre-existing id for that user, not some other site's, likely for use in marketing emails sent by publisher.example.

I do not think that should be defined as navigational tracking, because it was not part of a protocol to connect identities. That's true even if a user did later view that email on webmail.example and click on the link.

johannhof commented 2 years ago

It was, but that's not transparent to the web browser as it's coming from webmail.example. If we want to avoid difficult discussions about legal concepts like ownership we should try to limit ourselves to client-observable behavior as much as possible.

Also, I don't agree that this isn't used to connect identities just because it comes from the same party. If the user did not previously log into publisher.example they might not expect the site to be able to identify them (on a separate device, or in private browsing for example). The key differentiator still seems to be whether the identity is needed to complete a user-initiated task.

This kind of auto-login through embedding authentication tokens in URLs is, I think, not uncommon for links in emails and figuring out whether this violates user expectations and our privacy model seems interesting. There's a big risk here when users are sharing URLs with embedded personalized identifiers, without being able to assess the sensitivity of the data they are sharing.

BrianLefler commented 2 years ago

Agree it's important to focus on client-observable behavior. I think this case cautions implementers about the limits of client-side analysis. There will be false positives from assuming every navigation from A => B with a user-unique identifier is a case of A sending their identifier to B. Here, the actual identity exchange happened in the past when publisher.example received the user's email address (i.e., webmail.example's id for that user). I believe /unsubscribe?userId=1234 and /unsubscribe?email=foo@webmail.example are identical from a privacy perspective, except that the second case is transparent to the user before clicking.
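
A rough sketch of the kind of client-side check this is cautioning about, assuming a deliberately naive heuristic that no browser is claimed to implement: flag any cross-host navigation whose query string carries a long value mixing letters and digits. The function names, the length threshold, and the use of hostname equality in place of a real same-site comparison are all simplifications made up for this example.

```python
from urllib.parse import parse_qsl, urlsplit


def looks_user_unique(value: str, min_len: int = 8) -> bool:
    """Crude stand-in for 'user-unique identifier': long, with letters and digits."""
    return (
        len(value) >= min_len
        and any(c.isdigit() for c in value)
        and any(c.isalpha() for c in value)
    )


def flag_navigation(referrer: str, target: str) -> list[str]:
    """Return the query parameter names a naive client would flag.

    Compares hostnames only; a real check would compare sites
    (registrable domains), which this sketch does not attempt.
    """
    tgt = urlsplit(target)
    if urlsplit(referrer).hostname == tgt.hostname:
        return []
    return [name for name, value in parse_qsl(tgt.query) if looks_user_unique(value)]


webmail = "https://webmail.example/inbox"
by_id = "https://publisher.example/unsubscribe?userId=5789rhkdsaf8urfnsd"
by_email = "https://publisher.example/unsubscribe?email=foo@webmail.example"

print(flag_navigation(webmail, by_id))     # flagged, although the id is publisher.example's own
print(flag_navigation(webmail, by_email))  # not flagged by this particular heuristic
```

The client cannot see whose identifier it is looking at, so the first link is flagged as if webmail.example had decorated it, while the second link slips past this heuristic entirely.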

It seems like links with the referrer's id encoded are very different from links with the destination's id encoded. Email click tracking is a form of tracking, but it's not for the purpose of joining identities and so is not navigational tracking by the definition here.

martinthomson commented 2 years ago

So email confirmations (that is, the process by which sites have you prove that you own an email address) almost completely fall under this definition of navigational tracking. That the information is, at a conceptual level, originating from the site (via the user choosing to give the site their email address), is not something that the browser is able to see. The entire goal of those interactions is to convey a single bit of information from the mail provider to the site. If we were to look at that mechanically, that's navigational tracking.

That doesn't depend on the mail provider being served as webmail.

If we go further and view a webmail site as an adversary, they are going to look very bad. In general, however, mail providers do not deliberately share information about their users with other sites. They could, and we'd have a hard time detecting that, but they don't. My mail provider at least tries very hard to do the exact opposite of that. Avoiding things like tracking bugs and whatnot is - increasingly - industry-standard practice for email, albeit with varying success as it requires certain difficult trade-offs. But when someone clicks on a link in email, it's pretty much game over for navigational tracking.