privacycg / private-click-measurement

Private Click Measurement
https://privacycg.github.io/private-click-measurement/

Why attribution reports cannot go to third parties or to anything other than the registrable domain #57

Open johnwilander opened 3 years ago

johnwilander commented 3 years ago

Two requests were brought up at a recent Privacy CG call, and I said I'd write up the privacy analysis of why we think attribution reports cannot go to third parties or to anything other than the registrable domain (eTLD+1).

Why Not Attribution Reports To Third Parties?

Some have requested that the click source site should be able to assign a reporting URL/domain other than its own. Others have requested that a third-party such as the host of an iframe where the click happens should be the one receiving the report.

Neither of these meets our privacy requirements. In both cases, the domains can be chosen to convey further information about the click.

Imagine, for instance, that social.example, where the ad click happens, says it wants reports to go to johnwilander-social.example when I'm logged in there and to janedoe-social.example when Jane Doe is logged in. That would take us back to cross-site tracking in the subsequent report.

Similarly, ad links can be served in iframes from johnwilander-social.example or janedoe-social.example to achieve the same level of cross-site tracking.
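
To make the attack concrete, here is a minimal Python sketch, assuming a hypothetical variant of PCM in which the click source may name its own reporting domain. AttributionReport, choose_reporting_domain, and identify_user are illustrative names, not PCM APIs. The point is that the receiving server learns who clicked purely from which domain the report was sent to, no matter how few bits the report body itself carries.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-user naming scheme chosen by a misbehaving click source.
SUFFIX = "-social.example"


@dataclass(frozen=True)
class AttributionReport:
    # Simplified, hypothetical report shape: only the fields this argument needs.
    reporting_domain: str
    source_id: int
    destination_site: str


def choose_reporting_domain(logged_in_user: str) -> str:
    """What the click source would do if it were allowed to pick the reporting domain."""
    return f"{logged_in_user}{SUFFIX}"


def identify_user(report: AttributionReport) -> Optional[str]:
    """The receiving server recovers the user just from where the report was sent."""
    if report.reporting_domain.endswith(SUFFIX):
        return report.reporting_domain[: -len(SUFFIX)]
    return None


if __name__ == "__main__":
    report = AttributionReport(
        reporting_domain=choose_reporting_domain("johnwilander"),
        source_id=17,
        destination_site="shop.example",
    )
    # Links a logged-in user on social.example to a conversion on shop.example.
    print(identify_user(report), "converted on", report.destination_site)
```

The same decoding works if the per-user domain is instead the iframe host that receives the report.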

Even Worse With Custom eTLDs

This issue gets worse when tracking companies own their own eTLDs, under which it's virtually free for them to register new domains. They could simply put a unique event ID in the domain, such as 487f90aa469c6234.customTLD, and be back to web-scale, event-level cross-site tracking.
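
A rough way to see the scale of the problem: a 16-character hex label like the one above carries 64 bits, while PCM's attributionsourceid is limited to 8 bits (values 0 to 255). The sketch below mints a fresh per-event domain under an owned TLD; per_event_domain and label_bits are hypothetical helpers for illustration, and "customTLD" is the placeholder from the example.

```python
import secrets

PCM_SOURCE_ID_BITS = 8  # attributionsourceid is an 8-bit value (0..255)


def per_event_domain(custom_tld: str, label_hex_chars: int = 16) -> str:
    """Mint a fresh domain for a single click event under an owned TLD (hypothetical)."""
    # secrets.token_hex(n) returns 2 * n hex characters.
    return f"{secrets.token_hex(label_hex_chars // 2)}.{custom_tld}"


def label_bits(label: str) -> int:
    """Bits carried by a hex label: 4 bits per hex character."""
    return 4 * len(label)


if __name__ == "__main__":
    domain = per_event_domain("customTLD")
    label = domain.split(".")[0]
    print(domain)
    print(f"{label_bits(label)} bits in the domain vs {PCM_SOURCE_ID_BITS} bits of source ID")
```

With 64 bits per domain, every click can get its own identifier, which is exactly the event-level linkage the registrable-domain restriction is meant to prevent.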

Why Not Attribution Reports To Subdomains?

Some have requested that attribution reports be sent to the full domain of the site where the click happens and similarly the full domain of the site where the conversion happens.

Neither of these meets our privacy requirements. In both cases, subdomains can be chosen to convey further information about the click or conversion.

Imagine, for instance, that social.example, where the ad click happens, makes sure the site is loaded from the subdomain johnwilander.social.example when I'm logged in there and from the subdomain janedoe.social.example when Jane Doe is logged in. That would take us back to cross-site tracking in the subsequent report.

The reason for restricting PCM reports to registrable domains is that the scheme+registrable domain, a.k.a. schemeful site, is the only part of a URL that is free from link decoration. All other parts can be made user specific, including subdomains.
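
The reduction to a schemeful site can be sketched as follows. This is a minimal illustration assuming a tiny hard-coded suffix set; real browsers consult the full Public Suffix List, including its wildcard and exception rules, which this sketch ignores. Its purpose is to show that per-user subdomains collapse to the same value.

```python
from urllib.parse import urlsplit

# Toy public suffix set for illustration only (assumption); not the real list.
PUBLIC_SUFFIXES = {"example", "com", "co.uk"}


def registrable_domain(host: str) -> str:
    """Return eTLD+1: the longest matching public suffix plus one more label."""
    labels = host.lower().rstrip(".").split(".")
    # i = 0 is the whole host, so the longest candidate suffix is tried first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                raise ValueError(f"{host!r} is itself a public suffix")
            return ".".join(labels[i - 1:])
    raise ValueError(f"no known public suffix for {host!r}")


def schemeful_site(url: str) -> str:
    """Scheme plus registrable domain: the only part of a URL that cannot be decorated."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{registrable_domain(parts.hostname or '')}"


if __name__ == "__main__":
    for url in ("https://johnwilander.social.example/feed",
                "https://janedoe.social.example/"):
        print(url, "->", schemeful_site(url))
```

Both per-user URLs reduce to https://social.example, so a report addressed to the schemeful site carries no trace of which subdomain the user was on.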

You could of course imagine social.example setting up a registrable domain per user, such as johnwilander-social.example, and loading the whole website from that domain when I'm logged in, to get back to cross-site tracking of clicks. If that happens, we'd have to deal with it, but at least the user has a chance to see through the URL bar that a personalized domain is being used.

jbpringuey commented 3 years ago

I would love to hear what the privacy problem could be with what I am proposing. If attributionreporting gets the value https://www.tracking.adtech.com/ and then https://www.tracking2.adtech.com/, we would invalidate all data for https://www.tracking.adtech.com/, as there can only be one FQDN registered for *.adtech.com in an instance.

johnwilander commented 3 years ago

How would you guarantee that there’s only one registration? I’ve probably missed that part.

How would you defend against a party who owns a whole TLD, like .adtech? They can register as many domains as they want for free. This is mentioned above.
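
The proposed rule and the objection to it can be sketched together. This is a hedged reading of the proposal, not a specified algorithm: OneFqdnPerSite and naive_site are hypothetical names, the state lives in a single browser instance, and naive_site treats the last label as the public suffix instead of using the Public Suffix List.

```python
from typing import Dict, Set
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    # Simplification (assumption): the last label is the public suffix.
    return ".".join(host.split(".")[-2:])


class OneFqdnPerSite:
    """Per-browser-instance sketch: the first FQDN seen for a registrable domain is
    accepted; a different FQDN under the same registrable domain invalidates all
    data for that registrable domain."""

    def __init__(self) -> None:
        self._fqdn_by_site: Dict[str, str] = {}
        self._invalidated: Set[str] = set()

    def allow(self, reporting_url: str) -> bool:
        fqdn = urlsplit(reporting_url).hostname or ""
        site = naive_site(fqdn)
        if site in self._invalidated:
            return False
        first_seen = self._fqdn_by_site.setdefault(site, fqdn)
        if first_seen != fqdn:
            self._invalidated.add(site)
            return False
        return True


if __name__ == "__main__":
    rule = OneFqdnPerSite()
    print(rule.allow("https://www.tracking.adtech.com/"))   # True: first FQDN for adtech.com
    print(rule.allow("https://www.tracking2.adtech.com/"))  # False: conflict, adtech.com invalidated
    print(rule.allow("https://www.tracking.adtech.com/"))   # False: adtech.com data now dropped
    # The objection: under an owned ".adtech" TLD, each name is its own registrable domain.
    print(rule.allow("https://johnwilander.adtech/"))  # True
    print(rule.allow("https://janedoe.adtech/"))       # True
```

The rule only ever compares FQDNs within one registrable domain, so a party minting a new registrable domain per user or per event never triggers it.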

jbpringuey commented 3 years ago

I see, thanks John. Could we consider having the ad-tech player define the tracking domain in a new file, /.well-known/tracking-url? For example, https://www.adtech.com/.well-known/tracking-url would return https://www.tracking.adtech.com/. Any value that does not match this one, on any browser instance, would be dismissed. The value could also be cached by the browser. Would that work?

jbpringuey commented 3 years ago

The .well-known/tracking-url file should not be read from the subdomain. For example, https://www.tracking.adtech.com/.well-known/tracking-url or https://www.tracking2.adtech.com/.well-known/tracking-url would instead check https://www.adtech.com/.well-known/tracking-url. The approach would be similar to what the industry already does to fight ad fraud with ads.txt (https://iabtechlab.com/ads-txt/), in which publishers register valid vendors with their IDs and tokens (https://www.nytimes.com/ads.txt, https://www.bbc.com/ads.txt, etc.).
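
A browser-side check along these lines might look like the sketch below. Everything here is an assumption for illustration: /.well-known/tracking-url is the file proposed in this thread (not part of PCM), the fetch function is injected so the sketch needs no network, the file is read from the bare registrable domain (the example above uses www.adtech.com), and values are compared as exact strings.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

WELL_KNOWN_PATH = "/.well-known/tracking-url"  # proposed in this thread, not part of PCM


def naive_site(host: str) -> str:
    # Simplification (assumption): the last label is the public suffix.
    return ".".join(host.split(".")[-2:])


class TrackingUrlValidator:
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical: returns the response body for a URL
        self._cache: Dict[str, str] = {}  # registrable domain -> declared tracking URL

    def is_valid(self, claimed_reporting_url: str) -> bool:
        host = urlsplit(claimed_reporting_url).hostname or ""
        site = naive_site(host)
        if site not in self._cache:
            # Always ask the registrable domain, never the claimed subdomain.
            self._cache[site] = self._fetch(f"https://{site}{WELL_KNOWN_PATH}").strip()
        return self._cache[site] == claimed_reporting_url


if __name__ == "__main__":
    responses = {"https://adtech.com/.well-known/tracking-url": "https://www.tracking.adtech.com/\n"}
    validator = TrackingUrlValidator(lambda url: responses[url])
    print(validator.is_valid("https://www.tracking.adtech.com/"))   # True
    print(validator.is_valid("https://www.tracking2.adtech.com/"))  # False, served from cache
```

Caching per registrable domain means the file is fetched once per site rather than once per report.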

Myrtle commented 3 years ago

Hi! As we can see, with the incoming changes related to Private Relay and the ITP evolutions regarding IP address proxying, Apple seems keen to be an even stronger middleman. I'm not sure this solution has already been put on the table, but do you think it is realistic to rely on a token system managed by Apple itself?

Let's say I'm fancy-adtech: I could register using my company name and my domain, like fancy-adtech.com, and if this registration is valid, a token is provided. This token could be supplied when displaying an ad, alongside attributionsourceid and attributiondestination, instead of explicitly declaring a reporting URL, for example. The browser could translate this token into a reporting URL by itself, using the domain attached to the token and the naming convention already in place; for our example, it would result in fancy-adtech.com/.well-known/private-click-measurement/report-attribution/. By doing so, we can control how a party behaves. Typically, we could ensure that one company doesn't register for hundreds or thousands of tokens, so that no company can rely on domains to trick the system. It definitely adds friction, since validation by Apple is needed at some point, but maybe that's acceptable.
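
The token idea can be sketched as a registry that caps tokens per company and a browser-side lookup that turns a token into the standard PCM report URL. TokenRegistry, its methods, and the one-token cap are hypothetical; only the /.well-known/private-click-measurement/report-attribution/ path comes from the existing naming convention.

```python
import secrets
from typing import Dict

REPORT_PATH = "/.well-known/private-click-measurement/report-attribution/"


class TokenRegistry:
    """Hypothetical registry run by the browser vendor, as suggested above."""

    def __init__(self, max_tokens_per_company: int = 1) -> None:
        self._max = max_tokens_per_company
        self._domain_by_token: Dict[str, str] = {}
        self._count_by_company: Dict[str, int] = {}

    def register(self, company: str, domain: str) -> str:
        """Issue a token for a validated company/domain pair, enforcing the cap."""
        count = self._count_by_company.get(company, 0)
        if count >= self._max:
            raise PermissionError(f"{company} already holds {count} token(s)")
        token = secrets.token_urlsafe(16)
        self._domain_by_token[token] = domain
        self._count_by_company[company] = count + 1
        return token

    def reporting_url(self, token: str) -> str:
        """Browser-side translation of a token into a reporting URL (KeyError if unknown)."""
        return f"https://{self._domain_by_token[token]}{REPORT_PATH}"


if __name__ == "__main__":
    registry = TokenRegistry()
    token = registry.register("fancy-adtech", "fancy-adtech.com")
    print(registry.reporting_url(token))
```

The cap is only as strong as the vendor's identity checks: the sketch trusts the company string, so the real work lies in verifying that two registrations belong to the same company.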

jbpringuey commented 3 years ago

Is there any updated plan to support this? As it stands, the restriction will give a big advantage to Google and Facebook for the ads they serve on their own sites, and will probably have a significant impact on the revenue of small and medium-sized publishers.

vincentsaluzzo commented 3 years ago

Hi 👋 Any update on this topic? I can see the agenda+ label was recently removed. Couldn't the solutions proposed by @Myrtle or @jbpringuey meet all the user-privacy requirements while still letting third parties monitor attribution effectively?

jbpringuey commented 2 years ago

Now that, in iOS 15.1, traffic to trackers identified by the ITP algorithm goes fully through Private Relay, I am not sure how a third party could establish anyone's identity in Safari. I think Private Relay is a major innovation from Apple 👏 👏 in respecting the privacy of internet users, and I am a big fan of it. However, the current API only allows the advertiser and publisher domains to receive the attribution, so I am curious about the adoption of the API as is. I think it is necessary for a third-party provider to be able to receive the attributed conversions. Small publishers do not have the IT resources to implement this as is and will usually use SaaS software. Big players like Google and Facebook are finding ways to work around it with modeled conversions (https://support.google.com/google-ads/answer/10081327?hl=en#zippy=%2Cimpact-of-ios, https://www.facebook.com/business/help/311705270326952).

I understand that this would require funding for other browsers, but for Safari I do not see how it is possible unless a user explicitly wants to be identified by trackers. ping @johnwilander