privacycg / private-click-measurement

Private Click Measurement
https://privacycg.github.io/private-click-measurement/

Concerns from advertiser perspective #9

Closed. dazl-nz closed this issue 4 years ago

dazl-nz commented 5 years ago

As an advertiser, this standard does not seem to support a number of requirements we need for both reporting and ongoing optimisation. Unless these can be resolved, ad-reliant publishers and ad servers are unlikely to support the standard and will continue to look for work-arounds to ITP...

1. Report view-through conversions, i.e. people who convert after viewing (but not clicking) an ad
2. Support for (much) more than 64 campaigns; we need the ability to report how individual creatives (of which there could be hundreds, even thousands) are performing
3. Support for more than 64 conversion points. Ecommerce sites often have more than this number of products, and once you include the requirement to track intermediate steps in the conversion funnel, this blows out even more.
4. (Near) real-time reporting of conversions, particularly for the ability to quickly optimise a campaign
5. Support for the ability to re-target/exclude audiences based on conversions, e.g. stop showing ads to people who have already converted, or display different creatives to people who drop out of the conversion funnel
6. The vast majority of display ads on the web are served via a third-party ad network (e.g. Google Ads / DoubleClick). Does the first-party link requirement preclude these from being tracked? If so, the standard is dead right there.
dazl-nz commented 5 years ago

This comment under Issue 7 is also relevant to my point 6 above. https://github.com/WICG/ad-click-attribution/issues/7#issuecomment-495058008

johnwilander commented 5 years ago

Thanks for filing!

> As an advertiser, this standard does not seem to support a number of requirements we need for both reporting and ongoing optimisation. Unless these can be resolved, ad-reliant publishers and ad servers are unlikely to support the standard and will continue to look for work-arounds to ITP...

Just so I understand, are you a representative of an advertiser or are you filing this on behalf of advertisers but you yourself work in another capacity?

> 1. Report view-through conversions, i.e. people who convert after viewing (but not clicking) an ad

We are interested in supporting ad view attribution in a privacy friendly way too. But this spec is for ad clicks and we are not ready to tackle ad views yet.

> 2. Support for (much) more than 64 campaigns; we need the ability to report how individual creatives (of which there could be hundreds, even thousands) are performing

What is "much more"? Can you be more specific, please? The main reason we are so specific about these numbers is that the capability to track users comes down to exactly that: the number of bits of entropy that is stable or controllable across sites.

Do you see a way to allow for reporting of individual creatives while still protecting users from being tracked through higher entropy numbers? We have to assume trackers will abuse it to the best of their ability.

If the entropy cap on campaigns is below what you propose for reporting of individual creatives, can't advertisers instead recycle IDs and prioritize what to measure to still get actionable data?
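Purely as an illustration, a minimal sketch of what such recycling could look like on the advertiser's side (the 64-value campaign ID space is from the draft spec; the prioritisation logic and names below are hypothetical):

```ts
// Hypothetical mapping layer on the advertiser's side: measure the creatives
// that matter most this period within the 64 campaign IDs the draft allows,
// and rotate the rest into the ID space in a later period.
interface Creative {
  name: string;
  dailySpend: number; // used to decide which creatives get measured first
}

const MAX_CAMPAIGN_IDS = 64; // 6 bits in the draft spec

function assignCampaignIds(creatives: Creative[]): Map<string, number> {
  const prioritised = [...creatives]
    .sort((a, b) => b.dailySpend - a.dailySpend)
    .slice(0, MAX_CAMPAIGN_IDS);

  const assignment = new Map<string, number>();
  prioritised.forEach((creative, id) => assignment.set(creative.name, id));
  return assignment;
}
```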

> 3. Support for more than 64 conversion points. Ecommerce sites often have more than this number of products, and once you include the requirement to track intermediate steps in the conversion funnel, this blows out even more.

Again, what is "more"? If the ID can encode every item in the inventory of today's huge online stores, it can just as easily assign a unique ID to every customer.

> 4. (Near) real-time reporting of conversions, particularly for the ability to quickly optimise a campaign

Do you see a way to protect user privacy while firing reports (near) instantly? What do we do about the fact that the ad click is consumed after the first conversion, i.e. how do we support multiple conversions with a single report if the report is sent instantly?

The reason why we can't support multiple reports is that they can then be combined into a user identifier.

> 5. Support for the ability to re-target/exclude audiences based on conversions, e.g. stop showing ads to people who have already converted, or display different creatives to people who drop out of the conversion funnel

This sounds like cross-site tracking, i.e. knowledge about the user across different websites. That is something we are against and actively prevent in Safari.

> 6. The vast majority of display ads on the web are served via a third-party ad network (e.g. Google Ads / DoubleClick). Does the first-party link requirement preclude these from being tracked? If so, the standard is dead right there.

You cross-referenced this last point, and I think that issue has a good conversation going on, so let's continue there.

jhaygood86 commented 5 years ago

For point 5, I can see this working in a privacy-safe way: an API that NewsSite.com can use to determine whether a campaign/conversion event combination has already fired, and then suppress that ad.
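Roughly, something like the sketch below (the API name and shape are hypothetical; nothing like this exists in the draft spec):

```ts
// Hypothetical shape of such a publisher-side check: the browser answers a
// yes/no question about a campaign/conversion combination without exposing
// who the user is, and the publisher suppresses the ad if the answer is yes.
interface AttributionQuery {
  hasConverted(campaignId: number, conversionId: number): Promise<boolean>;
}

async function shouldShowAd(
  api: AttributionQuery,
  campaignId: number,
  conversionId: number
): Promise<boolean> {
  // The publisher learns only that some prior conversion for this
  // combination exists in this browser, not which user produced it.
  return !(await api.hasConverted(campaignId, conversionId));
}
```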

dazl-nz commented 5 years ago

@johnwilander I work at a bank which spends millions every year on digital advertising, mostly via an agency which uses ad partners including Google, FlashTalking and Quantcast. My role is in ensuring we can accurately measure the performance of those campaigns.

I am not a coder, so I cannot provide suggestions on how, technically, you could achieve these outcomes, and frankly I don't really care how you do it. All I want to do is point out what advertisers in general require - and leave it up to you as to how you achieve it. You keep giving reasons for why you cannot do things - my challenge to you is to figure out how you CAN do it.

In regards to the number of campaigns... We operate in a small market of 4 million people (New Zealand) and have only about 10 product portfolios (home loans, credit cards, bank accounts, etc). There are about 50 people in the marketing dept. However, we still have multiple campaigns per portfolio running at any time, with each campaign having multiple (10+) creatives and/or placements, targeted at different audiences.

Now imagine if we were General Motors or BMW, with multiple divisions operating in multiple geographical territories, spending $100m+ annually on digital marketing and with thousands of people working in the marketing dept. There could be tens of thousands of creative placements going at any one time, with the resources available to track every single one to enable ongoing optimisation.

In addition, ad tech is not slowing down. For example, it is getting close to the point where you can just upload a bunch of elements (various background images, headlines, button designs, etc.) and the software will dynamically combine them into multiple variations and run automated tests to determine the best-performing combinations. Adobe demoed something like this just last year. Just one campaign of this nature could yield thousands of creatives just by itself.

It is a similar thing for the conversion points. We have relatively few conversion points compared to a retail site. We have online application forms for probably 20 different products, but we want to track not just submitted applications, but also starting the application. So that right there is 40 conversion points. But we don't just run campaigns for sales. We also have campaigns for lots of other things, for example, encourage use of our mortgage calculator, win tickets to an event we sponsor, promote our association with a charity, or a multitude of other things. Now once again, expand that out to even a small eCommerce store. For example, my father runs a small brick & mortar retail clothing store with only one employee. His website still has over 1000 products in it. He's over 70 so doesn't really do much with it, but imagine what could happen to it in the hands of a savvy digital marketer.

johnwilander commented 5 years ago

Thanks for these details.

We are approaching this from two angles, I believe.

In the previous world (current in browsers without privacy protections), there was no practical limit on how much data you could store and collect on users. This data was also available cross-site by default.

We realize that such limitless tracking data on users underpins how much of current online ad tech works.

But we want a change. Not only in how things are done but also in what can be done technically. Limitless tracking of users is not on the table. That’s our angle.

Is your angle that advertisers (yourself and the larger ones you mention) can only accept limitless tracking of users? Or do you see a middle ground where cross-site tracking of users stops and ad measurements are done on limited data?

(Trying to circumvent a browser’s privacy protections should not be taken lightly.)

dazl-nz commented 5 years ago

We don't need tracking of users and to be clear, we as advertisers don't even do that now - all we receive are aggregated results for ad performance. (That is not to say that ad tech partners don't do this in order to enable our requirements).

We need the outcomes outlined above. If you can figure out a way to achieve that without user tracking then fantastic.

Just note that billions are spent on digital advertising every year. If the solution you develop doesn't allow for the same or better ROI than existing methods, the money will continue to flow towards technologies that do.

coopgrafik commented 5 years ago

@johnwilander

"Is your angle that advertisers (yourself and the larger ones you mention) can only accept limitless tracking of users?" You seem to be obsessed with this to the point of being somewhat myopic. With that said I do understand your mission. The amount of data ad tech companies retain is staggering. However, this guy is not an ad tech company, he's just trying to explain to you the requirements for an advertising product. He just wants to be able to track ROAS so he can make strategic decisions. If you could take a moment to put down your passion for protecting the users and just hear what the other side of the table is asking for without projecting I think you'll find it's not unreasonable.

There is a value exchange between the advertiser who is sponsoring the free content or experience and the user who is paying with their attention and data. There is a price to everything. Content does not write itself. Users have to accept some responsibility in this arrangement. The exchange is that advertisers simply need to understand what is working and what is not working as far as their ads go. They will continue to put money into the ecosystem until they lose that ability. It's at that point that the ecosystem breaks down and the users will find less and certainly lower quality content for free and wonder why they keep running into paywalls for everything else.

No one is saying if you click on an ad it should send back your personal address and SS number and that information should be stored and resold for eternity. JCrew just needs to know what ads I saw and clicked on when I bought one of their shirts. Again, I do agree that the value exchange is weighted too much in favor of the ad tech companies but if the pendulum swings too far in the users' direction it's just as bad.

As you move forward I hope that you try and keep things equitable for both parties as that fairness will benefit everyone.

victr commented 5 years ago

It may sound puzzling, but I cannot help but entertain the idea that you might be confusing two completely different processes: user identification and conversion attribution. As a browser vendor you have a lot of impact on identification, and indeed ITP is a strong limitation, forcing a strong-arm game of tricky workarounds and the spread of unreliable fingerprinting. The validity of the approach aside, it's beyond doubt that you lead this game because you control the environment.

But that is not the same for attribution. All the existing limitations just force the attribution process to shift to the back end and all the information to be exchanged via s2s connections. Consider the following scenario: as a supply provider I generate a random click ID and encode it into the click URL, so it's not even a parameter and the whole URL looks like a random string. As a demand provider I decode the received ID and communicate it to the publisher (e.g. with a postMessage), which is the first-party side and can store its own information by definition. Even if you block local storage altogether, the publisher can include the ID in all the URLs on the site. At this point it's not a click ID, it's just a piece of data controlled by the first party. When a conversion happens the publisher notifies the demand provider, either via a classic tracking pixel or, if you force things to go south, via a direct s2s call, passing the original click ID. All the parties involved can attribute the conversions, immediately and without any artificial restrictions. Again, the fairness and validity of the approach aside, you cannot prevent this from happening because you do not control the environment. Of course, you can make things more complicated and force more rounds of strong-arming, but it is not possible to prevent the attribution from happening.
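To make that concrete, a minimal sketch of the flow (all names and domains below are hypothetical; these are ordinary web primitives, not any spec API):

```ts
// Sketch of the workaround described above: a random click ID travels from
// the supply side to the first party and back to the demand side on
// conversion, without any cookie or local storage involved.

// Supply provider: encode a random click ID directly into the click URL path.
function buildClickUrl(destination: string): { url: string; clickId: string } {
  const clickId = crypto.randomUUID().replace(/-/g, "");
  return { url: `${destination}/${clickId}`, clickId };
}

// Demand provider script: hand the decoded ID to the first-party page.
function shareClickId(clickId: string, firstPartyOrigin: string): void {
  window.parent.postMessage({ clickId }, firstPartyOrigin);
}

// First party: decorate internal links with the ID so it survives navigation
// even if local storage is blocked.
function decorateLinks(clickId: string): void {
  for (const link of Array.from(document.querySelectorAll("a"))) {
    const url = new URL(link.href, location.href);
    url.searchParams.set("cid", clickId);
    link.href = url.toString();
  }
}

// On conversion, report the original click ID back to the demand provider,
// via a pixel-style request (or an equivalent server-to-server call).
function reportConversion(clickId: string): void {
  void fetch(`https://dsp.example/convert?cid=${clickId}`, { mode: "no-cors" });
}
```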

It is true that it's still quite hard to re-use the limited attribution information for retargeting and profiling, but the attribution itself is not an unsolvable problem for marketers. With all the restrictions you added, I can't think of a single practical use case where this tool could be useful, and to be honest my impression is that you focused solely on the privacy side and sidestepped the needs of the advertisers completely. With all due respect, I believe the fight for users' privacy has somewhat blindsided your policies, and you continue to contribute to crippling fair competition in the ad tech world, instead of working together in the best interest of both the market and the end users.

michael-oneill commented 5 years ago

If the UID is stored in first-party storage (a cookie or whatever), it needs consent under ePrivacy. If it is used for tracking cross-domain, i.e. processing personal data, it needs consent under GDPR (it might be claimed under the legitimate-interest basis if it was not communicated cross-domain - but that still requires a right to object). This is the reality we have to deal with.

This API is a way to lawfully gather the metrics needed to support online advertising (without having to obtain prior user consent).

victr commented 5 years ago

GDPR/ePrivacy only states that you need consent; as long as it is obtained, the UIDs can be stored and processed. In a way, Safari obstructs the right to process data even when the provider has a valid legal basis to do so. Besides, many argue that a transaction-level ID (such as a click ID) cannot be linked to the user and does not have to be subject to user consent.

michael-oneill commented 5 years ago

I agree browsers should take account of properly obtained user consent, which we attempted to standardise in the W3C TPWG with the DNT Consent API.

But that is a different issue.

This proposal is for an API that allows web audience measurement (which was given an exemption in the EP draft of the ePrivacy Regulation) to support advertising, where the user is not singled out (no storage of UIDs) and only aggregated counts are collected.

johnwilander commented 5 years ago

> @johnwilander
>
> "Is your angle that advertisers (yourself and the larger ones you mention) can only accept limitless tracking of users?" You seem to be obsessed with this to the point of being somewhat myopic. With that said I do understand your mission. The amount of data ad tech companies retain is staggering. However, this guy is not an ad tech company, he's just trying to explain to you the requirements for an advertising product.

Requirements are exactly what I'm after here. We understand that capping the entropy of data from practically infinite down to 6+6 bits is a significant change. We want a significant change.

So for this discussion to be useful we need to get down to what the requirements really are. For 32 bits of entropy or above, there's no use in changing anything since 32 bits is considered enough to track all human users on the web.

If the requirements are significantly below 32 bits but above 12 bits, we'd like to understand the specifics since it's going to be a tradeoff whatever we do. (12 bits already is a tradeoff.)
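To put those bit counts in perspective, the arithmetic is straightforward (64, 4,096 and roughly 4.3 billion follow directly from the 6-, 12- and 32-bit figures above; nothing else is assumed):

```ts
// Number of distinct values each entropy budget can represent.
const values = (bits: number): number => 2 ** bits;

console.log(values(6));      // 64 campaign IDs (or conversion values)
console.log(values(6 + 6));  // 4,096 campaign/conversion combinations
console.log(values(32));     // 4,294,967,296: enough for a unique ID per web user
```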

> He just wants to be able to track ROAS so he can make strategic decisions. If you could take a moment to put down your passion for protecting the users and just hear what the other side of the table is asking for without projecting I think you'll find it's not unreasonable.

I don't think putting down my passion for protecting customers is useful. Protecting customers' privacy is my job. I'm willing to listen nonetheless. We're having the conversation here, right?

> There is a value exchange between the advertiser who is sponsoring the free content or experience and the user who is paying with their attention and data. There is a price to everything. Content does not write itself. Users have to accept some responsibility in this arrangement. The exchange is that advertisers simply need to understand what is working and what is not working as far as their ads go. They will continue to put money into the ecosystem until they lose that ability. It's at that point that the ecosystem breaks down and the users will find less and certainly lower quality content for free and wonder why they keep running into paywalls for everything else.

As it stands, legacy ad click attribution no longer works in Safari (with Intelligent Tracking Prevention turned on) since cookies are no longer sent. This means that if we can't agree on anything here, zero bits are sent. Sure, trackers will continue the arms race, but we will too.

We think a better way forward is possible. That way is going to have to strike a balance between user agents that are committed to protect their users' privacy and advertisers who want to measure the effectiveness of their ad campaigns.

> No one is saying if you click on an ad it should send back your personal address and SS number and that information should be stored and resold for eternity.

That is not the tracking we are referring to. Any unique, personal identifier that is used across websites is cross-site tracking in our view. The identifier does not have to have a meaning in the physical or offline world.

> JCrew just needs to know what ads I saw and clicked on when I bought one of their shirts.

We disagree. While an advertiser may want to know that you saw an ad and then bought a shirt, we argue that they don't need to know that to measure the effectiveness of their ad campaigns. We are removing the individual from the measurement and are saying the advertiser only needs to know that someone who clicked an ad bought a shirt.

> Again, I do agree that the value exchange is weighted too much in favor of the ad tech companies but if the pendulum swings too far in the users' direction it's just as bad.
>
> As you move forward I hope that you try and keep things equitable for both parties as that fairness will benefit everyone.

The fact that we've not only proposed this standard but also implemented it in WebKit shows our commitment to both parties.

johnwilander commented 5 years ago

> It may sound puzzling, but I cannot help but entertain the idea that you might be confusing two completely different processes: user identification and conversion attribution. As a browser vendor you have a lot of impact on identification, and indeed ITP is a strong limitation, forcing a strong-arm game of tricky workarounds and the spread of unreliable fingerprinting. The validity of the approach aside, it's beyond doubt that you lead this game because you control the environment.
>
> But that is not the same for attribution. All the existing limitations just force the attribution process to shift to the back end and all the information to be exchanged via s2s connections. Consider the following scenario: as a supply provider I generate a random click ID and encode it into the click URL, so it's not even a parameter and the whole URL looks like a random string. As a demand provider I decode the received ID and communicate it to the publisher (e.g. with a postMessage), which is the first-party side and can store its own information by definition. Even if you block local storage altogether, the publisher can include the ID in all the URLs on the site. At this point it's not a click ID, it's just a piece of data controlled by the first party. When a conversion happens the publisher notifies the demand provider, either via a classic tracking pixel or, if you force things to go south, via a direct s2s call, passing the original click ID. All the parties involved can attribute the conversions, immediately and without any artificial restrictions. Again, the fairness and validity of the approach aside, you cannot prevent this from happening because you do not control the environment. Of course, you can make things more complicated and force more rounds of strong-arming, but it is not possible to prevent the attribution from happening.
>
> It is true that it's still quite hard to re-use the limited attribution information for retargeting and profiling, but the attribution itself is not an unsolvable problem for marketers. With all the restrictions you added, I can't think of a single practical use case where this tool could be useful, and to be honest my impression is that you focused solely on the privacy side and sidestepped the needs of the advertisers completely. With all due respect, I believe the fight for users' privacy has somewhat blindsided your policies, and you continue to contribute to crippling fair competition in the ad tech world, instead of working together in the best interest of both the market and the end users.

Before we dive further into this, are you up-to-date with how ITP 2.1 and 2.2 works? We are actively fighting click IDs in navigations.

victr commented 5 years ago

@johnwilander yes, I understand the scope of 2.1 and 2.2, and I'm saying it's not going to prevent me from a) generating an ad URL like example.com/aahd871tdgda, and b) adding aahd871tdgda to all the URLs on the page (in case you block local storage altogether)

johnwilander commented 5 years ago

> @johnwilander yes, I understand the scope of 2.1 and 2.2, and I'm saying it's not going to prevent me from a) generating an ad URL like example.com/aahd871tdgda, and b) adding aahd871tdgda to all the URLs on the page (in case you block local storage altogether)

I expect bad actors to continue their efforts to track users cross-site. Those actors should expect us to continue our effort to protect users against cross-site tracking.

We have to take abuse into account when we design web platform features. If we were to design an ad click attribution feature that does not restrict data transfer to significantly lower than 32 bits, we would have to assume that it would be repurposed to achieve cross-site tracking which would be counter to years of effort to prevent cross-site tracking.

The draft spec we present here is a different approach. It does not allow for cross-site tracking of individual users.

At this point, I think we've taken the "advertiser perspective" into account. Are there any concrete proposals in terms of the draft spec?

(I don’t mean snarky air quotes, I mean the advertiser perspective presented above which is probably a subset of the broader advertiser perspective.)

victr commented 5 years ago

@johnwilander I've been thinking about it and discussing it with colleagues for quite some time, and if I may share my professional opinion, I'd summarize it in the following points.

1. I do not believe that amendments to the draft spec you suggested can make any significant difference, because to the best of my knowledge it is fundamentally at odds with the needs of advertisers. I can't think of a minor technical change that would make it fit the use cases I know.

2. I think it's fair to say that you've made your case for preventing cross-site tracking and many marketers have to accept it. If we were to talk about a different approach, I'd say it's your turn to take a step towards the marketers and accept unrestricted (in terms of campaign performance measurement) attribution. These two concepts do not contradict each other and can happily coexist. I believe it's feasible because all the user data can be excluded from the process, just as you said - "removing the individual from the measurement". For example, instead of reducing the "entropy", as you stressed several times, you could go with "increasing" the entropy, making it unusable for user identification because there are too many permutations instead of too few. I can't offer you a ready-to-go solution for obvious reasons, but I'll try to share some thoughts in the following post.

3. I also believe that not calling legitimate companies "bad actors" would be an appreciated step if you are willing to cooperate. Not agreeing with your point of view doesn't make advertisers scammers. A huge number of companies operate within the law, making more than reasonable efforts to respect users' rights. Nevertheless, they have to divert significant budgets to counteract your actions, instead of spending them to fuel Internet businesses and content creators. Being in a state of war is a mindset you chose to follow, not theirs. I can't help but question whether it's in the best interest of the end users.

4. In an ideal world I'd rather see the user making this type of decision. Many people (myself included) have no issue with valid sites tracking their activities, especially if they are able to exercise the right to object. Uncontrollable tracking prevention works for some users, but it's not a panacea. Personally, I'm equally uncomfortable with uncontrollable tracking and with unconditionally delegating all decisions to the browser. That being said, I realize this is the strategy you chose and are not willing to discuss.

victr commented 5 years ago

So to continue with the idea of higher "entropy", let's assume there are two core principles: user identity cannot be shared between sites, and event attribution for marketing purposes should go unrestricted. The latter implies that we'd need a random token unique enough for each single "event", such as an ad impression. Since this is an obvious way to mask a user ID, the first principle implies that a) the randomness should be controlled by the vendor, and b) the amount of arbitrary data passed along should be strictly limited, similar to what you did with those 6-bit fields. One way to do that would be a service where ad techs can register the end-points required to serve ads and track conversions, while the browser vendor enforces policies to fulfill the user-centric approach and in exchange provides the data required for normal ad delivery, such as an event token or user-agent string.

For a practical example, let's assume that we have publisher.com, which is using ssp.com, and the actual ad is delivered by dsp.com on behalf of advertiser.com. The publisher obviously has a user ID but is not really interested in sharing this identity with other parties. The SSP registers a serving end-point like https://ssp.com/deliver?sourceID={int}&transactionID={transactionID}&page={URL}&domain={domain}&ua={useragent}&timestamp={timestamp}&foo={bool}. During registration the browser vendor applies restricting policies, e.g. all the {int} parameters combined cannot exceed 6-bit capacity per site (not overall!). But when the publisher executes the invocation code and the said URL is actually called, the browser replaces the macros with actual values, such as page URL, domain, user agent (probably reduced), timestamp (reduced to minutes), etc. These data are not generated by any party and are just general information about the environment, so they can be safely used for ad delivery. The publisher has no way to pass the user ID to the SSP or further, and the SSP cannot persist the user identity because it has no access to local storage/cookies. The transaction ID is assigned to the session and is kept alive until the session ends.

Then the DSP delivers an ad and is allowed to re-use the same transaction ID, so future events are correctly attributed in both systems. Just like the SSP, it has no access to local storage and cannot persist the user identity. The DSP registers tracking end-points with similar parameter patterns and a conversion type (or name), such as "click", "conversion", "purchase", etc. The overall number of tracking types can be limited for the session (again, not globally!). The DSP also registers the destination URL, advertiser.com/landingpage, again with a similar pattern of parameters but with one important exception: the advertiser is not allowed to know the transaction ID, so it's excluded from the URL and instead saved internally in the browser. The conversion type is also not passed and must be pre-shared between the DSP and the advertiser.

The user can navigate on advertiser.com as long as they need, and when a conversion happens the advertiser calls an API method, something like attributionService.fire('click') or attributionService.fire('purchase'). The browser fires all the tracking pixels associated with the requested type and the transaction ID of the current session. The advertiser can easily persist its own user ID, but it cannot share that identity with the DSP or SSP, and doesn't even know which parties were involved. Limited information such as the sourceID or the original domain should be enough for the advertiser to measure the performance of different traffic sources.

It is important that the variability of the data is limited to the session or site, not globally as in the 6-bit fields you suggested. This way marketers can run millions of campaigns without sharing user identities. And the next logical step would be to finally give control to users, so they could grant local-storage permissions to trusted players. Whenever the conditions for this process are not kept, you can just fall back to regular ITP-powered procedures.

Of course it's a very general sketch, with many flaws, and it doesn't address things like multiple ads, fingerprinting for information sharing (but frankly neither does ITP), asynchronous code execution, etc. I just wanted to show that it's possible to think about it from a different angle. Let me know if you'd like to further discuss this idea.

johnwilander commented 5 years ago

> So to continue with the idea of higher "entropy", let's assume there are two core principles: user identity cannot be shared between sites, and event attribution for marketing purposes should go unrestricted. The latter implies that we'd need a random token unique enough for each single "event", such as an ad impression. Since this is an obvious way to mask a user ID, the first principle implies that a) the randomness should be controlled by the vendor, and b) the amount of arbitrary data passed along should be strictly limited, similar to what you did with those 6-bit fields. One way to do that would be a service where ad techs can register the end-points required to serve ads and track conversions, while the browser vendor enforces policies to fulfill the user-centric approach and in exchange provides the data required for normal ad delivery, such as an event token or user-agent string.
>
> For a practical example, let's assume that we have publisher.com, which is using ssp.com, and the actual ad is delivered by dsp.com on behalf of advertiser.com. The publisher obviously has a user ID but is not really interested in sharing this identity with other parties. The SSP registers a serving end-point like https://ssp.com/deliver?sourceID={int}&transactionID={transactionID}&page={URL}&domain={domain}&ua={useragent}&timestamp={timestamp}&foo={bool}. During registration the browser vendor applies restricting policies, e.g. all the {int} parameters combined cannot exceed 6-bit capacity per site (not overall!). But when the publisher executes the invocation code and the said URL is actually called, the browser replaces the macros with actual values, such as page URL, domain, user agent (probably reduced), timestamp (reduced to minutes), etc. These data are not generated by any party and are just general information about the environment, so they can be safely used for ad delivery. The publisher has no way to pass the user ID to the SSP or further, and the SSP cannot persist the user identity because it has no access to local storage/cookies. The transaction ID is assigned to the session and is kept alive until the session ends.
>
> Then the DSP delivers an ad and is allowed to re-use the same transaction ID, so future events are correctly attributed in both systems. Just like the SSP, it has no access to local storage and cannot persist the user identity. The DSP registers tracking end-points with similar parameter patterns and a conversion type (or name), such as "click", "conversion", "purchase", etc. The overall number of tracking types can be limited for the session (again, not globally!). The DSP also registers the destination URL, advertiser.com/landingpage, again with a similar pattern of parameters but with one important exception: the advertiser is not allowed to know the transaction ID, so it's excluded from the URL and instead saved internally in the browser. The conversion type is also not passed and must be pre-shared between the DSP and the advertiser.
>
> The user can navigate on advertiser.com as long as they need, and when a conversion happens the advertiser calls an API method, something like attributionService.fire('click') or attributionService.fire('purchase'). The browser fires all the tracking pixels associated with the requested type and the transaction ID of the current session. The advertiser can easily persist its own user ID, but it cannot share that identity with the DSP or SSP, and doesn't even know which parties were involved. Limited information such as the sourceID or the original domain should be enough for the advertiser to measure the performance of different traffic sources.
>
> It is important that the variability of the data is limited to the session or site, not globally as in the 6-bit fields you suggested. This way marketers can run millions of campaigns without sharing user identities. And the next logical step would be to finally give control to users, so they could grant local-storage permissions to trusted players. Whenever the conditions for this process are not kept, you can just fall back to regular ITP-powered procedures.
>
> Of course it's a very general sketch, with many flaws, and it doesn't address things like multiple ads, fingerprinting for information sharing (but frankly neither does ITP), asynchronous code execution, etc. I just wanted to show that it's possible to think about it from a different angle. Let me know if you'd like to further discuss this idea.

Unfortunately, I'm not able to understand this sketch to an extent where I can judge whether or not it preserves the user's privacy.

Can you rewrite it in a way where we get full examples (all data fields filled in) and bullet points for which party controls each of those data fields? If the browser generates a piece of data, such as a timestamp, please include if any party can control how or when the browser generates that data and what granularity the data has.

When we analyze these things, we have to assume abuse. Any data under any website's control can constitute a user identity or be connected to a user identity backend. We cannot rely on promises or goodwill to protect user privacy, as proven by years and years of abuse of web standards for cross-site tracking purposes.

michael-oneill commented 5 years ago

Without user agreement no personal data can be collected online in the EU, and increasingly in other jurisdictions - meaning it must not be collected at all, not just "cannot be shared between sites". Any unique persistent identifier is classed as personal data because it can or does identify an individual person.

This API allows conversion events to be counted by remote servers without communicating unique user identifiers, so that the basic statistical information required for the business of online advertising can continue to be collected without requiring user consent.

victr commented 5 years ago

@johnwilander Ok, I was rewriting the scenario I proposed and then realized that the key to solving this problem is access to local storage/cookies. Basically, the whole of ITP is about stripping the right to access storage when certain conditions are met, but advertisers don't really need cookies in the first place. A cookie-less environment which provides random transaction IDs is all you need for attribution. Consider this upgraded version.

Step 1. Publisher.com creates a friendly iframe with a special attribute to inform the browser that it should be handled differently, and loads the SSP framework into it: `<iframe allowattribution="true">`.

Step 2. Any script executed inside this frame cannot access local storage or use cookies. Instead, it can access `window.transactionID`, a random token generated by the browser at the moment the iframe is created, something like `3RAmJKyIdcdmfOZg9TpUTl9a9BP4gJHqwyneVdwCh4r8RiwSgnHSHznl3mKXcENDX`. The SSP and DSP register the tracking end-points associated with the current ad and the type of conversion, e.g. `window.trackingService.register('purchase', 'https://dsp.com/tracking?foo=bar&transaction={transactionID}');`. These requests are also cookie-less, so I don't see a point in restricting the data in the parameters.

Step 3. The user clicks on a creative and gets redirected to advertiser.com/landing.page. The transaction ID from the original frame is now assigned to the whole browser tab (or session), so that the user can navigate between different pages on advertiser.com. When a conversion happens, advertiser.com makes a simple call to the API, `window.trackingService.fire('purchase')`, and the previously registered trackers are fired with the `{transactionID}` macro rendered to the transaction ID of the current tab.

Of course this model can be used for fingerprinting and hidden tracking; it is not intended to prevent that. My point is that it does not create any new loophole in the ITP wall that does not already exist. Whatever is possible in this scenario is already possible. This is a very simple and straightforward way to share attribution data between all the parties involved, while you can continue the fight against sharing user identities. Prove me wrong :)
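Sketched as code, the whole flow would look something like this (allowattribution, window.transactionID and window.trackingService are names I am proposing here, not an existing browser API):

```ts
// Step 1: publisher.com embeds the SSP in a specially marked, cookie-less frame.
// <iframe allowattribution="true" src="https://ssp.com/frame.html"></iframe>

// Hypothetical API surface the browser would expose inside such a frame and
// on the advertiser's pages; none of this exists in any browser today.
interface TrackingService {
  register(conversionType: string, urlTemplate: string): void;
  fire(conversionType: string): void;
}

declare global {
  interface Window {
    transactionID: string; // random per-session token minted by the browser
    trackingService: TrackingService;
  }
}

// Step 2: inside the frame, the DSP registers a tracker for later conversions.
// The {transactionID} macro is filled in by the browser, never by page script.
window.trackingService.register(
  "purchase",
  "https://dsp.com/tracking?foo=bar&transaction={transactionID}"
);

// Step 3: after the click, on advertiser.com, the advertiser reports a
// conversion by type only; the browser fires the registered pixels itself.
window.trackingService.fire("purchase");

export {};
```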

johnwilander commented 4 years ago

Thank you everyone for sharing your thoughts! This issue touches on a large number of concerns. Please file individual concerns for consideration.