privacycg / private-click-measurement

Private Click Measurement
https://privacycg.github.io/private-click-measurement/

Supporting multiple conversions for one click, a.k.a. reporting windows #95

Open johnwilander opened 2 years ago

johnwilander commented 2 years ago

The Attribution Reporting for Click-Through Measurement proposal has the concept of reporting windows.

The spec authors propose that up to three attribution reports can be sent for a single source. My interpretation is that this means a single source click.

They also provide an example of reporting windows a browser could use:


I'd like to explore the idea of reporting windows for PCM where the privacy restrictions are quite different from Attribution Reporting API. PCM does not support user IDs on either side.

Currently, PCM's triggering event is allowed a 4-bit value which goes into the subsequent attribution report. If we were to support two or three reporting windows, the opportunity for cherry-picking of which users to even fire a triggering event for increases and with it the risk of linking a report to a specific user. The risk of being able to link attribution reports from multiple windows also increases. I therefore think the triggering event value needs to be much smaller for reporting windows beyond the first.

Here's a straw man to get us started:

The reason why I bring up such a long window as day 8 to 35 is to open up the conversation on return on ad spend (ROAS). A month's worth of measurement is valuable when the advertising is not about a single purchase but about the longer term value of the acquired customer.

The delay in reporting must scale with the length of the reporting window. Otherwise a bad actor can cherry-pick users to trigger events for in distinct subwindows and know exactly for whom the report is sent when it arrives later. With scale I don't necessarily mean 1:1 scale but for an 8 to 35-day window, the delay would have to be 24 to 168 hours (1 to 7 days) or something like that to deter cherry-picking.

This relationship between length of reporting window and necessary length of delay means diminishing returns when considering even longer report windows.

Let me know what you think, both of the usefulness for advertisers and the privacy implications of such a scheme.

(Also, @csharrison and @johnivdel, please check that I've understood your spec right.)

dialtone commented 2 years ago

From a buyer perspective this doesn't change much in the incentive structure. You can see Facebook and Snapchat announcements on their earnings where due to lack of measurement ability spend is moving away from their platforms and into others. This doesn't seem to change much compared to where we are at today regarding measurement on PCM. The greater context to allow longer windows is also to remove the incentive of going after last click measurement and optimization which is the kind that searches and incentivizes the use of personal data and linkages, and one in which contextual-type campaigns typically don't perform as well.

I understand the goal of avoiding entropy for tracking purposes, but a 1-7 day delay on conversion already means that a buyer would need to wait a full 7 days while purchasing traffic before they know anything meaningful about how it performed. On top of that, any change made to such purchase parameters would need to wait at least a further 7 days before anything can be said about its performance. And on top of that again, the buyer only gets access to a fairly coarse 1 bit or 1.5 bits after 2 days. That seems pretty rough for whoever is optimizing on the other side: it is virtually impossible to recollect what drove the purchase of ads, which means that changes to said campaign really need to move one by one at the slowest pace possible, making all of this optimization (manual or not) impractical.

I don't think I fully understand your attack vector about cherry-picking windows and users, but I'd like to understand better what the data leakage is there, mostly because on one side you are setting quite severe limitations and thresholds but on the other side I've not seen an impact assessment to go with it.

These are my $0.02 and it's possible my lack of understanding of your explained threat clouds my judgement here a bit.

johnwilander commented 2 years ago

Sorry for the super long delay here. I was expecting that we'd talk about this on a Privacy CG call but they kept getting cancelled. Now it's up for tomorrow at least.

> From a buyer perspective this doesn't change much in the incentive structure. You can see Facebook and Snapchat announcements on their earnings where due to lack of measurement ability spend is moving away from their platforms and into others. This doesn't seem to change much compared to where we are at today regarding measurement on PCM. The greater context to allow longer windows is also to remove the incentive of going after last click measurement and optimization which is the kind that searches and incentivizes the use of personal data and linkages, and one in which contextual-type campaigns typically don't perform as well.

> I understand the goal of avoiding entropy for tracking purposes, but a 1-7 day delay on conversion already means that a buyer would need to wait a full 7 days while purchasing traffic before they know anything meaningful about how it performed. On top of that, any change made to such purchase parameters would need to wait at least a further 7 days before anything can be said about its performance. And on top of that again, the buyer only gets access to a fairly coarse 1 bit or 1.5 bits after 2 days. That seems pretty rough for whoever is optimizing on the other side: it is virtually impossible to recollect what drove the purchase of ads, which means that changes to said campaign really need to move one by one at the slowest pace possible, making all of this optimization (manual or not) impractical.

I don't understand what you're saying here. What I outlined was up to three attribution reports per measured click, one for day 0-2 after the click, one for day 3-7 after the click, and one for day 8-35 after the click.

The proposed delay of 1 to 7 days before the attribution report would go out would only apply to triggering events in the day 8-35 window. That means that the advertiser a) has potentially already received two attribution reports for this click, and b) has already waited 8-35 days after the click. The 8-35 day window is not about short measurement cycles but about longer term measurement of return on ad spend. Some acquired users will only prove valuable after some time, for instance after a 30-day try-before-you-buy period. Those are the kind of measurements that would have the 1-7 day delay on their attribution reports to deter trying to track individual users by cherry-picking who to call the API for.
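To make the three windows concrete, here is a minimal Python sketch of how a triggering event could be mapped to a reporting window and its delay range. The day ranges and the 24 to 168 hour delay for day 8-35 come from this thread. The 24-48 hour delay for the first two windows is an assumption (it is the delay PCM uses today), and the names `ReportingWindow`, `WINDOWS`, and `window_for` are hypothetical, not part of any spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportingWindow:
    first_day: int        # first day after the click covered by this window (inclusive)
    last_day: int         # last day after the click covered by this window (inclusive)
    min_delay_hours: int  # shortest delay before the report is sent
    max_delay_hours: int  # longest delay before the report is sent


# Day ranges as described in the thread. The 24-48 hour delay on the first
# two windows is an ASSUMPTION; only the 24-168 hour delay for day 8-35 is
# stated explicitly in the proposal.
WINDOWS = (
    ReportingWindow(0, 2, 24, 48),
    ReportingWindow(3, 7, 24, 48),
    ReportingWindow(8, 35, 24, 168),
)


def window_for(days_since_click: int) -> Optional[ReportingWindow]:
    """Return the reporting window a triggering event falls into, or None."""
    for window in WINDOWS:
        if window.first_day <= days_since_click <= window.last_day:
            return window
    return None  # before the click or after day 35: no report


late = window_for(30)
print(late.min_delay_hours, late.max_delay_hours)  # 24 168
```

A trigger on day 1 or day 5 keeps the short delay in this sketch; only a trigger in the day 8-35 window picks up the longer 1 to 7 day delay.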

> I don't think I fully understand your attack vector about cherry-picking windows and users, but I'd like to understand better what the data leakage is there, mostly because on one side you are setting quite severe limitations and thresholds but on the other side I've not seen an impact assessment to go with it.

Imagine we would not increase the time delay for a time window like day 8-35 but keep the 24-48 hour delay we have today. In such an 8-35 day window, the destination website may have learned a lot about the user. That gives a bad actor the opportunity to only trigger conversion for a small set of users, for instance only the ones who've purchased the gold package, only the ones who've reached level 20 in the game, or only the ones who've linked their brokerage account. By doing such cherry-picking and doing it on specific days, the bad actor could know that a specific attribution report is connected to a specific user.

Example: "I only triggered a conversion for Peter, John, and Amanda on December 17 so any reports on December 16-18 will be for them and I'll learn who of those three I acquired through advertising, from which publisher sites, and for which ad campaign."

A key part of the opportunity to do so is the long window of day 8-35 in which the bad actor can trigger the conversion.

By increasing the delay, we significantly limit the opportunity to leverage the long conversion time window for such cherry-picking.
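The effect of the delay on this kind of day-based cherry-picking can be sketched in Python. This is only an illustration under stated assumptions: delays are rounded to whole days, the year is arbitrary, the users Dana and Eli are made up, and `candidate_trigger_days` and `candidate_users` are hypothetical helper names. It covers the case where the bad actor triggers for different small groups on different days.

```python
from datetime import date, timedelta


def candidate_trigger_days(report_day, min_delay_days, max_delay_days):
    """Days on which the trigger could have fired, given the report's arrival day."""
    return [report_day - timedelta(days=d)
            for d in range(min_delay_days, max_delay_days + 1)]


def candidate_users(report_day, triggers_by_day, min_delay_days, max_delay_days):
    """Users whose trigger could explain a report arriving on report_day."""
    users = set()
    for day in candidate_trigger_days(report_day, min_delay_days, max_delay_days):
        users |= triggers_by_day.get(day, set())
    return users


# Hypothetical trigger log kept by the bad actor (year chosen arbitrarily).
triggers_by_day = {
    date(2021, 12, 13): {"Dana", "Eli"},
    date(2021, 12, 17): {"Peter", "John", "Amanda"},
}
report_day = date(2021, 12, 19)

# With a 1-2 day delay only the December 17 group can explain the report.
print(sorted(candidate_users(report_day, triggers_by_day, 1, 2)))
# ['Amanda', 'John', 'Peter']

# With a 1-7 day delay the December 13 group is mixed in as well.
print(sorted(candidate_users(report_day, triggers_by_day, 1, 7)))
# ['Amanda', 'Dana', 'Eli', 'John', 'Peter']
```

The longer delay widens the set of trigger days that could explain a given report, so spacing small groups a few days apart no longer isolates them.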

> These are my $0.02 and it's possible my lack of understanding of your explained threat clouds my judgement here a bit.

I hope I managed to clarify. As always, thanks for commenting!

dialtone commented 2 years ago

Isn't the example you provide with Peter, John and Amanda inevitable in any case? The window of conversion doesn't matter if you only trigger it for a carefully selected subset of users. You can just extend that December 16-18 to 16-22 and your reasoning would be unchanged.

It also doesn't seem to be a particular escalation to go from having actual sensitive information, like your brokerage account, to knowing that you have it now because you clicked an ad 8-35 days ago.

On the other hand, the case for all products that aren't impulse purchases, that you will finalize within 7 days of clicking an ad, is in worse shape than all of those that are impulse purchases because of the additional delay introduced in each subsequent reporting window.

Anyway, I suppose we can chat on the call today :). cheers and thanks for clarifying.

dialtone commented 2 years ago

I think I understand the misunderstanding I had on this. I rest my case on the delay, but the decrease in the bits available seems excessive; there already aren't that many available to do much with.