Unlinkable tokens on the click destination side and the triggering event API

privacycg / private-click-measurement

Private Click Measurement

https://privacycg.github.io/private-click-measurement/

Unlinkable tokens on the click destination side and the triggering event API #88

Open johnwilander opened 2 years ago

johnwilander commented 2 years ago

Supporting unlinkable tokens on the click destination side poses some additional challenges based on these facts:

There is potentially a many-to-one relationship between click sources and the click destination when the triggering event happens, whereas in the case of the click event, there's just a one-to-one relationship.
The same-site pixel "API" and its corresponding JavaScript API are intended to support a triggering event that matches multiple click sources at once, possibly as a wildcard option to trigger for all click sources.
Any site-observable behavior tied to the triggering event that is conditional on prior incoming clicks will allow the click destination to learn things about the current user.

To maintain privacy, these things must hold:

Unlinkable tokens should only be tied to one report. The report goes to both source and destination but it's the same report. If the same token was used for multiple reports, they can be linked together.
Requesting the signing of a token must not signal any specific PCM browser state, i.e. must not reveal whether or not the user has a pending click or a pending click from a specific source.

This means:

Token signing for triggering events must happen unconditionally, i.e. the browser must always ask for a token to be signed regardless of any pending clicks.
A wildcard triggering event must always request a fixed number of signed tokens, for instance one. Otherwise the browser will leak to the destination site from how many source sites it has matching clicks.
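A minimal sketch of the two rules above. The names `TOKENS_PER_TRIGGER`, `PendingClick`, `leaky_token_count`, and `token_count` are hypothetical and not part of the PCM spec; the point is only that the number of signing requests must be a constant that never depends on stored clicks.

```python
from dataclasses import dataclass

# Hypothetical constant: tokens requested per triggering event ("for instance one").
TOKENS_PER_TRIGGER = 1


@dataclass(frozen=True)
class PendingClick:
    source_site: str
    destination_site: str


def leaky_token_count(pending: list[PendingClick], destination_site: str) -> int:
    """What NOT to do: the count reveals how many sources have matching clicks."""
    return sum(1 for c in pending if c.destination_site == destination_site)


def token_count(pending: list[PendingClick], destination_site: str) -> int:
    """Unconditional request: the stored clicks never influence the count."""
    return TOKENS_PER_TRIGGER
```

With no pending clicks and with several pending clicks, `token_count` returns the same value, so the destination site cannot infer anything from the signing request itself.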

To be able to move forward and not add lots of complexity upfront, I propose that:

All triggering events result in a single token signing request on the destination site.
The same-site pixel "API" and its corresponding JavaScript API only support triggering events for a single click source or for all click sources (a.k.a. wildcard).
If the triggering event is for a single click source, the single token will be tied to that source.
If the triggering event is for all click sources (wildcard), the token will be tied to the latest click source for this destination and attribution reports for any remaining click sources will not carry destination side tokens. This automatically adds an optional last click attribution signal to PCM (optional in that the click destination doesn't have to sign a token).
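The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not spec text: `PendingClick`, its `clicked_at` field, and `assign_destination_token` are hypothetical names, and the sketch assumes at most one pending click per (source site, destination site) pair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PendingClick:
    source_site: str
    destination_site: str
    clicked_at: float  # hypothetical timestamp used to find the latest click


def assign_destination_token(
    pending: list[PendingClick],
    destination_site: str,
    trigger_source: Optional[str],  # None stands for the wildcard trigger
    token: str,
) -> dict[str, Optional[str]]:
    """Map each matching source site to the token its report carries, or None."""
    matching = [c for c in pending if c.destination_site == destination_site]
    if trigger_source is not None:
        # Single-source trigger: only that source's click can match.
        matching = [c for c in matching if c.source_site == trigger_source]
    if not matching:
        return {}
    # The single signed token goes to the latest click; the rest carry none.
    latest = max(matching, key=lambda c: c.clicked_at)
    return {c.source_site: (token if c is latest else None) for c in matching}
```

For a wildcard trigger with clicks from two sources, only the most recent source's report carries the destination-side token; for a single-source trigger, the token is tied to that source.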

At a later stage, we might consider requesting multiple tokens to be signed for triggering events that may match several pending clicks.

johnwilander commented 2 years ago

Pinging @csharrison since Google may want to consider this analysis for the Attribution Reporting API. Pinging @benjaminsavage and @eriktaubeneck since they've commented a lot on fraud prevention in the past.

johnwilander commented 2 years ago

(Erik, you may not be a Privacy CG member yet. GitHub at least doesn't list you when I try to @-mention you.)

eriktaubeneck commented 2 years ago

Thanks for pinging me @johnwilander, I'll take a look.

(I'm a member of the Privacy CG at the W3C, but I'm not exactly sure how to join on GitHub. If you can point me in the right direction, I'll sign up to join!)

bmayd commented 2 years ago

Per my comment in the Sept 9 call: clicks on ads are very unusual and multiple clicks are extremely rare. Given that, if a small number of tokens (say 3) was issued per request, in the majority of cases only one would be used, but in the very rare cases of multiple clicks an additional one or two could be used.

eriktaubeneck commented 2 years ago

Following up on my comments on the 9/9 call.

This means:

Token signing for triggering events must happen unconditionally, i.e. the browser must always ask for a token to be signed regardless of any pending clicks.

A wildcard triggering event must always request a fixed number of signed tokens, for instance one. Otherwise the browser will leak to the destination site from how many source sites it has matching clicks.

Agreed on these points.

If the triggering event is for all click sources (wildcard), the token will be tied to the latest click source for this destination and attribution reports for any remaining click sources will not carry destination side tokens. This automatically adds an optional last click attribution signal to PCM (optional in that the click destination doesn't have to sign a token).

I’m concerned that this could advantage the last click, in that other clicks would now be vulnerable to fraud. I think there are two other options which should be considered:

If the triggering event is for all click sources (wildcard), the destination specifies how many clicks it would like considered, and signs that many tokens. This seems like a natural extension of the fact that the tokens themselves are optional, and the click destination doesn't have to sign any tokens. As @johnwilander correctly points out, the browser cannot ask for some number of tokens, as that would leak the number of attributable events.
The browser (and ideally the standard) could specify some maximum number of source events which could possibly be attributed to a trigger event, and the destination site could always simply issue that many tokens. As @bmayd points out, this is probably reasonable for clicks; however ideally the PCM and other attribution proposals eventually coalesce, and the Chrome proposal currently includes view through impressions. With that inclusion, a global standard seems less ideal.
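The second option (a fixed maximum, as in @bmayd's example of 3) might look something like this sketch. `MAX_ATTRIBUTED_SOURCES` and `attach_tokens` are hypothetical names, and the cap of 3 is only the illustrative number from this thread, not an agreed value.

```python
# Hypothetical cap on how many sources one trigger can attribute to.
MAX_ATTRIBUTED_SOURCES = 3


def attach_tokens(matching_sources: list[str], signed_tokens: list[str]) -> dict[str, str]:
    """Pair matching sources (most recent first) with destination-signed tokens.

    The destination always signs exactly MAX_ATTRIBUTED_SOURCES tokens, so the
    request looks the same whether zero, one, or many clicks are pending.
    Tokens left over after pairing are simply discarded.
    """
    if len(signed_tokens) != MAX_ATTRIBUTED_SOURCES:
        raise ValueError("destination must always sign the fixed number of tokens")
    chosen = matching_sources[:MAX_ATTRIBUTED_SOURCES]
    return dict(zip(chosen, signed_tokens))
```

In the common case of a single pending click, one token is used and two are thrown away; sources beyond the cap get no token.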

johnwilander commented 2 years ago

The privacy aspects of PCM vs Attribution Reporting API will likely never get to a joint solution since the 64-bit identifier on the click source side of Attribution Reporting API is in direct opposition of the privacy goals of PCM.

Apart from that, I hope we can get as close as possible to each other, which is why I'm leaning toward last-click attribution for PCM since Google seems to favor the advertiser or ad network choosing an attribution model (first click, last click) rather than sending multiple attribution reports. Note that I mean only the last click gets attribution, not just that only the last click gets an unlinkable token from the click destination. Attribution Reporting API only sends an attribution report for one matching click source and discards the rest.

They have a priority function on both the click source side and the click destination side. That's one possible path. Another is to have the click destination side choose the attribution model at the triggering event. Which would be preferable?

benjaminsavage commented 2 years ago

Apart from that, I hope we can get as close as possible to each other, which is why I'm leaning toward last-click attribution for PCM since Google seems to favor the advertiser or ad network choosing an attribution model (first click, last click) rather than sending multiple attribution reports. Note that I mean only the last click gets attribution, not just that only the last click gets an unlinkable token from the click destination. Attribution Reporting API only sends an attribution report for one matching click source and discards the rest.

I don't think this is an accurate description of the Attribution Reporting API. Check out this issue I filed on this topic to get clarity: https://github.com/WICG/conversion-measurement-api/issues/68

I specifically asked about the case in which there are two clicks for a single conversion and got confirmation that both will receive an attribution report. It seems the Googlers have proposed a cap of 3. This seems reasonable to me.

Note that I mean only the last click gets attribution, not just that only the last click gets an unlinkable token from the click destination

This is particularly concerning and would be a huge problem. Google search ads would be unfairly advantaged in this approach, as it is very common for people to go to Google and search for the name of something as a navigational mechanism, even when they are searching in the first place due to seeing an ad elsewhere. This doesn't mean that the click from a Google ad was causal. The reason they were searching for that specific item in the first place was due to some other event.

This is why we remain so focused on conversion lift testing: it's the only truly unbiased way to measure ad effectiveness. But at the very least, let's not make PCM only send attribution reports to the "last click". I'd be much happier with an approach where at most 3 receive reports.

johnwilander commented 2 years ago

Apart from that, I hope we can get as close as possible to each other, which is why I'm leaning toward last-click attribution for PCM since Google seems to favor the advertiser or ad network choosing an attribution model (first click, last click) rather than sending multiple attribution reports. Note that I mean only the last click gets attribution, not just that only the last click gets an unlinkable token from the click destination. Attribution Reporting API only sends an attribution report for one matching click source and discards the rest.

I don't think this is an accurate description of the Attribution Reporting API. Check out this issue I filed on this topic to get clarity: WICG/conversion-measurement-api#68

I specifically asked about the case in which there are two clicks for a single conversion and got confirmation that both will receive an attribution report. It seems the Googlers have proposed a cap of 3. This seems reasonable to me.

The conversation in that issue doesn't agree with the description in Google's spec/explainer, unless they mean multiple sources from the same site which sounds weird:

Multiple sources for the same trigger (Multi-touch)

If multiple sources were clicked and associated with a single attribution trigger, send reports for the one with the highest priority. If no priority is specified, the browser performs last-touch.

There are many possible alternatives to this, like providing a choice of rules-based attribution models. However, it isn’t clear the benefits outweigh the additional complexity. Additionally, models other than last-click potentially leak more cross-site information if sources are clicked across different sites.

<snip>

Note that I mean only the last click gets attribution, not just that only the last click gets an unlinkable token from the click destination

This is particularly concerning and would be a huge problem. Google search ads would be unfairly advantaged in this approach, as it is very common for people to go to Google and search for the name of something as a navigational mechanism, even when they are searching in the first place due to seeing an ad elsewhere. This doesn't mean that the click from a Google ad was causal. The reason they were searching for that specific item in the first place was due to some other event.

This is why we remain so focused on conversion lift testing: it's the only truly unbiased way to measure ad effectiveness. But at the very least, let's not make PCM only send attribution reports to the "last click". I'd be much happier with an approach where at most 3 receive reports.

If we can get agreement on a certain cap, that sounds good. What say you, @csharrison?

eriktaubeneck commented 2 years ago

I think the primary misunderstanding here is due to this portion of the Attribution Reporting for Click-Through Measurement:

Trigger attribution algorithm

When the browser receives an attribution trigger redirect on a URL matching the attributiondestination eTLD+1, it looks up all sources in storage that match <attributionreportto, attributiondestination> and picks the one with the greatest attributionsourcepriority. If multiple sources have the greatest attributionsourcepriority, the browser picks the one that was stored most recently.

My understanding of this is that you could have two different sources, and so long as they use different reporting origins (attributionreportto), the destination site could then issue a trigger event for both sites, both would attribute, and both would result in delayed and noised reports.
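A rough sketch of the quoted selection rule, to make that reading concrete. The `Source` type, its field names, and `pick_source` are hypothetical, and integer priorities are an assumption for illustration; this is not the Attribution Reporting API's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    ad_id: str
    report_to: str      # stands in for attributionreportto
    destination: str    # stands in for attributiondestination
    priority: int       # stands in for attributionsourcepriority
    stored_at: float    # hypothetical storage timestamp


def pick_source(sources: list[Source], report_to: str, destination: str) -> Optional[Source]:
    """Pick the matching source with the greatest priority; ties go to the newest."""
    candidates = [
        s for s in sources
        if s.report_to == report_to and s.destination == destination
    ]
    if not candidates:
        return None
    # Tuples compare element-wise: priority first, then most recently stored.
    return max(candidates, key=lambda s: (s.priority, s.stored_at))
```

Because the lookup is scoped to a single `report_to` value, two sources with different reporting origins never compete with each other, which is why a trigger for each origin could attribute independently.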

@johnwilander I'm actually not sure what would happen in this case with PCM, i.e. two source clicks exist to the same destination domain, and the destination registers two different trigger events (one for each source domain). Would PCM attribute both independently and issue delayed reports for both?

johnwilander commented 2 years ago

@johnwilander I'm actually not sure what would happen in this case with PCM, i.e. two source clicks exist to the same destination domain, and the destination registers two different trigger events (one for each source domain). Would PCM attribute both independently and issue delayed reports for both?

Yes, PCM as implemented and specified today matches each triggering event individually and schedules reports individually too. The problem arises when a single triggering event can trigger for multiple click sources.

benjaminsavage commented 2 years ago

What happens today with PCM? My understanding was that a single trigger event could be attributed to multiple click sources and generate multiple, independent attribution reports.

johnwilander commented 2 years ago

What happens today with PCM? My understanding was that a single trigger event could be attributed to multiple click sources and generate multiple, independent attribution reports.

Nope. There doesn't exist a way today to trigger for more than one source at a time. If news.example makes the triggering redirect, only a pending click from news.example can convert, not from social.example. That's what the wildcard trigger would add.

benjaminsavage commented 2 years ago

OK, but today, if the destination site had implemented multiple "pixels" for all of the channels where they buy ads, which produced multiple "triggering redirects", this could result in a single "Purchase Event" producing multiple, independent attribution reports to multiple click sources (assuming there were clicks from multiple click sources) right?

eriktaubeneck commented 2 years ago

@johnwilander I'm actually not sure what would happen in this case with PCM, i.e. two source clicks exist to the same destination domain, and the destination registers two different trigger events (one for each source domain). Would PCM attribute both independently and issue delayed reports for both?

Yes, PCM as implemented and specified today matches each triggering event individually and schedules reports individually too. The problem arises when a single triggering event can trigger for multiple click sources.

Thanks @johnwilander, that makes sense to me. And if I'm reading correctly, that's aligned with what @benjaminsavage is suggesting.

I don't think that Google has considered a wildcard trigger, primarily because they require a reporting domain (unlike PCM) and the reports are scoped to that reporting domain (which seems like it would be incompatible with a wildcard). However, it would be great if we could actually talk about this on the Conversion Measurement API call. It looks like @csharrison is out of the office for the next meeting (convenient for me as I will be as well), but the meeting after that is Oct 4th. If this were on the agenda, would you be able to attend, @johnwilander?

Ultimately, I have concerns (and I know @benjaminsavage does as well) at anything that would codify last touch attribution into the standard. Ideally, we would enable sources and destinations to utilize the attribution model which best solves their measurement use case (within the privacy constraints of each API.)

abebis commented 2 years ago

The conversation in that issue doesn't agree with the description in Google's spec/explainer, unless they mean multiple sources from the same site which sounds weird:

They removed support for multi-touch/credit in https://github.com/WICG/conversion-measurement-api/issues/172. Some background here: https://github.com/WICG/conversion-measurement-api/issues/177

johnwilander commented 2 years ago

The conversation in that issue doesn't agree with the description in Google's spec/explainer, unless they mean multiple sources from the same site which sounds weird:

They removed support for multi-touch/credit in WICG/conversion-measurement-api#172. Some background here: WICG/conversion-measurement-api#177

Thanks, Antoine! Not sure what that means though. Do they still support sending attribution reports to multiple sources for one triggering event or not?

johnivdel commented 2 years ago

Thanks, Antoine! Not sure what that means though. Do they still support sending attribution reports to multiple sources for one triggering event or not?

Attribution Reporting has a 1:1 relationship with sources and triggers (see |sourceToAttribute| at https://wicg.github.io/conversion-measurement-api/#triggering-attribution) using a priority mechanism and defaulting to last click.

My understanding is that the existing PCM design only stores the most recent unattributed source for a given (source site, destination site), i.e. last click scoped within a single source site. The only difference being discussed here is having a same-site pixel/wildcard API do last click in a (destination site) scoping instead.

For reference, Attribution Reporting uses (reporting origin, destination site) instead of (source site, destination site) for scoping.

As expressed on https://github.com/privacycg/private-click-measurement/issues/31#issuecomment-574785707, there are some concerns around the same-site pixel/JS wildcard API with regard to Attribution Reporting. But I think aligning on whether many-to-one relationships are ever allowed makes sense.

Ultimately, I have concerns (and I know @benjaminsavage does as well) at anything that would codify last touch attribution into the standard. Ideally, we would enable sources and destinations to utilize the attribution model which best solves their measurement use case (within the privacy constraints of each API.)

@eriktaubeneck Could you clarify which notions of last touch you are talking about here regarding the paragraph above?

johnwilander commented 2 years ago

Let's hash out an example so I get this right (ARA == Attribution Reporting API).

The Setup

The user clicks Ad1 on ClickSourceA, which takes them to ClickDestination (with priority medium in the case of ARA).
The user clicks Ad2 on ClickSourceA, which takes them to ClickDestination (with priority high in the case of ARA).
The user clicks Ad3 on ClickSourceB, which takes them to ClickDestination (with priority low in the case of ARA).

In PCM

There are now two pending clicks – { Ad2, ClickSourceA, ClickDestination } and { Ad3, ClickSourceB, ClickDestination }.

A redirected tracking pixel to ClickSourceA under ClickDestination can schedule an attribution report for Ad2 and a redirected tracking pixel to ClickSourceB can schedule an attribution report for Ad3.

A future wildcard trigger would "spend" both pending clicks but could either schedule an attribution report just for Ad3 (last click) or for both. And an unlinkable token signed by ClickDestination could be included in either just the attribution report for Ad3 (last click) or for both.
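The PCM side of this example can be sketched as below. `PendingClickStore` and its methods are hypothetical names for illustration; the only behavior taken from this thread is that one pending click is kept per (source site, destination site) and that a trigger for one source spends only that source's click.

```python
from typing import Optional


class PendingClickStore:
    """Toy model: one pending click per (source site, destination site)."""

    def __init__(self) -> None:
        self._clicks: dict[tuple[str, str], str] = {}

    def record_click(self, ad_id: str, source_site: str, destination_site: str) -> None:
        # A newer click for the same pair replaces the older one.
        self._clicks[(source_site, destination_site)] = ad_id

    def trigger(self, source_site: str, destination_site: str) -> Optional[str]:
        """Spend the pending click for one source; return its ad id, if any."""
        return self._clicks.pop((source_site, destination_site), None)

    def pending(self, destination_site: str) -> list[str]:
        """Ad ids still pending for a destination, sorted for stable output."""
        return sorted(
            ad for (_, dest), ad in self._clicks.items() if dest == destination_site
        )


store = PendingClickStore()
store.record_click("Ad1", "ClickSourceA", "ClickDestination")
store.record_click("Ad2", "ClickSourceA", "ClickDestination")  # replaces Ad1
store.record_click("Ad3", "ClickSourceB", "ClickDestination")
```

After these three clicks, only Ad2 and Ad3 are pending, and a trigger for ClickSourceA can convert Ad2 but never Ad3. A wildcard trigger would be the new operation that spends both entries at once.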

In Attribution Reporting API

[Please fill in, @johnivdel. Thanks!]

johnivdel commented 2 years ago

In Attribution Reporting

I will assume that the reporting origin being used on each clickSource is the clickSource, so we get a clear comparison.

There are three pending clicks: { Ad1, ClickSourceA, ClickDestination, med }, { Ad2, ClickSourceA, ClickDestination, high }, and { Ad3, ClickSourceB, ClickDestination, low }.

A redirected tracking pixel to ClickSourceA under ClickDestination can schedule an attribution report for Ad2 (the browser chooses high over med priority). (same as PCM)

A redirected tracking pixel to ClickSourceB can schedule an attribution report for Ad3. (same as PCM)

dialtone commented 2 years ago

Ultimately, I have concerns (and I know @benjaminsavage does as well) at anything that would codify last touch attribution into the standard. Ideally, we would enable sources and destinations to utilize the attribution model which best solves their measurement use case (within the privacy constraints of each API.)

@eriktaubeneck Could you clarify which notions of last touch you are talking about here regarding the paragraph above?

Without putting words in @eriktaubeneck's mouth, I'll express my own opinion here, which I think may be similar to his.

The problem is that last touch (either view or click) is a complicated model to make work, primarily because of incentives and how multiple marketing channels interact. As an example, @eriktaubeneck brought up search, where people often don't click an ad but just view it, are reminded of a purchase they wanted to make, go to Google, search for the term, and complete the purchase. Another similar case, although it probably doesn't interfere here, is that the user does click an ad and lands on the site, then remembers they received a discount for that product by email, goes back to the email, clicks it, and completes the purchase. (This also happens the other way around.) In these and many other cases, the last touch just happened to be last; it wasn't the originating cause of the purchase.

More generally speaking, a last touch attribution model is one of the primary reasons why we find ourselves in such privacy/brand safety shenanigans: everyone wants to be the last touch and will do whatever they can to be the last thing the user sees. As a final consideration, last touch models drive last touch optimization, which is known to not be particularly incremental (if at all) for reasons that are very intuitive (e.g. showing ads while you are already in line for the purchase).

Incidentally, this also greatly discounts the power of contextual and brand awareness campaigns, which can only very rarely be last touch campaigns: it's unlikely you will go from not knowing the product to purchasing it just because you clicked an ad and landed on the site, and only a few products are that level of impulse buy. From this point of view, the talk about being less data intensive and more privacy focused really clashes, from an incentive standpoint, with the choice of last touch attribution here. And frankly, I'm talking about last touch here, but in reality the specs so far are only really detailed for last click; last view has some coverage in the event-level attribution API but not in PCM.

michael-oneill commented 2 years ago

Would it help if ad-view stats were also sent in reports? The stats would still be unlinkable and would just show how many times, or for how long, the ad had been seen in the last, say, 24 hours by an anonymous user. The click would still be needed for fraud detection, but maybe it could be done for a verifiable subset?