privacycg / private-click-measurement

Private Click Measurement
https://privacycg.github.io/private-click-measurement/

Conversion Filters Proposal #36

Open benjaminsavage opened 4 years ago

benjaminsavage commented 4 years ago

Hi @johnwilander and @hober!

This is an idea that we brought up on the web-adv W3C call. @hober thanked us for explaining the use-case and suggested bringing it to an issue on this repo.

Conversion Filters Proposal

Many ads shown on publisher websites direct people to large e-Commerce websites that sell a wide variety of unrelated products. Think of Amazon, Walmart, Target, Wish, etc.

Both Webkit’s “Private Click Measurement” proposal and Chrome’s “Conversion Measurement” proposal currently suggest that all conversion events on a given domain could match up with any ad click that directed a browser to that domain. This poses a significant problem for such commerce websites. It would actually be desirable to collect less information here.

Use-Case 1: Collaborative Ads

Many producers do not sell directly to consumers via their own website. Instead, they sell their goods in stores and through large e-Commerce websites. As an example, let’s imagine a producer ShaveCo that manufactures shaving supplies. They sell these shaving supplies on e-Commerce platform MegaStore’s domain megastore.com. If they want to run ads promoting their shaving supplies, the destination of the ads would be megastore.com.

In their current form, the private click measurement APIs would not tell ShaveCo how many people purchased shaving products after clicking on their ads; they would tell them how many people purchased anything at all on megastore.com after clicking on their ads. Since people commonly buy a wide variety of products on these large e-Commerce websites, it’s very likely that these APIs will be counting totally unrelated transactions for things produced by other producers, in totally different categories!

This information would be interesting to MegaStore (the other participant in this collaborative ads campaign) but not useful to ShaveCo. ShaveCo is only interested in conversion events that they consider relevant to a subset of the products available on megastore.com - their shaving supplies.

Use-Case 2: Dynamic Product Ads

The large e-Commerce website also wants to run their own ads promoting their website. They might run ads promoting a particular set of products (e.g. School Supplies in a “Back to School” campaign). While it might be interesting to know how many people ever bought anything at all on their website after clicking on an ad, it is also valuable to ask “how many people bought school-supplies after clicking on the ad promoting school supplies?”. They may additionally wish to know “How often was the exact product advertised purchased as a result of the advertisement?”

Proposal

Both conversion measurement APIs propose the addition of new attributes to the tag representing the ad in order to invoke the API. We propose the addition of another optional attribute, which would specify some type of “filter” for the set of conversions this particular ad wants to count.

We are not suggesting any specific protocol for how these “filters” would be implemented, but here is a short list of common use-cases that will drive value:

  • “Only count conversions where the product_id is 12345”
  • “Do not count conversions where the product_id is 12345”
  • “Only count conversions where the product is one of [12345, 23456, 34567, …]”
  • “Only count conversions which occur on the sub-domain electronics.megastore.com”
  • “Only count conversions that occur within the directory megastore.com/store/electronics”
  • “Only count conversions where the product category is ‘school-supplies’”
  • “Only count conversions where the producer is ‘ShaveCo’”

Both conversion measurement APIs propose the introduction of certain conversion metadata which would be associated with the conversion event. The Webkit proposal also suggests a “priority” attribute. We propose that advertisers can additionally specify various meta-data about the conversion event (e.g. Producer: “ShaveCo”, product_id: “12345”, category: “Shaving Supplies”).

When the browser looks into the storage of clicks requesting attribution, it could compare the metadata on the conversion with the filter specified on that ad click. These would either “match” or “not-match”. There are two possible approaches that come to mind for what to do in the event that they do-not-match.

  • Do not attribute this conversion event to this click. Continue checking the other clicks that requested attribution, and if none match, do not generate an anonymous conversion report at all.
  • Attribute the conversion to the click, but set the first “conversion-metadata” bit to zero to indicate “did not match the specified filter”.
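To make the two options concrete, here is a minimal sketch in Python. It is not part of either browser proposal: the `StoredClick` type, the exact-match filter semantics, the function names, and the SofaCo example data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredClick:
    campaign_id: int
    # Hypothetical filter: every key listed here must equal the
    # corresponding value in the conversion's metadata.
    required: dict = field(default_factory=dict)


def matches(click, metadata):
    """Return True if the conversion metadata satisfies the click's filter."""
    return all(metadata.get(key) == value for key, value in click.required.items())


def attribute_strict(clicks, metadata):
    """First option: skip non-matching clicks; None means no report at all."""
    for click in clicks:
        if matches(click, metadata):
            return {"campaign_id": click.campaign_id}
    return None


def attribute_flagged(click, metadata):
    """Second option: always attribute, but record the filter outcome in one bit."""
    return {"campaign_id": click.campaign_id, "matched": int(matches(click, metadata))}


clicks = [StoredClick(54, {"producer": "ShaveCo"})]
razor = {"producer": "ShaveCo", "product_id": "12345", "category": "Shaving Supplies"}
sofa = {"producer": "SofaCo", "product_id": "99999", "category": "Furniture"}

print(attribute_strict(clicks, razor))      # {'campaign_id': 54}
print(attribute_strict(clicks, sofa))       # None
print(attribute_flagged(clicks[0], sofa))   # {'campaign_id': 54, 'matched': 0}
```

In the first option an unrelated purchase leaves no trace in the reports. In the second it still produces a report, at the cost of one conversion-metadata bit.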

Privacy Considerations

This proposal does not rely on any change to the total entropy contained in anonymous conversion reports, or how the bits are distributed. Platform vendors have already taken stances about how many bits to allow for representing both campaign_id as well as conversion_metadata, and this proposal should not affect those choices. Since this does not propose to change the bit entropy in conversion reports, we do not believe this proposal meaningfully impacts user privacy.

What this does do is allow websites with complex advertising campaigns and large product catalogs to more effectively utilize those bits of entropy to support common use cases.

johnwilander commented 4 years ago

Hi Ben! Thanks for filing.

Hi @johnwilander and @hober!

This is an idea that we brought up on the web-adv W3C call. @hober thanked us for explaining the use-case and suggested bringing it to an issue on this repo.

Conversion Filters Proposal

Many ads shown on publisher websites direct people to large e-Commerce websites that sell a wide variety of unrelated products. Think of Amazon, Walmart, Target, Wish, etc.

Both Webkit’s “Private Click Measurement” proposal and Chrome’s “Conversion Measurement” proposal currently suggest that all conversion events on a given domain could match up with any ad click that directed a browser to that domain. This poses a significant problem for such commerce websites. It would actually be desirable to collect less information here.

Use-Case 1: Collaborative Ads

Many producers do not sell directly to consumers via their own website. Instead, they sell their goods in stores and through large e-Commerce websites. As an example, let’s imagine a producer ShaveCo that manufactures shaving supplies. They sell these shaving supplies on e-Commerce platform MegaStore’s domain megastore.com. If they want to run ads promoting their shaving supplies, the destination of the ads would be megastore.com.

In their current form, the private click measurement APIs would not tell ShaveCo how many people purchased shaving products after clicking on their ads; they would tell them how many people purchased anything at all on megastore.com after clicking on their ads.

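This is not the case. For a conversion report to be scheduled and sent, these things need to line up (using your store examples):

  • An ad click must have happened on megastore.com navigating the user to shaveco.example and pushing an ad campaign ID into the internal PCM database in the browser.
  • An HTTP GET request to megastore.com must happen on shaveco.example and be directed according to the conversion triggering procedure.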

If the above happens, a conversion report is scheduled and later sent with these four pieces of data:

Since people commonly buy a wide variety of products on these large e-Commerce websites, it’s very likely that these APIs will be counting totally unrelated transactions for things produced by other producers, in totally different categories!

This information would be interesting to MegaStore (the other participant in this collaborative ads campaign) but not useful to ShaveCo. ShaveCo is only interested in conversion events that they consider relevant to a subset of the products available on megastore.com - their shaving supplies.

Use-Case 2: Dynamic Product Ads

The large e-Commerce website also wants to run their own ads promoting their website. They might run ads promoting a particular set of products (e.g. School Supplies in a “Back to School” campaign). While it might be interesting to know how many people ever bought anything at all on their website after clicking on an ad, it is also valuable to ask “how many people bought school-supplies after clicking on the ad promoting school supplies?”. They may additionally wish to know “How often was the exact product advertised purchased as a result of the advertisement?”

Proposal

Both conversion measurement APIs propose the addition of new attributes to the tag representing the ad in order to invoke the API. We propose the addition of another optional attribute, which would specify some type of “filter” for the set of conversions this particular ad wants to count.

We are not suggesting any specific protocol for how these “filters” would be implemented, but here is a short list of common use-cases that will drive value:

  • “Only count conversions where the product_id is 12345”
  • “Do not count conversions where the product_id is 12345”
  • “Only count conversions where the product is one of [12345, 23456, 34567, …]”
  • “Only count conversions which occur on the sub-domain electronics.megastore.com”
  • “Only count conversions that occur within the directory megastore.com/store/electronics”
  • “Only count conversions where the product category is ‘school-supplies’”
  • “Only count conversions where the producer is ‘ShaveCo’”

This sounds complex to me and introduces new classes of data. I believe what you're requesting could be achieved through just the campaign ID. Say the click destination (in your example shaveco.example) can express "Trigger a conversion report if this conversion matches a stored ad click for campaign IDs [a, b, c …]" or "Trigger a conversion report if this conversion does not match a stored ad click for campaign IDs [x, y, z …]" in the HTTP GET request. Would that solve the issue you describe?

Both conversion measurement APIs propose the introduction of certain conversion metadata which would be associated with the conversion event. The Webkit proposal also suggests a “priority” attribute. We propose that advertisers can additionally specify various meta-data about the conversion event (e.g. Producer: “ShaveCo”, product_id: “12345”, category: “Shaving Supplies”).

When the browser looks into the storage of clicks requesting attribution, it could compare the metadata on the conversion with the filter specified on that ad click. These would either “match” or “not-match”. There are two possible approaches that come to mind for what to do in the event that they do-not-match.

  • Do not attribute this conversion event to this click. Continue checking the other clicks that requested attribution, and if none match, do not generate an anonymous conversion report at all.
  • Attribute the conversion to the click, but set the first “conversion-metadata” bit to zero to indicate “did not match the specified filter”.

Privacy Considerations

This proposal does not rely on any change to the total entropy contained in anonymous conversion reports, or how the bits are distributed. Platform vendors have already taken stances about how many bits to allow for representing both campaign_id as well as conversion_metadata, and this proposal should not affect those choices. Since this does not propose to change the bit entropy in conversion reports, we do not believe this proposal meaningfully impacts user privacy.

What this does do is allow websites with complex advertising campaigns and large product catalogs to more effectively utilize those bits of entropy to support common use cases.

We have to be careful here. Clever filter use could potentially achieve unique reports per user and thus reveal to the click destination site that a particular conversion was due to an ad click of a specific ad campaign on a specific click source site. That goes against the design goals of this proposal.

Limited complexity is our friend here. The more vectors and variations we allow, the higher the risk is for a combination of feature uses to open up for cross-site tracking of individual users.

benjaminsavage commented 4 years ago

Thanks for the response @johnwilander!

I think there is a misunderstanding here.

This is not the case. For a conversion report to be scheduled and sent, these things need to line up (using your store examples):

  • An ad click must have happened on megastore.com navigating the user to shaveco.example and pushing an ad campaign ID into the internal PCM database in the browser.
  • An HTTP GET request to megastore.com must happen on shaveco.example and be directed according to the conversion triggering procedure.

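This is not the use-case I am describing at all. Let me try again to explain what the model is:

  • Person is browsing facebook.com
  • They see an ad promoting shaving products
  • The click destination of the ad is megastore.com
  • The company that produces the shaving products does not sell direct to consumers through their own website. There is no shaveco.example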

This is the use-case we have today. We call companies like the "ShaveCo" in this example "Producers". They just make goods; they don't operate websites for people. They sell their goods through "MegaStore". There are many, many such producers. They can still run ads on facebook.com, but the click destination is "megastore.com".

So for cases like this, if someone were to click on that ad shown on facebook.com, with a click destination of megastore.com, any subsequent purchase on megastore.com (possibly a sofa, possibly a frying-pan, and maybe something from ShaveCo) would all match up to that click on the ad on facebook.com.

I hope this helps clarify the use-case a bit better.

johnwilander commented 4 years ago

Thanks for the response @johnwilander!

I think there is a misunderstanding here.

This is not the case. For a conversion report to be scheduled and sent, these things need to line up (using your store examples):

  • An ad click must have happened on megastore.com navigating the user to shaveco.example and pushing an ad campaign ID into the internal PCM database in the browser.
  • An HTTP GET request to megastore.com must happen on shaveco.example and be directed according to the conversion triggering procedure.

This is not the use-case I am describing at all. Let me try again to explain what the model is:

  • Person is browsing facebook.com
  • They see an ad promoting shaving products
  • The click destination of the ad is megastore.com
  • The company that produces the shaving products does not sell direct to consumers through their own website. There is no shaveco.example

This is the use-case we have today. We call companies like the "ShaveCo" in this example "Producers". They just make goods; they don't operate websites for people. They sell their goods through "MegaStore". There are many, many such producers. They can still run ads on facebook.com, but the click destination is "megastore.com".

So for cases like this, if someone were to click on that ad shown on facebook.com, with a click destination of megastore.com, any subsequent purchase on megastore.com (possibly a sofa, possibly a frying-pan, and maybe something from ShaveCo) would all match up to that click on the ad on facebook.com.

I hope this helps clarify the use-case a bit better.

Got it. Then I think filtering on campaign ID would solve both your use cases.

benjaminsavage commented 4 years ago

Got it. Then I think filtering on campaign ID would solve both your use cases.

I don't think that it would.

Imagine an ad that looks like this:

<a adCampaignID="54" adDestination="megastore.com/personal_care/shaveco" />

If I were to click this ad, under the current proposal, any purchase I make on "megastore.com" would match with this click, and would be reported back as a conversion for campaignID="54". This means that the number of conversions reported for this campaign would actually mean: "How many total conversions happened on megastore.com after this click?" It would not enable ShaveCo to measure how many of their products were sold after a click on this ad.

johnwilander commented 4 years ago

Got it. Then I think filtering on campaign ID would solve both your use cases.

I don't think that it would.

Imagine an ad that looks like this:

<a adCampaignID="54" adDestination="megastore.com/personal_care/shaveco" />

adDestination can only be a registrable domain (eTLD+1).

If I were to click this ad, under the current proposal, any purchase I make on "megastore.com" would match with this click, and would be reported back as a conversion for campaignID="54". This means that the number of conversions reported for this campaign would actually mean: "How many total conversions happened on megastore.com after this click?" It would not enable ShaveCo to measure how many of their products were sold after a click on this ad.

What I’m proposing we explore is that the conversion on megastore.com carries the ad campaign IDs it is valid for. In your case, if the conversion says valid for campaign ID 54, there would be a match and a report would be scheduled. Any other ID would not be matched and thus no report scheduled.
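A rough sketch of that matching rule in Python, assuming the browser keeps the campaign IDs of stored clicks whose destination is megastore.com and the conversion declares the set of campaign IDs it is valid for. The function name `match_conversion` and the parameter `valid_for` are hypothetical, not taken from the PCM draft.

```python
from typing import Iterable, Optional, Set


def match_conversion(stored_campaign_ids: Iterable[int], valid_for: Set[int]) -> Optional[int]:
    """Return the campaign ID of the first stored click the conversion is valid for.

    None means there is no match, so no report would be scheduled.
    """
    for campaign_id in stored_campaign_ids:
        if campaign_id in valid_for:
            return campaign_id
    return None


# A stored click for campaign 54 and a ShaveCo purchase declared valid for 54.
print(match_conversion([54], {54}))   # 54
# An unrelated purchase declared valid only for some other (made-up) campaign.
print(match_conversion([54], {7}))    # None
```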

benjaminsavage commented 4 years ago

What I’m proposing we explore is that the conversion on megastore.com carries the ad campaign IDs it is valid for. In your case, if the conversion says valid for campaign ID 54, there would be a match and a report would be scheduled. Any other ID would not be matched and thus no report scheduled.

Great! I feel like we are making progress!

This approach would technically work as a mechanism for supporting the use-case I have described.

...but I have a few comments =).

First of all, the limit of just 64 campaign IDs is going to be a big problem for MegaStore. They likely have a LOT more than 64 producers who would each like to run at least one ad campaign. MegaStore itself probably wants to run a few ad campaigns as well.

In #11 there is a discussion of re-allocating the 12 bits from a 6,6 split to potentially an 8,4 split. This would work much better for MegaStore. That way they could at least run 256 campaigns. Still incredibly restrictive, but a significant improvement. I think the loss in expressibility of conversion value is a price we are willing to pay here to increase the total number of campaigns we can report results for.
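The trade-off is simple arithmetic on the 12 available bits. A small Python sketch (the helper name `capacity` is only illustrative):

```python
TOTAL_BITS = 12  # bits shared between campaign ID and conversion value


def capacity(campaign_bits, total_bits=TOTAL_BITS):
    """Return (distinct campaign IDs, distinct conversion values) for a split."""
    conversion_bits = total_bits - campaign_bits
    return 2 ** campaign_bits, 2 ** conversion_bits


print(capacity(6))  # (64, 64)   the 6,6 split
print(capacity(8))  # (256, 16)  the 8,4 split
```

Moving two bits to the campaign ID quadruples the number of campaigns while cutting the number of distinct conversion values to a quarter.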

Assuming we go with an 8,4 split of the bits, the next problem is more of a developer ergonomic one.

Synchronizing these campaign IDs between Facebook and MegaStore is going to be a hassle. The campaignIDs are generated by Facebook, and probably change frequently. The approach you describe would require the developers at MegaStore who fire the conversion events to coordinate really closely with Facebook each and every time they start a new ad campaign. This would be a real pain.

It would be significantly easier for this collaboration if we could just use arbitrary key-value pairs to label the conversions, and arbitrary key-value pairs as filters on the tag.

This would have the same ultimate effect (in terms of the set of anonymous conversion reports received) as your proposal, but would require so much less coupling between the two companies. MegaStore would just label their conversions with whatever meta-data they wanted (arbitrary key-value pairs), and when they (or any producer who sells things on MegaStore) wanted to run an ad campaign, they would just tell Facebook which key-value pairs they wanted to use as filters for their ad campaign.

We have to be careful here. Clever filter use could potentially achieve unique reports per user and thus reveal to the click destination site that a particular conversion was due to an ad click of a specific ad campaign on a specific click source site. That goes against the design goals of this proposal.

I totally understand the concern. I understand and respect the Webkit anti-tracking policy and want to find a solution for this use-case that works within that tracking threat model.

A number of us at Facebook have thought through this proposal, and cannot see how it adds any additional tracking capabilities over the current proposal. If you can find a way, by all means share it with us - I'm happy to be proven wrong.

From a purely theoretical standpoint, the current proposal already enables tracking an individual user; you just have to sacrifice one of your campaign IDs to do so (i.e. Ben Savage will be campaign ID 12, and nobody else will ever get served an ad for campaign ID 12). Alternatively, you can sacrifice one of your conversion values to achieve the same end (conversion metadata = 3 means this purchase was made by Ben Savage). Clearly this doesn't scale, and there is no financial incentive for companies to do this - I am simply making a theoretical point. Filters seem to me like they do not make this worst-case scenario any worse. You would still need to sacrifice either a campaign ID or a conversion metadata value in order to track a specific individual.

Limited complexity is our friend here. The more vectors and variations we allow, the higher the risk is for a combination of feature uses to open up for cross-site tracking of individual users.

This totally resonates with me. As a software engineer, I always try to go with the simplest solution and avoid adding more features than necessary. Let's keep brainstorming and see if we can find a simple, yet low-inter-company-coupling way of solving for this use-case that does not open up any new cross-site tracking vectors!