Open bmcase opened 1 year ago
I'd like to follow up on the discussion we had a couple weeks ago on this issue; we got good feedback that we'll need to map out Ad Networks in another level of detail with the sorts of queries that each party (e.g. SSPs, DSPs) would care to run. Several folks @alextcone, @AramZS, @tgreasby, @csharrison had some thoughts on how to do this and at least a couple were willing volunteers to help.
Can I suggest that we start async and that anyone willing to take a stab at writing out their understanding of what different parties would want can post to this issue? If we need to iterate with much feedback we can also get into a doc, but more likely we can put this on the next PATCG agenda for end of July to continue discussing.
A couple considerations towards the solutions I'd like to mention:
I think these are the main parties using the data today for measurement and optimization (loosely based on the LUMAscape). I am sure I missed some of the players that need the data as well so everyone should feel free to chime in.
Buy side: Advertisers, Ad Agencies (advertisers likely use more than one), Ad Servers, Measurement companies (e.g., MTA vendors), DSPs
Sell side: SSPs, Publishers
I am less familiar with the sell side. What did I miss?
On the buy side there's also the agency (or independent) trading desk (and trader).
On the sell side there are publisher ad servers. That said, like SSPs, often they are not set up to optimize to trigger events coming from advertiser sites.
How do Privacy Budgets work in IPA?
with Ben Savage and Martin Thomson
Summary:
Private Scope in IPA
Privacy budgets are a new thing for the web. In the world of third-party cookies, sites learn information with full confidence about specific individuals' activity on other sites. Applying differential privacy to what sites learn from IPA queries enables us to limit how much information a site can learn about any specific individual.
The goal of IPA is that each site has a budget per epoch on the amount of information that can be learned about people who interacted with the site during that epoch. Our best approximation of a “person” in the IPA system is the matchkey, so more specifically IPA proposes that each site has a budget per epoch on the amount of information that can be learned about a given matchkey’s interaction with the site during that epoch.
At a high level, IPA allows sites to request encrypted matchkeys for source or trigger events occurring on their site. Sites can then add attributes to these reports (e.g. values to trigger events and breakdown keys to source events) and share them with other sites, also called Report Collectors. At any time, a Report Collector can take a batch of source and trigger reports and submit a query to the MPC to get an attribution measurement on these events. With each query the Report Collector must specify how much of its per epoch budget it wants to spend on that particular query. The Report Collector also specifies for each query the per matchkey sensitivity cap to be enforced by the MPC. The cap and budget allocated to this query together determine the parameters of the noise that is applied to the outputs such that the information released about each matchkey is at most the query’s budget.
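As a rough illustration of how a cap and a budget could jointly fix the noise, here is a minimal sketch that assumes a Laplace mechanism with scale equal to cap divided by budget. The text above does not say which noise distribution IPA uses, so the mechanism and the names `laplace_scale` and `noisy_total` are assumptions made for illustration only.

```python
import random


def laplace_scale(sensitivity_cap: float, epsilon: float) -> float:
    """Scale b of Laplace noise, assuming the standard Laplace mechanism.

    sensitivity_cap: the per-matchkey cap enforced on contributions.
    epsilon: the share of the per-epoch budget spent on this query.
    """
    if sensitivity_cap <= 0 or epsilon <= 0:
        raise ValueError("cap and budget must both be positive")
    return sensitivity_cap / epsilon


def noisy_total(true_total: float, sensitivity_cap: float,
                epsilon: float, rng: random.Random) -> float:
    """Add Laplace(0, b) noise to an aggregate (hypothetical helper)."""
    b = laplace_scale(sensitivity_cap, epsilon)
    # The difference of two independent exponentials with mean b
    # is distributed as Laplace(0, b).
    noise = rng.expovariate(1 / b) - rng.expovariate(1 / b)
    return true_total + noise


# Spending less budget on a query means a larger noise scale.
print(laplace_scale(10, 1.0), laplace_scale(10, 0.5))
```

Under this assumption, halving the budget spent on a query doubles the noise scale, and raising the cap has the same effect as lowering the budget.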
Queries for different types of Report Collectors
There are different types of Report Collectors who will need to submit IPA queries. See our What is a “Report Collector?” explainer for more details, but the main classes we are working to support are Self-Attributing Publishers, Self-Attributing Advertisers, Ad Networks, and MMPs.
Source fan-out queries for Self-Attributing Publishers
A Self-Attributing Publisher is a site that runs its own ads and collects source reports for them. It also collects trigger reports from Advertiser Websites/Apps. IPA enables these sites to submit source fan-out queries, which consist of source reports from only that source site along with trigger reports from any number of Advertiser sites.
The budget to be spent on a source fan-out query is deducted from the source site’s per epoch budgets but not from the budgets of the trigger sites. More specifically, if a source fan-out query has source reports from multiple epochs, each of those epochs’ budgets for the source site is reduced by the amount to be spent on that query.
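The deduction rule above can be sketched as a small ledger keyed by (site, epoch). The class name `BudgetLedger`, the budget value, and the all-or-nothing behaviour when one epoch is short are assumptions for illustration; the text only states that every epoch present in the query is charged the full query amount.

```python
from collections import defaultdict


class BudgetLedger:
    """Hypothetical per-site, per-epoch budget tracker."""

    def __init__(self, per_epoch_budget: float):
        self.per_epoch_budget = per_epoch_budget
        self.spent = defaultdict(float)  # (site, epoch) -> amount spent

    def remaining(self, site: str, epoch: int) -> float:
        return self.per_epoch_budget - self.spent[(site, epoch)]

    def charge(self, site: str, epochs, amount: float) -> bool:
        """Charge `amount` to every epoch that appears in the query.

        Assumption: if any epoch cannot afford the query, nothing is
        deducted and the query is refused.
        """
        epochs = set(epochs)
        if any(self.remaining(site, e) < amount for e in epochs):
            return False
        for e in epochs:
            self.spent[(site, e)] += amount
        return True


ledger = BudgetLedger(per_epoch_budget=1.0)
# A source fan-out query with source reports from epochs 1 and 2
# is charged against both epochs of the source site only.
print(ledger.charge("publisher.example", [1, 2], 0.4))
print(ledger.charge("publisher.example", [2, 3], 0.7))
```

The second call is refused in this sketch because epoch 2 has already been charged by the first query, even though epoch 3 is untouched.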
It is the Helper Parties who run the MPC queries that are also responsible for enforcing the privacy budgets. They are responsible for checking several things about each submitted query. Recall that the encrypted matchkeys have authenticated associated data with them that contains the site that requested the encrypted matchkey, the epoch when it was requested, and whether it was requested for a source or trigger event.
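To make the role of the associated data concrete, here is a minimal sketch of one check a Helper Party could run on a source fan-out query: that every source report is bound to the same site. It covers only that single constraint, which follows from the definition of a source fan-out query; the type `AssociatedData` and the function `check_source_fan_out` are hypothetical names, and real reports would carry this data in authenticated form rather than as plain fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociatedData:
    """The three fields the text says accompany each encrypted matchkey."""
    site: str        # site that requested the encrypted matchkey
    epoch: int       # epoch in which it was requested
    event_type: str  # "source" or "trigger"


def check_source_fan_out(reports):
    """Return (source_site, source_epochs) or raise ValueError.

    Only the single-source-site constraint is checked here.
    """
    for r in reports:
        if r.event_type not in ("source", "trigger"):
            raise ValueError(f"unknown event type: {r.event_type!r}")
    sources = [r for r in reports if r.event_type == "source"]
    source_sites = {r.site for r in sources}
    if len(source_sites) != 1:
        raise ValueError("source reports must come from exactly one site")
    # These are the epochs whose budgets the query would be charged to.
    return next(iter(source_sites)), {r.epoch for r in sources}


reports = [
    AssociatedData("publisher.example", 7, "source"),
    AssociatedData("publisher.example", 8, "source"),
    AssociatedData("shop.example", 8, "trigger"),
    AssociatedData("store.example", 8, "trigger"),
]
print(check_source_fan_out(reports))
```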
For source fan-out queries, the Helper Parties check that
Trigger fan-out queries for Self-Attributing Advertisers
A Self-Attributing Advertiser is an advertiser site that is large enough to perform its own ad-measurement in-house. They collect source reports from the publishers they buy ads from. IPA enables these trigger sites to submit trigger fan-out queries, which consist of trigger reports from only that trigger site along with source reports from any number of publisher sites or ad networks.
The budget to be spent on a trigger fan-out query is deducted from the trigger site’s per epoch budgets but not from the budgets of the source sites. If a trigger fan-out query has trigger reports from multiple epochs, each of those epochs’ budgets for the trigger site is reduced by the amount to be spent on that query. In practice, trigger fan-out queries likely just include reports from the most recent one or two epochs; source fan-out queries might look back several epochs for longer attribution windows.
For trigger fan-out queries, the Helper Parties check that
Queries for MMPs
“Mobile Measurement Partners” or MMPs are another example of a current “Report Collector”. They help advertisers perform conversion attribution queries across multiple publishers / ad-networks, and have the ability to perform cross-publisher attribution (including multi-touch attribution). In IPA, MMPs run trigger fan-out queries on behalf of Advertiser Apps / Websites. This is nearly identical to the case of self-attributing advertisers, with the only difference being that the responsibility of running queries has been delegated to the MMP. The Advertiser App/Website enables the MMP to submit IPA queries on its behalf and spend its privacy budget.
One MMP who is a service provider for many Advertisers won’t be able to combine budgets from multiple advertisers. They will see and spend from the budgets of all their different trigger sites with separate trigger fan-out queries for each.
Queries for Ad Networks
Ad Networks show ads across a large number of publisher apps / websites on behalf of many Advertiser apps / websites. They will need to collect reports about source and trigger events in order to submit IPA queries.
We are still exploring what the best options are for supporting privacy budgets for Ad Networks. We are considering two design proposals right now but would be open to additional constructions that would give good privacy protections for end-users.
Design Proposal 1 (no custom support added for Ad Networks)
In this proposal Ad Networks have to work (for the most part) within the earlier constraints of running source and trigger fan-out queries on behalf of the websites they work with. However, in the previous settings of the Self-Attributing publisher, the helpers verified that all source events originated from the same source site. This is not possible for ad networks as impressions are shown across many sites. In order to support source fan-out queries involving source events shown across many source sites, we could imagine adding support for source queries across multiple sites - and simply deduct from the privacy budget of all included sites.
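One consequence of deducting from every included site is that a multi-site query can spend no more than the smallest remaining budget among those sites. The sketch below illustrates this for a single epoch; the function names, the plain dictionary of remaining budgets, and the refuse-without-deducting behaviour are assumptions for illustration.

```python
def max_affordable(remaining: dict, source_sites) -> float:
    """Largest amount a query spanning these sites could spend."""
    sites = set(source_sites)
    if not sites:
        raise ValueError("query must include at least one source site")
    return min(remaining.get(site, 0.0) for site in sites)


def charge_multi_site_query(remaining: dict, source_sites, amount: float) -> bool:
    """Deduct `amount` from every included site, or deduct nothing."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    sites = set(source_sites)
    if amount > max_affordable(remaining, sites):
        return False
    for site in sites:
        remaining[site] -= amount
    return True


# Remaining budget per source site for one epoch (hypothetical values).
remaining = {"news.example": 1.0, "blog.example": 0.25, "video.example": 0.5}
print(max_affordable(remaining, ["news.example", "blog.example"]))
print(charge_multi_site_query(remaining, ["news.example", "video.example"], 0.5))
print(remaining)
```

In this sketch a single site with little budget left limits every query that includes its source events, which is one way to see why budget sharing across many Ad Networks becomes hard to manage.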
Since sites generally work with many Ad Networks, this would lead to sites needing to delegate partial amounts of their budget to the different Ad Networks they work with. How might sites delegate their budgets?
In summary, partitioning privacy budgets across multiple ad networks would be very complex to manage. Worst case, it could push the ecosystem towards consolidation.
Design Proposal 2 (separate budgets for Ad Networks)
We consider an additional way of supporting Ad Network budgets. Instead of fixing the budget for a site and letting that be delegated towards Ad Networks, we have considered the idea of allowing each source site to delegate to a limited number of ad networks who would each have a constant-sized, cross-web privacy budget.
In this design, the total privacy loss is proportional to the number of ad networks that the user is exposed to rather than the number of sites they visit. For a user that visits relatively few sites, this could be worse, but for users that visit a modest number of sites, the set of ad networks they are exposed to could be less than the number of sites. The privacy loss in that case would be reduced and might not increase further as the user visits more sites (assuming they have delegated to the same set of ad networks the user previously encountered).
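The trade-off described above can be sketched by counting: per-site budgets give a loss that grows with the number of sites visited, while per-network budgets give a loss that grows with the number of distinct ad networks encountered. The budget values, the site and network names, and the simple additive model of privacy loss are all assumptions for illustration.

```python
def loss_by_sites(visited_sites, site_budget):
    """Worst-case loss when each visited site has its own budget."""
    return len(set(visited_sites)) * site_budget


def loss_by_networks(visited_sites, delegations, network_budget):
    """Worst-case loss when each ad network has one cross-web budget.

    delegations: site -> ad networks that site has delegated to.
    """
    networks = set()
    for site in set(visited_sites):
        networks |= set(delegations.get(site, ()))
    return len(networks) * network_budget


# Hypothetical delegations: five sites sharing two ad networks.
delegations = {
    "a.example": ["net1", "net2"],
    "b.example": ["net1"],
    "c.example": ["net2"],
    "d.example": ["net1", "net2"],
    "e.example": ["net1"],
}
many = ["a.example", "b.example", "c.example", "d.example", "e.example"]
few = ["a.example"]

print(loss_by_sites(many, 1), loss_by_networks(many, delegations, 1))
print(loss_by_sites(few, 1), loss_by_networks(few, delegations, 1))
```

With equal unit budgets in this sketch, the user who visits five sites is exposed to less total loss under per-network budgets, while the user who visits only one site is exposed to more.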
Assumptions:
This proposal would essentially reduce the Ad Network case to the same situation as the Self-Attributing publisher case:
In order to implement this second design, we would need the browser to bind the source reports to the ad network that is displaying the ad on the publisher’s website, in addition to the publisher’s site. To do this we would need the following:
The getencryptedmatchkey() API will have an additional boolean parameter, delegated, which if false will tell the browser to bind this report to only the top-level domain. If true, the report will be bound to both the top-level domain as well as the current (frame) context (here we assume the ads being shown by Ad Networks are in iframes that correspond to a domain operated by that Ad Network).
When an Ad Network calls the getencryptedmatchkey() API in the iframe of the ad they will supply the delegated parameter as true and the browser will create the report and bind it to both the site and the Ad Network.
If they instead supply false, then they get back a report bound to the top-level site. Since this top-level site is one which has decided to delegate queries, any report bound only to this site will be rejected by the Helpers and never leak any information about the user.
Comparison of Ad Network Designs
The following figure illustrates a comparison of privacy budgets between the two designs.
Here is a table that compares the main two designs considered so far.
Open Questions for discussion: