patcg-individual-drafts / private-aggregation-api

Explainer for proposed web platform API
https://patcg-individual-drafts.github.io/private-aggregation-api/

Feedback on Contribution bounding value, scope, and epsilon #23

Open alexmturner opened 1 year ago

alexmturner commented 1 year ago

Hi all,

We're seeking some feedback on the Private Aggregation API's contribution budget. We'd appreciate any thoughts on both the value of the numeric bound and its scope (currently per-origin per-day, and separate for FLEDGE and Shared Storage).

In particular, one change we're considering is moving the scope from per-origin to per-site. This would mitigate abuse potential in cases like wildcard domains, which are (arguably) easier to mint than separate domains for the purpose of exceeding privacy limits. (See more discussion here.)

Thanks!

[January 2024 edit:] Additionally, we would like to broaden the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports drawn from a Laplace distribution with mean zero and standard deviation

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.
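For a rough sense of scale, here is a minimal sketch (Python; only L1 = 2^16 and the formula come from the text above, the particular epsilon values are illustrative) of how the noise standard deviation shrinks as epsilon grows:

```python
import math

L1 = 2 ** 16  # current L1 contribution bound

# Standard deviation of the zero-mean Laplace noise added to each
# summary value: sqrt(2) * L1 / epsilon.
for epsilon in [1, 2, 4, 8, 16, 32, 64]:
    stddev = math.sqrt(2) * L1 / epsilon
    print(f"epsilon={epsilon:>2}: noise stddev ~= {stddev:,.0f}")
```

At epsilon = 64 the noise standard deviation is roughly 1,448, versus roughly 92,682 at epsilon = 1, which is why the smallest workable epsilon per use case matters.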

alexmturner commented 1 year ago

In addition to the change from per-origin to per-site, we're considering changing the time component of the contribution bound. Specifically, we're considering moving the existing contribution bound (a max value sum of 2^16) to apply over a 10-minute window instead of a daily window. We hope this will allow more flexibility and simplify budget management. As a backstop to prevent worst-case leakage, we're considering a new, larger daily bound, e.g. 2^20. We'd appreciate any feedback on this proposal!
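To illustrate how the two bounds might interact, here is a hypothetical sketch (Python; this is not the browser's actual algorithm, and the class and method names are invented) of enforcing a 2^16 bound per 10-minute window alongside a 2^20 daily backstop:

```python
import time
from collections import deque

WINDOW_BOUND = 2 ** 16   # proposed max contribution sum per 10-minute window
DAILY_BOUND = 2 ** 20    # proposed daily backstop
WINDOW_SECS = 10 * 60
DAY_SECS = 24 * 60 * 60

class SiteBudget:
    """Per-site contribution bookkeeping (hypothetical, for illustration)."""

    def __init__(self):
        self.history = deque()  # (timestamp, value) pairs, oldest first

    def try_contribute(self, value, now=None):
        """Record the contribution and return True if both bounds allow it."""
        now = time.time() if now is None else now
        # Contributions older than a day can no longer affect either bound.
        while self.history and self.history[0][0] <= now - DAY_SECS:
            self.history.popleft()
        window_sum = sum(v for t, v in self.history if t > now - WINDOW_SECS)
        daily_sum = sum(v for _, v in self.history)
        if window_sum + value > WINDOW_BOUND or daily_sum + value > DAILY_BOUND:
            return False  # over budget: the contribution would be dropped
        self.history.append((now, value))
        return True
```

The sketch only shows the two-bound check; a real implementation would also need to handle persistence, clock changes, and whatever reset semantics the spec ultimately adopts.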

xottabut commented 1 year ago

Hi Alex (@alexmturner), I have a few questions regarding the budget in the context of the Private Aggregation API. I am reading this document to understand the Private Aggregation API's contribution budget.

Two other documents with information about the budget, though for the Attribution Reporting API, are:

Thanks!

alexmturner commented 1 year ago

Hi! Sorry for the delay in responding.

Hope this answers your questions, but let me know if anything is still unclear :)

alexmturner commented 1 year ago

Closing as this change has been made.

xottabut commented 1 year ago

Thank you Alex for the response. Sorry, but I feel like I am missing something here about the "each user agent will limit the contribution that it could make to the output of a query."

If "query" refers to a query to the Aggregation Service, in other words one Aggregation Service job that takes one batch of aggregatable reports, does it mean that in the following case the user's contribution will be at most 65,536? Case: a user contributes to one aggregation key, key_1=65,536, at 00:00; then the same user contributes key_1=65,536 (or even key_2=65,536) at 00:15, which is allowed by the user-agent limit. But on the ad-tech side these two reports are collected into one batch and in total contribute 2 * 65,536, which is over the mentioned limit. Will the contribution then be lost or cut down to 65,536?

alexmturner commented 1 year ago

Ah yes, this wording is a bit confusing; I'll follow up to improve it. The idea is that the user agent limits the contribution it can make to the output of a query, but you're right that the limit isn't a single number; rather, it's a 'rate' over time that depends on when the reports were triggered.
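To make that concrete, here is a minimal walk-through of the scenario above (Python; it assumes the 10-minute window bound of 2^16 discussed earlier in this thread, and is only an illustration of the point, not spec behavior):

```python
WINDOW_BOUND = 2 ** 16

# 00:00 -- the first report uses the full bound for its 10-minute window.
report_1 = 65_536   # accepted by the browser: window sum <= WINDOW_BOUND

# 00:15 -- a new 10-minute window applies, so the budget has reset.
report_2 = 65_536   # also accepted: its own window sum <= WINDOW_BOUND

# The ad tech later batches both reports into one Aggregation Service query.
batch_total = report_1 + report_2
print(batch_total)  # 131072 -- legitimately above 2^16, because the bound
                    # is enforced per window at contribution time, not on
                    # the batch the ad tech assembles afterwards
```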

chrisbmcandrew commented 10 months ago

We continue to be excited about the Aggregation Service API, and its ability to combine with Shared Storage to deliver advanced reach reporting as previously mentioned.

We believe that making adjustments to the contribution budget would ensure the functionality needed for Reach & Frequency measurement. Brand advertisers specifically rely on accurate Reach measurement to evaluate the performance of their campaigns across the web, and without a reasonable contribution budget the accuracy and effectiveness of Reach measurement would be greatly impacted. The two settings are:

A per-site budget that resets every 10 minutes.

A backstop per-site 24-hour bound limiting contributions to X^x (the current limit is an L1 norm of 2^20 = 1,048,576).

In both cases, reported numbers are in aggregate and use the Virtual Persons methodology, which maintains the overall privacy goals. We look forward to an update on these two settings to ensure Brand advertising measurement is maintained while still providing a safe and private API.

menonasha commented 9 months ago

Appreciate the feedback - reopening this issue for discussion - we will come back with thoughts.

We wanted to clarify: is the feedback that the use case requires increased budgets for both the 10-minute cap and the daily backstop? Wanted to ask since you mention reducing the overall backstop cap in both paragraphs.

chrisbmcandrew commented 9 months ago

@alexmturner @menonasha Yes. The impact of both is that typical browsing behavior, across 10-minute windows and across a day, offers significant opportunities for campaigns to Reach users, and a large subset of ad events would be lost. Loss of a large quantity of these events, whether due to the 10-minute cap or the 1-day cap, results in unmeasurable Reach and Frequency, which is critical to brand advertisers.

menonasha commented 8 months ago

We do understand that the contribution budget window could cause events to be dropped if a user is served a significant number of ads during the window. Ad techs should consider optimizing for the contribution budget, for example by accounting for different campaign sizes or limiting the number of reports per campaign per user in a ten-minute window. We would be interested to understand from the ecosystem whether the contribution budget still causes inaccurate Reach measurements after implementing such optimization tactics.
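One hypothetical shape such an optimization could take (Python sketch; the cap value, class, and identifiers are invented for illustration and are not part of the API):

```python
from collections import defaultdict

MAX_REPORTS_PER_WINDOW = 4  # hypothetical per-campaign, per-user cap

class ReportThrottle:
    """Client-side bookkeeping an ad tech might use to spread the
    10-minute contribution budget across campaigns."""

    def __init__(self):
        self.counts = defaultdict(int)  # (campaign_id, window_index) -> count

    def should_report(self, campaign_id, now_secs):
        window = int(now_secs // 600)  # index of the 10-minute window
        key = (campaign_id, window)
        if self.counts[key] >= MAX_REPORTS_PER_WINDOW:
            return False  # skip: preserve budget for other campaigns
        self.counts[key] += 1
        return True
```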

In terms of providing additional budget at either the ten-minute window or the daily window, this change would allow for larger potential information gain on a given user, and so it is not on our immediate roadmap.

We would be interested to hear additional ideas of improvements we could make to solve this challenge of losing ad events while maintaining privacy. We welcome additional feedback and public discussion on this subject as we work towards a solution over the long term that addresses these concerns.

alexmturner commented 8 months ago

We have added this context to the original post as well, but we would like to broaden the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports drawn from a Laplace distribution with mean zero and standard deviation

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.