patcg-individual-drafts / private-aggregation-api

Explainer for proposed web platform API
https://patcg-individual-drafts.github.io/private-aggregation-api/

Feedback on Contribution bounding value, scope, and epsilon #23

Open alexmturner opened 1 year ago

alexmturner commented 1 year ago

Hi all,

We're seeking some feedback on the Private Aggregation API's contribution budget. We'd appreciate any thoughts on both the value of the numeric bound and its scope (currently per-origin per-day, and separate for FLEDGE and Shared Storage).

In particular, one change we're considering is moving the scope from per-origin to per-site. This would mitigate abuse potential in cases like wildcard domains, which are (arguably) easier to mint than separate domains for the purpose of exceeding privacy limits. (See more discussion here.)

Thanks!

[January 2024 edit:] Additionally, we would like to broaden the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports drawn from a Laplace distribution with mean zero and standard deviation

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.
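For a rough sense of scale, here is a minimal sketch (Python; only L1 = 2^16 and the formula come from the text above, the particular epsilon values are illustrative) of how the noise standard deviation shrinks as epsilon grows:

```python
import math

L1 = 2 ** 16  # current L1 contribution bound

# Standard deviation of the zero-mean Laplace noise added to each
# summary value: sqrt(2) * L1 / epsilon.
for epsilon in [1, 2, 4, 8, 16, 32, 64]:
    stddev = math.sqrt(2) * L1 / epsilon
    print(f"epsilon={epsilon:>2}: noise stddev ~= {stddev:,.0f}")
```

At epsilon = 64 the noise standard deviation is roughly 1,448, versus roughly 92,682 at epsilon = 1, which is why the smallest workable epsilon per use case matters.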

alexmturner commented 1 year ago

In addition to the change from per-origin to per-site, we're considering changing the time component of the contribution bound. Specifically, we're considering moving the existing contribution bound (a max value sum of 2^16) to apply over a 10-minute window instead of a daily window. We hope this will allow more flexibility and simplify budget management. As a backstop to prevent worst-case leakage, we're considering a new, larger daily bound, e.g. 2^20. We'd appreciate any feedback on this proposal!
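To illustrate how the two bounds might interact, here is a hypothetical sketch (Python; this is not the browser's actual algorithm, and the class and method names are invented) of enforcing a 2^16 bound per 10-minute window alongside a 2^20 daily backstop:

```python
import time
from collections import deque

WINDOW_BOUND = 2 ** 16   # proposed max contribution sum per 10-minute window
DAILY_BOUND = 2 ** 20    # proposed daily backstop
WINDOW_SECS = 10 * 60
DAY_SECS = 24 * 60 * 60

class SiteBudget:
    """Per-site contribution bookkeeping (hypothetical, for illustration)."""

    def __init__(self):
        self.history = deque()  # (timestamp, value) pairs, oldest first

    def try_contribute(self, value, now=None):
        """Record the contribution and return True if both bounds allow it."""
        now = time.time() if now is None else now
        # Contributions older than a day can no longer affect either bound.
        while self.history and self.history[0][0] <= now - DAY_SECS:
            self.history.popleft()
        window_sum = sum(v for t, v in self.history if t > now - WINDOW_SECS)
        daily_sum = sum(v for _, v in self.history)
        if window_sum + value > WINDOW_BOUND or daily_sum + value > DAILY_BOUND:
            return False  # over budget: the contribution would be dropped
        self.history.append((now, value))
        return True
```

The sketch only shows the two-bound check; a real implementation would also need to handle persistence, clock changes, and whatever reset semantics the spec ultimately adopts.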

xottabut commented 1 year ago

Hi Alex (@alexmturner), I have a few questions regarding the budget in the context of the Private Aggregation API. I am reading this document to understand the Private Aggregation API's contribution budget.

Two other documents with information about the budget, though for the Attribution Reporting API, are:

Thanks!

alexmturner commented 1 year ago

Hi! Sorry for the delay in responding.

Hope this answers your questions, but let me know if anything is still unclear :)

alexmturner commented 1 year ago

Closing as this change has been made.

xottabut commented 1 year ago

Thank you Alex for the response. Sorry, but I feel like I am missing something here about the "each user agent will limit the contribution that it could make to the output of a query."

If "query" refers to a query to the Aggregation Service, in other words one Aggregation Service job that takes one batch of aggregatable reports, does it mean that in the following case the user's contribution will be at most 65,536? Case: a user contributes to one aggregation key, key_1=65,536, at 00:00; then the same user contributes key_1=65,536 (or even key_2=65,536) at 00:15, which is allowed by the user-agent limit. But on the ad-tech side these two reports are collected into one batch and in total contribute 2 * 65,536, which is over the mentioned limit. Will the contribution then be lost or cut down to 65,536?

alexmturner commented 1 year ago

Ah yes, this wording is a bit confusing; I'll follow up to improve it. The idea is that the user agent limits the contribution it can make to the output of a query, but you're right that the limit isn't a single number; rather, it's a 'rate' over time that depends on when the reports were triggered.
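To make that concrete, here is a minimal walk-through of the scenario above (Python; it assumes the 10-minute window bound of 2^16 discussed earlier in this thread, and is only an illustration of the point, not spec behavior):

```python
WINDOW_BOUND = 2 ** 16

# 00:00 -- the first report uses the full bound for its 10-minute window.
report_1 = 65_536   # accepted by the browser: window sum <= WINDOW_BOUND

# 00:15 -- a new 10-minute window applies, so the budget has reset.
report_2 = 65_536   # also accepted: its own window sum <= WINDOW_BOUND

# The ad tech later batches both reports into one Aggregation Service query.
batch_total = report_1 + report_2
print(batch_total)  # 131072 -- legitimately above 2^16, because the bound
                    # is enforced per window at contribution time, not on
                    # the batch the ad tech assembles afterwards
```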

chrisbmcandrew commented 10 months ago

We continue to be excited about the Aggregation Service API, and its ability to combine with Shared Storage to deliver advanced reach reporting as previously mentioned.

We believe that making adjustments to the contribution budget would ensure the functionality needed for Reach & Frequency measurement. Brand advertisers specifically rely on accurate Reach measurement to evaluate the performance of their campaigns across the web, and without a reasonable contribution budget the accuracy and effectiveness of Reach measurement would be greatly impacted. The two settings are:

A per-site budget that resets every 10 minutes.

A backstop per-site 24-hour bound limiting contributions to X^x (the current limit is an L1 norm of 2^20 = 1,048,576).

In both cases, reported numbers are in aggregate and use the Virtual Persons methodology, which maintains the overall privacy goals. We look forward to an update on these two settings to ensure Brand advertising measurement is maintained while still providing a safe and private API.

menonasha commented 9 months ago

Appreciate the feedback - reopening this issue for discussion - we will come back with thoughts.

We wanted to clarify: is the feedback that the use case requires increased budgets for both the 10-minute cap and the daily backstop? Wanted to ask since you mention reducing the overall backstop cap in both paragraphs.

chrisbmcandrew commented 9 months ago

@alexmturner @menonasha Yes. The impact of both is that typical browsing behavior, across 10-minute windows and across a day, offers significant opportunities for campaigns to Reach users, and a large subset of ad events would be lost. Loss of a large quantity of these events, whether due to the 10-minute cap or the 1-day cap, results in unmeasurable Reach and Frequency, which is critical to brand advertisers.

menonasha commented 8 months ago

We do understand that the contribution budget window could cause events to be dropped if a user is served a significant number of ads during the window. Ad techs should consider optimizing for the contribution budget, for example by accounting for different campaign sizes or limiting the number of reports per campaign per user in a ten-minute window. We would be interested to understand from the ecosystem whether the contribution budget still causes inaccurate Reach measurements after implementing such optimization tactics.
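One hypothetical shape such an optimization could take (Python sketch; the cap value, class, and identifiers are invented for illustration and are not part of the API):

```python
from collections import defaultdict

MAX_REPORTS_PER_WINDOW = 4  # hypothetical per-campaign, per-user cap

class ReportThrottle:
    """Client-side bookkeeping an ad tech might use to spread the
    10-minute contribution budget across campaigns."""

    def __init__(self):
        self.counts = defaultdict(int)  # (campaign_id, window_index) -> count

    def should_report(self, campaign_id, now_secs):
        window = int(now_secs // 600)  # index of the 10-minute window
        key = (campaign_id, window)
        if self.counts[key] >= MAX_REPORTS_PER_WINDOW:
            return False  # skip: preserve budget for other campaigns
        self.counts[key] += 1
        return True
```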

In terms of providing additional budget at either the ten-minute window or the daily window, this change would allow for larger potential information gain on a given user, and so it is not on our immediate roadmap.

We would be interested to hear additional ideas of improvements we could make to solve this challenge of losing ad events while maintaining privacy. We welcome additional feedback and public discussion on this subject as we work towards a solution over the long term that addresses these concerns.

alexmturner commented 8 months ago

We have added this context to the original post as well, but we would like to broaden the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports drawn from a Laplace distribution with mean zero and standard deviation

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.