Hello @nurien2 and thanks for the feedback!
Aggregation Service itself does not put an upper limit on the number of keys or reports in a batch, but a scale of 10^14 reports and 10^12 keys is currently unsupported due to the memory that would be required. Our sizing guidance indicates the ranges we have tested and recommend for optimal performance, given expected load and the supported cloud VM instance types.
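For intuition on the memory constraint, here is a rough back-of-envelope sketch; the per-entry overhead is an assumption for illustration, not an official figure:

```python
# Back-of-envelope RAM estimate for holding an output domain in memory.
BUCKET_KEY_BYTES = 16    # aggregation bucket keys are 128-bit
VALUE_BYTES = 4          # 32-bit aggregated value
OVERHEAD_BYTES = 32      # assumed per-entry hash-map/bookkeeping overhead

def domain_memory_gib(num_keys: int) -> float:
    """Approximate RAM needed to keep num_keys domain entries in memory."""
    per_entry = BUCKET_KEY_BYTES + VALUE_BYTES + OVERHEAD_BYTES
    return num_keys * per_entry / 2**30

for keys in (10**9, 10**10, 10**12):
    print(f"{keys:.0e} keys -> ~{domain_memory_gib(keys):,.0f} GiB")
# 10^12 keys lands around ~48,000 GiB (tens of TiB), far beyond any
# single supported VM instance.
```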
We are working on phase 1 of the key discovery proposal, which will allow adtechs to query the Aggregation Service without pre-declaring keys. For phase 1, adtechs will be able to optionally specify keys to guarantee that they are included in the output; any key that is not pre-declared will only appear in the output if its noised value clears a threshold. Note that while this solution helps mitigate the challenge of pre-declaring a large number of keys, it does not fully address the challenge of supporting a scale of up to 10^12 keys.
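To make the phase-1 semantics concrete, here is a minimal sketch. The Laplace(L1/epsilon) noise follows AGGREGATION_SERVICE_TEE.md, but the threshold value and function names are placeholders, not the actual implementation:

```python
import random

def summarize(aggregated, pre_declared_keys, epsilon, l1_budget=2**16,
              threshold=None):
    """Pre-declared keys always appear (noised); undeclared keys appear
    only if their noised value clears a threshold."""
    scale = l1_budget / epsilon
    if threshold is None:
        threshold = 5 * scale  # placeholder; the proposal defines the real one

    def laplace():
        # Difference of two i.i.d. exponentials with mean `scale` is Laplace(scale).
        return random.expovariate(1 / scale) - random.expovariate(1 / scale)

    output = {}
    for key, value in aggregated.items():
        noised = value + laplace()
        if key in pre_declared_keys or noised >= threshold:
            output[key] = noised
    # Pre-declared keys with no contributions still appear, as noise-only rows.
    for key in pre_declared_keys - aggregated.keys():
        output[key] = laplace()
    return output

print(summarize({"k1": 100_000, "k2": 7}, pre_declared_keys={"k1", "k3"},
                epsilon=10))
```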
We would like to understand the use case a bit more to explore options to address this (e.g., batching strategies, flexible filtering, and why sampling isn’t an option). We’re happy to discuss this topic in detail on a WICG call or on this thread; please let us know how you would prefer to proceed.
Thank you.
Closing this for now, but please feel free to re-open if you have more feedback.
Request: We would like to have more information about the scaling capabilities of the Aggregation Service.
Background: The use case that we have in mind is the one from Request for event-level ReportLoss API · Issue #930 · WICG/turtledove, where we would use the Private Aggregation API with the potential trigger described in that issue:
Such a trigger would produce an aggregatable report for every component auction a buyer takes part in. This would represent billions of aggregatable reports per hour, at least one order of magnitude above the volumes defined in https://github.com/privacysandbox/aggregation-service/blob/main/docs/sizing-guidance.md. For the use case described in this issue, we are investigating performing the aggregation daily, so up to 10^14 reports to be processed against a set of up to 10^12 pre-declared bucket keys. To have a sufficiently wide representation of our feature space (hence the high number of pre-declared keys above) and to reach an acceptable level of noise for most buckets, we need to gather a lot of contributions, and ideally avoid applying any sampling strategy.
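To illustrate why sampling is hard to accept at this scale, here is a rough signal-to-noise sketch. The L1 contribution budget of 2^16 and the Laplace(L1/epsilon) noise follow AGGREGATION_SERVICE_TEE.md; epsilon and the traffic figures are assumptions based on the numbers above:

```python
# Per-bucket signal vs. noise, assuming contributions spread evenly across
# the pre-declared key space (uniform spread is a simplification; real
# traffic would be skewed).
L1_BUDGET = 2**16                         # max total contribution per report
EPSILON = 10                              # assumed privacy parameter
NOISE_STDDEV = (L1_BUDGET / EPSILON) * 2**0.5   # Laplace(b): stddev = b*sqrt(2)

reports_per_day = 10**14                  # upper bound from this issue
num_keys = 10**12                         # pre-declared bucket keys

avg_reports_per_key = reports_per_day / num_keys       # ~100
# Best case: each report spends its whole budget on a single bucket.
avg_signal_per_key = avg_reports_per_key * L1_BUDGET
print(f"noise stddev          ~{NOISE_STDDEV:,.0f}")          # ~9,268
print(f"avg signal per bucket ~{avg_signal_per_key:,.0f}")    # ~6,553,600
# ~700x above the noise in this best case, but 10x sampling cuts the margin
# to ~70x, and buckets far below the average contribution count sink into
# the noise entirely.
```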
On a side note, we have several usages in mind for the Private Aggregation API (see Add new reporting signal script-errors · Issue #494 · WICG/turtledove), targeting different aggregation frequencies (hourly, daily, …). To be able to properly leverage the Aggregation Service (and satisfy the underlying rules described in https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATION_SERVICE_TEE.md#privacy-considerations), we would need a solution such as the one described in https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md to be implemented.
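As a concrete illustration of the batching pattern we have in mind under that proposal, here is a sketch of two jobs over the same stream of encrypted reports, separated by filtering ID so each contribution's budget is consumed by exactly one cadence. The field names and paths are illustrative, not the actual createJob schema:

```python
# Hypothetical job requests: the same encrypted aggregatable reports
# queried at two cadences, with disjoint filtering-ID sets as in the
# flexible filtering proposal. Paths and field names are made up.
hourly_job = {
    "input_data_blob_prefix": "reports/2024-06-01/hour-13/",
    "output_domain_blob_prefix": "domains/loss-reporting/",
    "filtering_ids": [1],   # contributions tagged for hourly loss reporting
}
daily_job = {
    "input_data_blob_prefix": "reports/2024-06-01/",
    "output_domain_blob_prefix": "domains/feature-aggregation/",
    "filtering_ids": [2],   # contributions tagged for daily aggregation
}
```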
Questions: