Open alexmturner opened 1 year ago
Thanks Alex, I want to note that the context ID / deterministic reports approach is compatible with this related proposal https://github.com/WICG/attribution-reporting-api/issues/974, although it isn't clear all deployments could use that option.
Thank you for proposing this solution. It seems to be very interesting.
I'm wondering what assigning a label to PAA data would look like in practice. Would it be possible to assign a label to each key-value pair separately, or only once per auction?
We have several use cases in which we would like to use PAA: machine learning, monitoring, and reporting. For example, we would like to report:
privateAggregation.contributeToHistogram({bucket: key1, value: val1, label: "ml"})
privateAggregation.contributeToHistogram({bucket: key2, value: val2, label: "ml"})
privateAggregation.contributeToHistogram({bucket: key3, value: val3, label: "monitoring"})
privateAggregation.contributeToHistogram({bucket: key4, value: val4, label: "monitoring"})
privateAggregation.contributeToHistogram({bucket: key5, value: val5, label: "reporting"})
This is related to the fact that each of these cases has different requirements:
It seems that this can also be achieved using proposal 3 - "bucket range filtering". However, if a label can be attached to each individual histogram contribution, this solution seems more convenient.
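To make the per-contribution labeling question concrete, here is a minimal sketch that partitions contributions by label, which is roughly the split a label-filtered query would operate on. The `label` field is the proposed extension discussed in this thread, and the `groupByLabel` helper and the bucket/value placeholders are hypothetical; none of this is part of the shipped Private Aggregation API.

```javascript
// Hypothetical helper: partitions contributions by their `label` field.
// `label` is the proposed (not shipped) field discussed in this thread.
function groupByLabel(contributions) {
  const groups = new Map();
  for (const { bucket, value, label } of contributions) {
    if (!groups.has(label)) {
      groups.set(label, []);
    }
    groups.get(label).push({ bucket, value });
  }
  return groups;
}

// Placeholder contributions mirroring the five calls above.
const contributions = [
  { bucket: 1n, value: 10, label: "ml" },
  { bucket: 2n, value: 20, label: "ml" },
  { bucket: 3n, value: 30, label: "monitoring" },
  { bucket: 4n, value: 40, label: "monitoring" },
  { bucket: 5n, value: 50, label: "reporting" },
];

const byLabel = groupByLabel(contributions);
```

With per-contribution labels, each use case ("ml", "monitoring", "reporting") could then be queried on its own schedule, assuming the aggregation service filters on the label.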
This is a very interesting proposal, thank you!
The support that would be most useful to us is very similar to what @michal-kalisz described above, but applied to ARA summary reporting rather than PAA. We have several use cases with different latency requirements, operating on aggregates whose aggregation keys differ widely in cardinality. For example, a reporting use case has many different breakdowns and can wait longer, while a real-time monitoring use case might have far fewer breakdowns but needs data batched with minimal latency.
Since these use cases will set their values under different aggregation keys ("reporting", "monitoring") and will collectively share the same total L1 budget for the report, it would be great to have the "label" attached to each aggregation key (i.e. option 2 + per-key label), and to be able to include the same aggregatable report in multiple summary reports, as long as each query uses a disjoint set of labels.
A secondary optimization (which can be built on top) is to go with option 1 and store the set of labels in the shared_info to allow for more efficient batching of reports, but this is more of a nice-to-have.
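The "disjoint set of labels" condition above can be sketched as a simple check. This is only an illustration of the constraint, not how the aggregation service or the Privacy Budget Service would enforce it; the function name `labelSetsAreDisjoint` and the representation of a query as an array of label strings are assumptions.

```javascript
// Hypothetical check: returns true if no label is used by more than one query.
// Each query is represented (by assumption) as an array of label strings.
function labelSetsAreDisjoint(queries) {
  const seen = new Set();
  for (const labels of queries) {
    // De-duplicate within a query so a repeated label in one query is not flagged.
    for (const label of new Set(labels)) {
      if (seen.has(label)) {
        return false;
      }
      seen.add(label);
    }
  }
  return true;
}

// The same reports could feed both queries, since their label sets do not overlap.
const okQueries = [["reporting"], ["monitoring"]];
// Here "ml" would be counted twice, so the second query should be rejected.
const badQueries = [["reporting", "ml"], ["ml"]];
```

Under this sketch, `labelSetsAreDisjoint(okQueries)` is true and `labelSetsAreDisjoint(badQueries)` is false.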
Thanks for all the feedback! We've put up a proposal that we hope satisfies your use cases: https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md.
Note that we've used different terminology from this issue, but the proposal aligns with Option 2 (with a possible extension of adding Option 1 later). This proposal allows a separate label for each contribution within a report. And, while the proposal focuses on Private Aggregation, we plan to explore extending it to Attribution Reporting in a separate GitHub issue.
Currently, the aggregation service only allows each 'shared ID' to be present in one query. A set of reports with the same shared ID cannot be split for separate queries, even if the resulting batches are disjoint.
One option to add more flexibility is to support an optional, custom field (a ‘label’) that is factored into the shared ID generation. We could consider a few different options:
For all of the above approaches, we’ll also need a mechanism to limit the scale impact on the Privacy Budget Service. For example, we want to prevent developers from specifying a unique ‘label’ per report. There are a few options we could consider, including:
This functionality would also be useful for the Attribution Reporting API, so we may want to align on an approach. (For example, bucket range filtering has been proposed earlier.) Note that Attribution Reporting does not currently support making deterministic reports.
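As a rough illustration of a label being "factored into the shared ID generation", the sketch below hashes a few report metadata fields together with an optional label. The field names, the JSON serialization, and the `sharedId` function are assumptions for illustration only; the actual shared ID computation is defined by the aggregation service and may differ.

```javascript
const crypto = require("crypto");

// Hypothetical sketch: derive a shared ID by hashing report metadata,
// optionally mixing in a developer-chosen label. The field list and the
// serialization are illustrative assumptions, not the service's real format.
function sharedId(sharedInfo, label) {
  const fields = [
    sharedInfo.api,
    sharedInfo.version,
    sharedInfo.reporting_origin,
    sharedInfo.scheduled_report_time,
  ];
  if (label !== undefined) {
    fields.push(label);
  }
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(fields))
    .digest("hex");
}

// Placeholder metadata; the values are made up for the example.
const info = {
  api: "protected-audience",
  version: "0.1",
  reporting_origin: "https://reporter.example",
  scheduled_report_time: "1700000000",
};

const monitoringId = sharedId(info, "monitoring");
const reportingId = sharedId(info, "reporting");
```

Because the label feeds the hash, reports that are otherwise identical in their metadata get distinct shared IDs and could be budgeted and queried separately. This is also why the number of distinct labels would need to be limited: each new label creates a new ID for the Privacy Budget Service to track.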