Open cjpatton opened 1 month ago
The use case might need fleshing out some, but this is just about allowing reports to be segregated for anti-replay and tracking purposes.
It comes in most handy when you think about making queries that are bound by a total privacy budget. If you have one query that is shaped one way and uses x% of the privacy budget and another query that uses y% of the budget, giving each a different partition can mean that you can keep both the anti-replay and privacy budget tracking separate for reports in each. Having that enforced structurally avoids dipping into shared state if you run those queries concurrently, and it also makes the privacy budget tracking easier.
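To make that concrete, here is a rough sketch of per-partition state. It is only an illustration: `PartitionedTracker`, `PartitionState`, and the idea of a fractional budget per partition are assumptions made for this example, not anything defined by DAP or the draft.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    """Anti-replay and budget state for one report partition (hypothetical)."""
    budget_remaining: float  # fraction of this partition's budget still unspent
    seen_report_ids: set = field(default_factory=set)


class PartitionedTracker:
    """Keeps replay and budget tracking separate for each partition label."""

    def __init__(self, budget_per_partition=1.0):
        self._budget = budget_per_partition
        self._partitions = {}

    def _state(self, partition):
        # Each partition gets its own state, created lazily on first use.
        if partition not in self._partitions:
            self._partitions[partition] = PartitionState(self._budget)
        return self._partitions[partition]

    def accept_report(self, partition, report_id):
        """Return False if this report ID was already seen in this partition."""
        state = self._state(partition)
        if report_id in state.seen_report_ids:
            return False
        state.seen_report_ids.add(report_id)
        return True

    def spend(self, partition, fraction):
        """Charge a query against this partition's budget only."""
        state = self._state(partition)
        if fraction > state.budget_remaining:
            return False
        state.budget_remaining -= fraction
        return True
```

Because no state is shared between partitions, two queries that run concurrently against different partitions never touch the same replay set or budget counter.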
That sort of thing is generally the domain of a task when you know about tasks, but that requires a bunch of prior knowledge about what you might ask. This would be more flexible.
General question about the draft: when any of the extensions are in use, do you change how DAP does replay protection? This seems to be suggested in the intro, but I'm not clear on what the changes are. (FWIW I don't object to a different or more relaxed replay mechanism if DP closes the gap.)
These generally would allow for tighter constraints on scope for replay protection, with the exception of the "no task_id" one, which expands the scope.
(Of course, you can always track replays across a broader scope; the odds of rejecting valid measurements should be negligible. This is about making it possible to narrow the scope.)
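A small sketch of what "scope" means here, assuming reports carry a random 128-bit report ID as in DAP. The `ReplayFilter` class, the dictionary field names, and `collision_bound` are hypothetical names for illustration only.

```python
class ReplayFilter:
    """Rejects a report if its ID was already seen within the chosen scope."""

    def __init__(self, scope_fields):
        # For example ("task_id", "partition") for a narrow scope,
        # or () to track replays globally by report ID alone.
        self.scope_fields = tuple(scope_fields)
        self.seen = set()

    def check_and_record(self, report):
        key = tuple(report[f] for f in self.scope_fields) + (report["report_id"],)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


def collision_bound(num_reports, id_bits=128):
    """Birthday upper bound on two honest reports sharing a random ID."""
    return num_reports * (num_reports - 1) / 2 / 2**id_bits


narrow = ReplayFilter(("task_id", "partition"))
broad = ReplayFilter(())

r1 = {"task_id": "t1", "partition": "p1", "report_id": b"\x01" * 16}
r2 = {"task_id": "t1", "partition": "p2", "report_id": b"\x01" * 16}

narrow_results = [narrow.check_and_record(r1), narrow.check_and_record(r2)]
broad_results = [broad.check_and_record(r1), broad.check_and_record(r2)]
```

With the narrow scope both reports are accepted, since they sit in different partitions. With the broad scope the second is rejected, but for honestly generated IDs that only happens on a random collision, which `collision_bound` shows is vanishingly unlikely even for a billion reports.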
Is the intent of "Report Partition" to ensure that reports are only aggregated together if they share the same label? If so, you may want to define a batch mode for DAP. The idea of a batch mode is that it's supposed to dictate how reports can be partitioned into batches. For instance, you may want the collector to specify the label or labels it wants to aggregate on.
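Roughly, such a batch mode might behave like the sketch below. This is not an existing DAP batch mode; `partition_by_label` and `collect` are hypothetical names, and plain integer sums stand in for the actual VDAF aggregation.

```python
from collections import defaultdict


def partition_by_label(reports):
    """Group (label, measurement) pairs so each batch holds a single label."""
    batches = defaultdict(list)
    for label, measurement in reports:
        batches[label].append(measurement)
    return dict(batches)


def collect(reports, labels):
    """Aggregate only the batches for the labels the collector asked for."""
    batches = partition_by_label(reports)
    return {label: sum(batches.get(label, [])) for label in labels}


reports = [("a", 1), ("b", 2), ("a", 3)]
result = collect(reports, ["a"])
```

Here `result` covers only the reports labelled `"a"`; reports with other labels are never mixed into the same aggregate.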
FWIW, Mastic may provide the functionality you want. It has the added value of not revealing to the aggregators which reports have which labels. However, the communication cost might be prohibitively high, depending on the metrics you want to aggregate.