Closed alexmturner closed 5 months ago
@linnan-github, could you PTAL? Thanks!
Thanks for the review, Nan!
@arichiv, could you PTAL as well? Thanks!
Two additional things I'm seeking feedback on:
filteringIdByteSize
. For context, this allows for using a custom size for the filtering ID space, e.g. you could specify 4 to allow for a 32-bit ID. (We only allow integer bytes which is why we went with ByteSize not BitSize.) Other options we were thinking of is filteringIdByteLength
or filteringIdMaxBytes
(maybe leaning towards this).filteringIdByteSize
is 1. When a non-default value is used, a report is sent even if no contributions were sent (a "null report"). This is done to avoid a privacy concern from the different payload length. However, it might be easier to describe the behavior if we activate this null reporting logic whenever a filteringIdByteSize
is explicitly specified, even if it's for the default value. Personally, I'm probably leaning against this as it's not needed for privacy.@linnan-github @arichiv, I've updated this PR to no longer have an explicit opt-in or require debug mode. This should simplify the change substantially, albeit potentially delay the launch slightly (as we will now want all current aggregation service releases to support the new format before launching to Chrome Stable).
It might be worth splitting this change into two PRs:
context id
with a pre-specified report parameters
struct that contains only a context id
filtering ID byte size
to the pre-specified report parameters
structDoing both at once when so many spots are touched by 1 but only a few feel the impact of 2 is risky from a comprehensive review perspective.
Thanks for the review, @arichiv! I was also wondering if you had any feedback on the questions in https://github.com/patcg-individual-drafts/private-aggregation-api/pull/123#issuecomment-2070958555.
Two additional things I'm seeking feedback on:
- I'm a little unsure what to name
filteringIdByteSize
. For context, this allows for using a custom size for the filtering ID space, e.g. you could specify 4 to allow for a 32-bit ID. (We only allow integer bytes which is why we went with ByteSize not BitSize.) Other options we were thinking of isfilteringIdByteLength
orfilteringIdMaxBytes
(maybe leaning towards this).
I like filteringIdMaxBytes given the usage I see.
- The default
filteringIdByteSize
is 1. When a non-default value is used, a report is sent even if no contributions were sent (a "null report"). This is done to avoid a privacy concern from the different payload length. However, it might be easier to describe the behavior if we activate this null reporting logic whenever afilteringIdByteSize
is explicitly specified, even if it's for the default value. Personally, I'm probably leaning against this as it's not needed for privacy.
Is there a design doc I could review? I'm not sure I understand the implications from what I've reviewed alone
The explainer here might help, but it is a bit terse: https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#small-id-space-by-default-but-configurable.
To elaborate on the privacy concern, when reports are not deterministically sent, the count of reports is potentially sensitive, i.e. based on cross-site data. For example, a script could choose whether or not to make contributions in a shared storage worklet (and a report would only be sent if there are contributions). The filtering ID feature allows scripts to also affect payload size (although only based on first-party data), which could allow an adversary to "partition" its counts -- i.e. it could separately count the number of reports with each different payload size to gain more information. Thus, this additional metadata would amplify the existing counting attack.
We prevent this by ensuring that, whenever a non-default byte size is chosen (and so a non-default payload size will occur) reports are sent deterministically, i.e. even if the script makes no contributions. This means that the count of reports with that payload size can no longer be affected by cross-site data.
The question here is whether we should also extend this protection to cases where a script has explicitly chosen the default byte size. There is no privacy reason to do this -- these reports are indistinguishable from reports that got the default due to not specifying anything. However, it arguably might be easier for developers to grok the "use the param => deterministic report" logic over "specify a non-default value => deterministic report" logic. There is also a small performance downside to this -- a report will be needlessly sent if no contributions are made in this case. I'm leaning against this due to this small performance impact and because it feels more natural to align the logic with the privacy requirement; it also means we can define the WebIDL in the form type filteringIdMaxBytes = 1
and not have special "not present" logic.
(And no worries if you don't really have opinions here -- happy to resolve internally if so)
I get it now, thanks for the elaboration. I would suggest something slightly more elaborate: modifying SharedStoragePrivateAggregationConfig
to take both a filteringIdMaxBytes
and a new enum called something like reportingStrategy
which is a string that can be wait-for-contributions
or always-send
(open to other naming).
filteringIdMaxBytes
has the same current properties (defaults to 1).
reportingStrategy
defaults to wait-for-contributions
when filteringIdMaxBytes
is 1 and always-send
otherwise (doesn't distinguish between filteringIdMaxBytes
being 1 explicitly or by default). If reportingStrategy
is set to wait-for-contributions
when filteringIdMaxBytes
isn't 1 an error is thrown. If reportingStrategy
is set to always-send
when filteringIdMaxBytes
is 1 that's fine.
Just switched the spec over to using "max bytes".
Oh interesting! I do see the appeal of making this property a bit more explicit, but I don't think any developer will want to opt-in to additional null reports if they can avoid it. Chatted briefly internally and, given that, we were thinking that leaving the logic as is should be simple enough for developers (and aligns with this idea that they want to minimize null reports when possible).
That being said, we could consider adopting randomized null reporting for the non-deterministic case in the future, similar to ARA's proposal. If we do end up pursuing that, we might want to return to this idea as there would be more of a meaningful tradeoff.
Thanks again both of you for all the reviews!
Yep, that's right!
Specs the ability to set a filtering ID (and modify the default ID space). See https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#proposal-filtering-id-in-the-encrypted-payload and issue #92.
To support this new functionality, we increase the report version. Note that this also requires aggregation service versions to support the new version.
Preview | Diff