patcg-individual-drafts / private-aggregation-api

Explainer for proposed web platform API
https://patcg-individual-drafts.github.io/private-aggregation-api/
44 stars 20 forks source link

Spec: filtering IDs #123

Closed alexmturner closed 5 months ago

alexmturner commented 6 months ago

Specs the ability to set a filtering ID (and modify the default ID space). See https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#proposal-filtering-id-in-the-encrypted-payload and issue #92.

To support this new functionality, we increase the report version. Note that this also requires aggregation service versions to support the new version.


Preview | Diff

alexmturner commented 5 months ago

@linnan-github, could you PTAL? Thanks!

alexmturner commented 5 months ago

Thanks for the review, Nan!

alexmturner commented 5 months ago

@arichiv, could you PTAL as well? Thanks!

alexmturner commented 5 months ago

Two additional things I'm seeking feedback on:

alexmturner commented 5 months ago

@linnan-github @arichiv, I've updated this PR to no longer have an explicit opt-in or require debug mode. This should simplify the change substantially, albeit potentially delay the launch slightly (as we will now want all current aggregation service releases to support the new format before launching to Chrome Stable).

arichiv commented 5 months ago

It might be worth splitting this change into two PRs:

  1. replacing context id with a pre-specified report parameters struct that contains only a context id
  2. adding a filtering ID byte size to the pre-specified report parameters struct

Doing both at once when so many spots are touched by 1 but only a few feel the impact of 2 is risky from a comprehensive review perspective.

alexmturner commented 5 months ago

Thanks for the review, @arichiv! I was also wondering if you had any feedback on the questions in https://github.com/patcg-individual-drafts/private-aggregation-api/pull/123#issuecomment-2070958555.

arichiv commented 5 months ago

Two additional things I'm seeking feedback on:

  • I'm a little unsure what to name filteringIdByteSize. For context, this allows for using a custom size for the filtering ID space, e.g. you could specify 4 to allow for a 32-bit ID. (We only allow integer bytes which is why we went with ByteSize not BitSize.) Other options we were thinking of is filteringIdByteLength or filteringIdMaxBytes (maybe leaning towards this).

I like filteringIdMaxBytes given the usage I see.

  • The default filteringIdByteSize is 1. When a non-default value is used, a report is sent even if no contributions were sent (a "null report"). This is done to avoid a privacy concern from the different payload length. However, it might be easier to describe the behavior if we activate this null reporting logic whenever a filteringIdByteSize is explicitly specified, even if it's for the default value. Personally, I'm probably leaning against this as it's not needed for privacy.

Is there a design doc I could review? I'm not sure I understand the implications from what I've reviewed alone

alexmturner commented 5 months ago

The explainer here might help, but it is a bit terse: https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#small-id-space-by-default-but-configurable.

To elaborate on the privacy concern, when reports are not deterministically sent, the count of reports is potentially sensitive, i.e. based on cross-site data. For example, a script could choose whether or not to make contributions in a shared storage worklet (and a report would only be sent if there are contributions). The filtering ID feature allows scripts to also affect payload size (although only based on first-party data), which could allow an adversary to "partition" its counts -- i.e. it could separately count the number of reports with each different payload size to gain more information. Thus, this additional metadata would amplify the existing counting attack.

We prevent this by ensuring that, whenever a non-default byte size is chosen (and so a non-default payload size will occur) reports are sent deterministically, i.e. even if the script makes no contributions. This means that the count of reports with that payload size can no longer be affected by cross-site data.

The question here is whether we should also extend this protection to cases where a script has explicitly chosen the default byte size. There is no privacy reason to do this -- these reports are indistinguishable from reports that got the default due to not specifying anything. However, it arguably might be easier for developers to grok the "use the param => deterministic report" logic over "specify a non-default value => deterministic report" logic. There is also a small performance downside to this -- a report will be needlessly sent if no contributions are made in this case. I'm leaning against this due to this small performance impact and because it feels more natural to align the logic with the privacy requirement; it also means we can define the WebIDL in the form type filteringIdMaxBytes = 1 and not have special "not present" logic.

(And no worries if you don't really have opinions here -- happy to resolve internally if so)

arichiv commented 5 months ago

I get it now, thanks for the elaboration. I would suggest something slightly more elaborate: modifying SharedStoragePrivateAggregationConfig to take both a filteringIdMaxBytes and a new enum called something like reportingStrategy which is a string that can be wait-for-contributions or always-send (open to other naming).

filteringIdMaxBytes has the same current properties (defaults to 1).

reportingStrategy defaults to wait-for-contributions when filteringIdMaxBytes is 1 and always-send otherwise (doesn't distinguish between filteringIdMaxBytes being 1 explicitly or by default). If reportingStrategy is set to wait-for-contributions when filteringIdMaxBytes isn't 1 an error is thrown. If reportingStrategy is set to always-send when filteringIdMaxBytes is 1 that's fine.

alexmturner commented 5 months ago

Just switched the spec over to using "max bytes".

Oh interesting! I do see the appeal of making this property a bit more explicit, but I don't think any developer will want to opt-in to additional null reports if they can avoid it. Chatted briefly internally and, given that, we were thinking that leaving the logic as is should be simple enough for developers (and aligns with this idea that they want to minimize null reports when possible).

That being said, we could consider adopting randomized null reporting for the non-deterministic case in the future, similar to ARA's proposal. If we do end up pursuing that, we might want to return to this idea as there would be more of a meaningful tradeoff.

Thanks again both of you for all the reviews!

alexmturner commented 5 months ago

Yep, that's right!