ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.
Other
41 stars 20 forks source link

Thought experiment: no more report ID #558

Closed tgeoghegan closed 1 week ago

tgeoghegan commented 2 weeks ago

556 enumerates requirements in DAP that can be challenging to meet depending on the architecture of an implementation. Among them is the requirement that aggregators guard against duplicate report IDs. The Leader does this during upload handling and the Helper does it during aggregation job handling. In both cases, the requirement can be difficult to meet because it introduces a global lookup in an operation that would otherwise be embarrassingly parallel, in the sense that it has no dependencies on other data.

Ideally, the Leader's upload handler would involve just examining the incoming message and then inserting a single row into a table (or a single record into a k-v store, or...), but now it has to check against many previously seen report IDs to check for a collision. Similarly, the preparation of an input share into an output share doesn't require synchronization with any other report's preparation, but the Helper must do a potentially expensive check for the report ID's uniqueness. In particular, this means that aggregators must have a consistent view of storage shared with all other replicas of themselves (or at least a shard of storage).

So why actually do we have this requirement of no repeated report IDs? Recalling that the DAP report ID is used as the VDAF nonce, draft-irtf-cfrg-vdaf's {{nonce-requirements}} says:

The sharding and preparation steps of VDAF execution depend on a nonce
associated with the Client's report. To ensure privacy of the underlying
measurement, the Client MUST generate this nonce using a CSPRNG. This is
required in order to leverage security analysis for the privacy definition of
{{DPRS23}}, which assumes the nonce is chosen at random prior to generating the
report.

Nonce collisions are definitely a problem, but ours are 16 bytes and so generating them with a CSPRNG gives us a reasonable guarantee of collision avoidance.

This section continues:

Other security considerations may require the nonce to be non-repeating. For
example, to achieve differential privacy it is necessary to avoid "over
exposing" a measurement by including it too many times in a single batch or
across multiple batches. It is RECOMMENDED that the nonce generated by the
Client be used by the Aggregators for replay protection.

If the attack is a malicious client including a single measurement in a batch "too many times", then I don't think report ID/nonce uniqueness helps: the malicious client could just send the desired measurement multiple times under distinct report IDs and have them incorporated into the batch.

Further, the report ID is never used in DAP to uniquely identify a report, except to enable anti-replay checks. It's not possible to GET a report from /tasks/{task-id}/reports/{report-id}. Aggregators never exchange a list of report IDs, either: instead, aggregation jobs contain whole report shares inline, and both collection job and aggregate share requests contain selectors that inform how aggregators will construct batches. In the time interval case, the report's timestamp is used, and for fixed size queries, the Leader arbitrarily assigns reports to batches. An aggregator implementation will certainly need some unique report identifier to keep track of which batch reports belong to, but that can be an implementation detail, generated inside either aggregator, and thus wouldn't require anything in the protocol to rule out duplicates. Crucially, we can trust an aggregator to correctly generate its own primary keys.

There are a handful of protocol messages where report ID appears, but really what's being transmitted there is the VDAF nonce, so we could just rename those fields.

I'm coming around to the view that we could remove the notion of unique report IDs from DAP, and thus delete the attendant uniqueness requirement. Unless, that is, there's some other reason that aggregators must guarantee that there are no nonce collisions in a task?

tgeoghegan commented 2 weeks ago

We're not concerned with a malicious client violating its own privacy, so the question is whether a malicious client choosing its nonce allows attacks on anyone else's privacy. The threat model assumes that an attacker can control both an aggregator and a coalition of clients. So the malicious aggregator could observe the nonce of an honest client's report, and then have a malicious client use that nonce in a report. Does this violate the privacy of the honest client?

cjpatton commented 2 weeks ago

If the attack is a malicious client including a single measurement in a batch "too many times", then I don't think report ID/nonce uniqueness helps: the malicious client could just send the desired measurement multiple times under distinct report IDs and have them incorporated into the batch.

True, but to be clear: we are concerned about a malicious Aggregator replaying an honest Client's report too many times.

We're not concerned with a malicious client violating its own privacy, so the question is whether a malicious client choosing its nonce allows attacks on anyone else's privacy. The threat model assumes that an attacker can control both an aggregator and a coalition of clients. So the malicious aggregator could observe the nonce of an honest client's report, and then have a malicious client use that nonce in a report. Does this violate the privacy of the honest client?

No, as long as the honest Aggregator only processes a report with that nonce once.

divergentdave commented 2 weeks ago

Enforcing report ID uniqueness is useful because it is cheaper than enforcing uniqueness of the rest of the contents of a report. We extend the protection against duplicate report IDs to protection against replayed report contents by binding the decryption of input shares to the report ID. One replay attack we need to defend against is when the adversary controls the leader, and it sends report shares from an honest client's report multiple times, either in one batch or across multiple batches. This is what the "over exposing" paragraph is referring to, rather than the attacker freshly generating multiple reports with the same measurement.

branlwyd commented 2 weeks ago

Removing the report ID does not remove the requirement on the report's uniqueness--e.g. if a malicious Leader could repeatedly include the same report in multiple batches, it would be able to violate the privacy of that report, whether the report has an attached ID or not.

I think removing the ID would make it more complicated for an aggregator to determine a report's identity. For example, they might derive an identity based on the hash of appropriate parts of the report's content, or even the entire content of the report. I think we'd probably need to spell out how a report's identity is determined in the spec.

Depending on how the AADs change & how report identity is determined, removing the report ID might also allow a malicious aggregator to forge reports using a "real" report's measurement, effectively allowing the privacy of that report to be violated.

cjpatton commented 2 weeks ago

One bit of related trivia, inspired by this conversation: checking for replay across batches is harder for fixed-size tasks than for time-interval tasks. In particular, this implementation detail is expensive for Daphne:

An aggregator implementation will certainly need some unique report identifier to keep track of which batch reports belong to, but that can be an implementation detail, ...

tgeoghegan commented 2 weeks ago

I think I am persuaded by the argument that we have to prevent the Leader from replaying an honest Client's report: this is not equivalent to doing a Sybil attack where the same measurement is repeatedly uploaded, because the malicious Leader can do this without knowing the honest Client's measurement, and could learn what that measurement is by constructing a batch containing exclusively the honest Client report. So on that basis alone, I'm convinced we have to keep report IDs and the anti-replay check (though perhaps only in the Helper...)

But there's one more thing I want to make sure I understand, and would like to have on the record:

@cjpatton wrote:

No, as long as the honest Aggregator only processes a report with that nonce once.

Forgetting over exposure for a second, let's say two different reports containing two different measurements both use the same nonce. Is that a problem? Why?

cjpatton commented 2 weeks ago

It's definitely problematic from the standpoint of the proven security of Prio3,: ia.cr/2023/130, Theorem 2 says that privacy only holds up to nonce collisions. So if two clients, honest or not, happen to choose the same nonce, and we process both reports, then the attacker might learn something it's not supposed to.

[This is a little speculative, and I may not have the details pinned down.] To make this more concrete: in Prio3 we evaluate the proof polynomial at a random point derived from the nonce. In addition, we get to learn some intermediate outputs. Normally those intermediate outputs don't leak anything, if the random point is indeed independent across all executions of the FLP. But in this situation, this is not the case.

tgeoghegan commented 2 weeks ago

OK, thanks. While you've collectively talked me off the ledge of killing report IDs, I think we could use a touch more text in VDAF and DAP:

tgeoghegan commented 2 weeks ago

Finally: I notice that section 4.5.1.4 currently lacks explicit instruction that the helper should reject duplicate report IDs, but #554 addresses this by replacing the call to Vdaf.is_valid with

Check if the report has been previously aggregated. If so, the input share MUST be marked as invalid with the error report_replayed.

cjpatton commented 2 weeks ago
* [VDAF {{nonce-requirements}}](https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-vdaf-08#section-9.2) only says clients MUST randomly generate the nonce, but clearly that's not enough. We should add a sentence about aggregators verifying that there are no nonce collisions.

Thinking about this more, I think this is not strictly necessary. See https://github.com/cfrg/draft-irtf-cfrg-vdaf/pull/340#discussion_r1587840456.

tgeoghegan commented 1 week ago

We took https://github.com/cfrg/draft-irtf-cfrg-vdaf/pull/340 and #559 for this. Nothing left to be done here.