BCC+19 consider two settings for ZKP systems on distributed data: one in which either the client or a subset of aggregators is malicious (this is the setting in which Prio was analyzed); and another in which the client may collude with a subset of aggregators. This issue corresponds to the latter setting.
Security in this setting is addressed in Section 6.3. The basic idea seems to be to run the input-validation protocol with every subset of servers.
We addressed this problem in Prio v2 by having each aggregator evaluate proofs independently. So during intake, when an aggregator receives a data share, it extracts the proof share and transmits it to the other aggregator. At aggregation time, an aggregator will not include a data share in a sum unless it can assemble the triple of (data share, own proof share, peer proof share) and verify the proof. If it can't do that because the peer proof is unavailable, the aggregator drops the share from the sum, but still sums everything else.
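A minimal sketch of the aggregation-time filtering described above (in Rust, with made-up types and a stubbed verification step; this is not taken from any Prio implementation): only reports for which the full triple is present and the proof verifies are folded into the sum.

```rust
use std::collections::HashMap;

/// Hypothetical per-report state held by one aggregator; field names are
/// illustrative, not taken from any Prio implementation.
struct ReportState {
    data_share: Vec<u64>,
    own_proof_share: Vec<u64>,
    /// Filled in once the peer aggregator forwards its proof share.
    peer_proof_share: Option<Vec<u64>>,
}

/// Stand-in for the actual proof verification step, assumed to exist elsewhere.
fn proof_verifies(_data: &[u64], _own: &[u64], _peer: &[u64]) -> bool {
    true
}

/// Sum only the data shares for which the full (data share, own proof share,
/// peer proof share) triple is available and the proof checks out; anything
/// missing a peer proof share, or failing verification, is dropped, and the
/// remaining shares are still summed.
fn aggregate(reports: &HashMap<String, ReportState>, len: usize) -> Vec<u64> {
    let mut sum = vec![0u64; len];
    for state in reports.values() {
        let peer = match &state.peer_proof_share {
            Some(p) => p,
            None => continue, // peer proof share never arrived: drop this report
        };
        if !proof_verifies(&state.data_share, &state.own_proof_share, peer) {
            continue; // proof failed: drop this report
        }
        for (acc, x) in sum.iter_mut().zip(&state.data_share) {
            *acc = acc.wrapping_add(*x);
        }
    }
    sum
}
```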
In our design, the leader could be responsible for this. Clients would send encrypted data shares and proof shares to aggregators, and aggregators would then have to extract the proof share and send it to the leader. The leader could then work out the intersection of the sets of proof shares provided by all aggregators (which is the set of data that can be summed in the aggregation) and then only instruct aggregators to sum that set of shares.
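A sketch of the leader-side step, under the assumption that each aggregator reports the set of report IDs for which it holds a proof share (the function name and types are illustrative):

```rust
use std::collections::HashSet;

/// Hypothetical leader-side step: given, for each aggregator, the set of
/// report IDs for which that aggregator holds a proof share, compute the
/// intersection. That intersection is the set of reports the leader
/// instructs every aggregator to include in the sum.
fn reports_to_aggregate(per_aggregator: &[HashSet<String>]) -> HashSet<String> {
    let mut sets = per_aggregator.iter();
    let mut common = match sets.next() {
        Some(first) => first.clone(),
        None => return HashSet::new(),
    };
    for set in sets {
        common.retain(|id| set.contains(id));
    }
    common
}
```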
> We addressed this problem in Prio v2 by having each aggregator evaluate proofs independently. So during intake, when an aggregator receives a data share, it extracts the proof share and transmits it to the other aggregator. At aggregation time, an aggregator will not include a data share in a sum unless it can assemble the triple of (data share, own proof share, peer proof share) and verify the proof. If it can't do that because the peer proof is unavailable, the aggregator drops the share from the sum, but still sums everything else.
Hmm, having all of the proof shares is a potential privacy issue, no? If I have all of the proof shares, then I can assemble the entire proof polynomial, which gives me the output of each intermediate G-gate evaluation and not just the final output. (Perhaps you mean "verification share"? See https://github.com/abetterinternet/prio-documents/pull/16#discussion_r600081599.)
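To illustrate the concern, assuming the proof is additively secret-shared over a prime field (a toy sketch, not any library's API): summing all proof shares pointwise reconstructs the full proof polynomial, whereas a verification share is meant to reveal only the accept/reject outcome.

```rust
/// Toy illustration of the privacy concern: if the proof is additively
/// secret-shared mod a prime `p`, a party holding *every* proof share can
/// sum them pointwise and recover the full proof polynomial, including the
/// intermediate G-gate outputs, not just the final accept/reject result.
fn recombine_proof(shares: &[Vec<u64>], p: u64) -> Vec<u64> {
    let len = shares.first().map_or(0, |s| s.len());
    let mut proof = vec![0u64; len];
    for share in shares {
        for (coeff, s) in proof.iter_mut().zip(share) {
            *coeff = (*coeff + *s) % p;
        }
    }
    proof
}
```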
> In our design, the leader could be responsible for this. Clients would send encrypted data shares and proof shares to aggregators, and aggregators would then have to extract the proof share and send it to the leader. The leader could then work out the intersection of the sets of proof shares provided by all aggregators (which is the set of data that can be summed in the aggregation) and then only instruct aggregators to sum that set of shares.
This sounds like a good idea.
As of 2021/7/14, our answer is as follows: all aggregators need to be online for the duration of the protocol. To recover in case an aggregator drops out, the collector could spin up multiple tasks, each with a different set of aggregators. We briefly considered formalizing this in the protocol, but decided it was too complex. (See #68.) Another potential protocol-level option is threshold secret sharing (#22).
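A hypothetical illustration of the multi-task fallback (the task structure and endpoints here are invented for the example and are not any protocol's actual configuration format):

```rust
/// Invented types for illustration only. Each task covers the same
/// measurement but uses a different set of aggregators, so losing one
/// aggregator only costs the reports uploaded under that task.
struct Task {
    task_id: &'static str,
    aggregator_endpoints: Vec<&'static str>,
}

fn provision_fallback_tasks() -> Vec<Task> {
    vec![
        Task {
            task_id: "task-a",
            aggregator_endpoints: vec!["https://leader.example", "https://helper-1.example"],
        },
        Task {
            task_id: "task-b",
            aggregator_endpoints: vec!["https://leader.example", "https://helper-2.example"],
        },
    ]
}
```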
We assume that every aggregator processes its share of a client's input, but what happens if one aggregator loses its share? How do we accommodate an aggregator losing its view of a share, whether maliciously or accidentally?