ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.
Other
46 stars 22 forks source link

Securely provisioning `vdaf_verify_key` #161

Closed tgeoghegan closed 1 year ago

tgeoghegan commented 3 years ago

The VDAF spec describes how to generate verify_params for aggregators in prio3 and poplar1. The verification parameters MUST be kept secret from other clients, which means that they may need to be rotated either if some aggregator's verification parameter is exposed or if a sufficient number of inputs are processed under some verification parameter (this would take a huge number of inputs to be a problem).

tgeoghegan commented 3 years ago

It may suffice to discuss this abstractly in security considerations and otherwise declare rotation of task parameters to be out-of-band with the PPM protocol, as we already do for initial parameter establishment.

cjpatton commented 2 years ago

I think this is going to need to be specified. If it's done incorrectly, it could become an attack vector.

Both Prio v1 [CGB17] and Poplar [BBCGGI21] require per-report randomness for validating the recovered output shares. For robustness, it is important that the aggregators agree on fresh randomness that is unpredictable to the client. Otherwise it might be possible for a malicious client to garble the aggregate result.

For privacy the picture is murkier. The key question is whether a malicious aggregator can, by controlling the random input the helper consumes, cause the helper to leak information about the measurement. It's possible that the answer depends on the protocol:

For PPM we know we'll need a procedure for establishing a shared secret among the aggregators, which will be used to derive per-report randomness by the VDAF. What's not clear right now is what security properties this procedure needs to have in order for PPM to meet its privacy goals. Either (1) the procedure is irrelevant to privacy, i.e., the share secret may be chosen by the adversary, or (2) we need a protocol that realizes the ideal coin flipping functionality described in [BBCGGI19, Definition 6.3]. Or, perhaps there's something in between.

If (1), then we could just have the leader run the VDAF setup procedure and send the shared secret to the helper over a secure channel. If (2), then we'll have to do something more sophisticated. [BBCGGI19, Section 6.1] points to some papers we can explore. As a straw man, could it be as simple as having each aggregator commit to some initial key material, then derive the shared secret from key material provided by each aggregator? (Something along the lines of https://en.wikipedia.org/wiki/Commitment_scheme#Bit-commitment_in_the_random_oracle_model?)

cc/ @henrycg in case you have thoughts here or can point us in the right direction. cc/ @chris-wood since we spoke about this recently.

henrycg commented 2 years ago

The commit-then-open coin-flipping protocol you mention is pretty simple and is general enough to cover the randomness needs of Prio, Heavy Hitters, and probably any future schemes as well. The only downside is that it adds another round of interaction between the servers.

If you want the leader to be able to unilaterally pick the randomness, let me know and I can check back through the Heavy Hitters paper to make sure that that would be safe in the HH context. I suspect it would be fine, but I would want to check carefully.

cjpatton commented 2 years ago

I think my preference would be to take (2), since it's more conservative. However, it would be great to know if (1) is a possibility. We'd be much obliged if you could take a careful look at the HH paper and see if this will work. While at it, it would be great to know if [BBCGGI19, Theorem 6.6] falls apart if we replace the ideal coin-flipping functionality with the leader choosing the randomness unilaterally.

henrycg commented 2 years ago

Okay, I will add this to my queue. It may take a week or two, but I will follow up once I have taken a closer look.

tgeoghegan commented 2 years ago

Incidentally, I believe ENPA effectively realizes this ideal coin-flipping functionality by having a third party choose the randomness.

In ENPA, a random value is chosen for each report by the ingestion servers that get input shares from clients and batch them up to be forwarded to aggregators.

henrycg commented 2 years ago

I just read back through the Poplar (S&P'21) and BBCGGI19 papers and chatted with my co-authors. The bottom line, barring and bugs in our thinking, is:

Yuval Ishai, a co-author on both of these papers, pointed out that the Fiat-Shamir trick (see Section 6.2.3 of BBCGGI19) applies to all three of these proof systems: Prio v1, Poplar, and BBCGGI19. So, you could potentially use Fiat-Shamir to avoid the need for the leader or anyone else to choose the randomness. The Fiat-Shamir trick only works if you have set up the underlying proof system to have soundness error ~2^{-128}, so if you are using a 64-bit modulus then Fiat-Shamir may not be useful.

cjpatton commented 2 years ago
  • For BBCGGI19, if you are using the proof system of Example 5.6 with the compiler of Section 6.2, then again the leader can safely choose the randomness used to check the proof. (If you use some other base proof system—not the one from Example 5.6 nor the ones like it—then it might not be safe for the leader to choose the randomness.)

I'm fairly certain what we've implemented meets this criterion, but I want to be really careful here, since what we've implemented diverges somewhat from what's in the paper. (@henrycg, you and I spoke over email about the differences.) Let me spell this out so we're all on the same page.

The prio3 VDAF is designed to work with any 1.5-round, public-coin IOP. We refer to this primitive simply as an FLP ("Fully Linear Proof") system in the VDAF spec. Our concrete FLP implementation is based on Theorem 4.3 and includes a couple generalizations. Namely:

Incidentally, while these changes were already suggested by BBCGGI19, they are significant enough to warrant a fresh security proof. Note that this FLP is not yet specified in the VDAF draft -- I plan to do so in the coming weeks. That way we'll have a concrete target for analysis.

Yuval Ishai, a co-author on both of these papers, pointed out that the Fiat-Shamir trick (see Section 6.2.3 of BBCGGI19) applies to all three of these proof systems: Prio v1, Poplar, and BBCGGI19. So, you could potentially use Fiat-Shamir to avoid the need for the leader or anyone else to choose the randomness. The Fiat-Shamir trick only works if you have set up the underlying proof system to have soundness error ~2^{-128}, so if you are using a 64-bit modulus then Fiat-Shamir may not be useful.

In the prio3 VDAF, we already do the Fiat-Shamir trick for the joint randomness in order to avoid interaction between the client and aggregators during the upload phase. It's a nice observation that we could extend this to also encompass the randomness used by the aggregators. (We call this the "query randomness" in the VDAF spec.) However, it may not be worth the extra computational cost for the client, since it doesn't save latency in the aggregation phase.

Incidentally, we have not yet implemented a large enough field: See https://github.com/abetterinternet/libprio-rs/issues/106.

Can you say a bit more about how this would work for Poplar? My understanding is that, since the client doesn't know in advance what candidate prefixes will be used to query its IDPF shares, it has know way of generating the challenge randomness that will be used for the secure sketch.

henrycg commented 2 years ago

Can you say a bit more about how this would work for Poplar? My understanding is that, since the client doesn't know in advance what candidate prefixes will be used to query its IDPF shares, it has know way of generating the challenge randomness that will be used for the secure sketch.

The idea for Poplar (roughly) is that the servers would derive the randomness used in the sketch by hashing their shares of the client's submission. Since the sketch-verification process does not require the servers to interact with the client, I think it's okay that the client does not know at the time of submission exactly which vector the servers will sketch. It is certainly possible that I m missing something here.

I agree that using Fiat-Shamir here may not be worth the extra cost and complexity.

schoppmp commented 2 years ago

@cjpatton and I have been discussing this in cjpatton/vdaf#18, and I think we should review carefully whether it would make sense to rotate verify_params more often. More concretely, I believe it would be good to generate the verification randomness only after a set of client shares to verify has been agreed on, and to re-generate it for every such set of shares.

The reason is that in a real deployment, it might be easier to ensure that the right code is being run (i.e., the aggregation server behaves semi-honestly), than to ensure confidentiality of the aggregation server's state. Now if we fix the verification randomness in advance, an attacker learning it could then create a malicious client share that would pass the verification.

cjpatton commented 2 years ago

I agree that PPM needs to make sure that frequent key rotation for VDAFs needs to be as ergonomic as possible.

branlwyd commented 2 years ago

We should resolve rotation of verify_param for PPM soon, now that we are moving forward with implementation. AFAICT there is currently no specification-compliant way for verify_param to be rotated.

Specifically, given that we are working in a distributed environment, we would need to support multiple versions of verify_params live at once for a single task. There are two basic strategies for rotating secrets in a distributed environment that I have encountered before, but I have concerns with both of them:

1) Include a "verify_param version identifier" value that determines which "version" of the verify_param to use. AFAICT nothing like this is currently specified anywhere in PPM or VDAF. This would be a simple/straightforward solution, but would increase communication costs somewhat & would need to be specified.

2) Try all of the verify_params we know about. However, given the way that verify_param is used as part of prep_init to transform an input share into an initial preparation state, I'm not sure this is feasible: a helper would not know if the verify_param it chose was correct until after the next round of communication with the leader, since "correct" in this case means "the same as the verify_param chosen by the leader". And if the helper chose an incorrect verify_param, I believe this would lead to a transition to an INVALID state in the leader's state machine, so the helper couldn't try another verify_param.

Barring a third option (or an error in analysis of the two options above), I think this needs to be addressed at the specification level.

branlwyd commented 2 years ago

@cjpatton and I have been discussing this in cjpatton/vdaf#18, and I think we should review carefully whether it would make sense to rotate verify_params more often. More concretely, I believe it would be good to generate the verification randomness only after a set of client shares to verify has been agreed on, and to re-generate it for every such set of shares.

There is currently some text in the VDAF spec that suggests that public_param held by the clients needs to match with the verify_param:

CP The public_param is intended to allow for protocols that require the Client to use a public key for sharding its measurement. When rotating the verify_param for such a scheme, it would be necessary to also update the public_param with which the clients are configured. For PPM it would be nice if we could rotate the verify_param without also having to update the clients. We should consider dropping this at some point. See https://github.com/cjpatton/vdaf/issues/19.

I think this is incompatible with the suggestion to generate verify_params after receiving the client report. But, if we want to go this way, perhaps the right answer is to drop the public_param? It's not used by either prio3 or poplar1; and from the perspective of PPM, it would make things easier by allowing rotation of verify_params without coordination with the clients.

cjpatton commented 2 years ago

The spec has changed a lot since we last touched this issue, so I'll add a quick update here: The relevant parameter is the vdaf_verify_key, which is assumed to be exchanged out-of-band. Ideally this parameter can be chosen arbitrarily (even adversarially) without impacting privacy. We (me and @hannahdaviscrypto) are working on a formal analysis that will rule this in or out for Prio3/Poplar1. We'll update this thread when we have more to say here. In any case, for robustness it's important that vdaf_verify_key be a uniform random string unknown to the clients, so it might be useful to spell something here.

branlwyd commented 2 years ago

Ideally this parameter can be chosen arbitrarily (even adversarially) without impacting privacy.

If this turns out to be true (for all VDAFs?), we could consider having the leader generate a value for vdaf_verify_key on a per-aggregation-job basis, treating it as a piece of state that needs to live only as long as the aggregation job is "active". Perhaps the leader would communicate the key to the helper as a new field in AggregateInitializeReq.

This would allow vdaf_verify_key to be removed from the long-lived task parameters, which would be handy as one step towards allowing automated provisioning of tasks. I think it would also solve rotation concerns.

edit: a few additional thoughts:

cjpatton commented 2 years ago

If this turns out to be true (for all VDAFs?), ...

We would have to make this determination on a per-VDAF basis. (A security proof for Prio3/Poplar1 does not necessarily imply anything for an arbitrary VDAF.)

... we could consider having the leader generate a value for vdaf_verify_key on a per-aggregation-job basis, treating it as a piece of state that needs to live only as long as the aggregation job is "active". Perhaps the leader would communicate the key to the helper as a new field in AggregateInitializeReq.

This would allow vdaf_verify_key to be removed from the long-lived task parameters, which would be handy as one step towards allowing automated provisioning of tasks. I think it would also solve rotation concerns.

This would be really nice I think :) However it would require evaluating a candidate VDAF in the same security model as for Prio3/Poplar1. (We should have this model spelled out soon, in fact.)

* This suggestion also assumes that output shares recovered with differing `verify_key` values can be aggregated into a single aggregate result. I think this is true, but can't find an explicit confirmation of this in the VDAF specification. (if not, I suppose a similar approach could be used, but we'd need to specify `verify_key` at a per-batch level instead of per-aggregation-job)

Only the prep phase depends on the verify key, so this shouldn't be a problem.

* We could potentially save a few bytes on-the-wire if we found a way to safely derive the `vdaf_verify_key` from other information already shared between the aggregators, instead of explicitly transmitting the `vdaf_verify_key` from leader to helper.

This direction seems frought with foot-guns to me. 16 additional bytes per aggregation flow is reasonable.

cjpatton commented 2 years ago

Also, @schoppmp pointed out in the security considerations of the VDAF draft that @branlwyd's design would have some security benefit: If the verify key is chosen ONLY AFTER the clients uplooad their reports, then there's no way for this to be leaked ahead of time. (See Phillipp's comment above.)

branlwyd commented 2 years ago

Ah -- reviewing @schoppmp's comments on this issue, I think my proposal matches what Phillipp proposed in their Jan 28 comment:

More concretely, I believe it would be good to generate the verification randomness only after a set of client shares to verify has been agreed on, and to re-generate it for every such set of shares.

Credit to Phillipp if this idea does turn out to be workable. :-)

cjpatton commented 1 year ago

Based on this analysis, it appears to be safe for the Leader to pick the VDAF verification key unilaterally, at least Prio3 and Poplar1. However we have to be somewhat careful: See https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/407 for the follow up. I'm going to close this issue, since we now have a better idea of what we need.