zcash / zips

Zcash Improvement Proposals
https://zips.z.cash
MIT License

Use HMAC for KDF and hSig #25

Closed defuse closed 8 years ago

defuse commented 8 years ago

We want to prove that KDF and the way h_sig is computed are both PRFs seeded on dhsecret and randomSeed, respectively. If we move randomSeed to the same place as dhsecret is in the computation of KDF then one proof can cover both functions.

defuse commented 8 years ago

Actually, I think we should just use HMAC. The current construction requires an assumption about the SHA256 compression function I don't want to rely on. See https://github.com/zcash/zcash/issues/792#issuecomment-197490103.

daira commented 8 years ago

Remember that this is computed in the circuit, so we want to minimise the number of compression function evaluations.

Never mind, these two are outside the circuit.

defuse commented 8 years ago

I'm probably missing something, but I don't see where they're used in the circuit. hSig depends only on things that are public when I'm verifying a proof, so I can compute hSig outside and provide it as an input to the circuit. I can't remember whether KDF was used in the circuit for the selective transparency features; I didn't think it was.

daira commented 8 years ago

+1 for moving the position of randomSeed.

defuse commented 8 years ago

As an alternative to switching to HMAC (which I think we should do anyway because reviewed implementations and test vectors are available) we could maybe see if the proofs in http://cseweb.ucsd.edu/~mihir/papers/hmac-new.html could be used as inspiration to prove our current construction secure.

daira commented 8 years ago

As far as I know, the hash for hSig only needs to be collision-resistant (which implies second preimage-resistant). I'm not opposed to using HMAC but I'm not convinced it's necessary.

[Edit – moved from a comment below for clarity:] Yep, there's no need for hSig to be pseudorandom (it's public and never used as a secret, it just needs to be unique between transactions and to identify a particular pourPubKey). So the hash used to produce it does not need to be a PRF; it only needs to be collision-resistant — and collision resistance is in fact required to prevent the Faerie Gold attack.

Edit: For the KDF, http://cseweb.ucsd.edu/~mihir/papers/hmac-new.html doesn't provide an argument for using HMAC rather than the plain hash with a single block input, because its proof is of PRFness of HMAC assuming PRFness of the compression function (which also trivially implies PRFness of the full hash for a single block).

zookozcash commented 8 years ago

An even more conservative choice for KDF would be HKDF.

On Mar 16, 2016 14:29, "Daira Hopwood" notifications@github.com wrote:

As far as I know, the hash for hSig only needs to be collision-resistant (which implies second preimage-resistant), and the coin commitment function only needs to be a hiding and binding commitment scheme. I'm not opposed to using HMAC but I'm not convinced it's necessary.


daira commented 8 years ago

I believe HKDF is really just unnecessary complexity in this context. It's designed for situations where you need to be able to produce an arbitrary amount of keying material; here we just need 256 bits (the size of an AEAD_CHACHA20_POLY1305 key).

In particular, note that AEAD_CHACHA20_POLY1305 effectively already includes the "expansion" part of the extract-then-expand paradigm employed in the design of HKDF: it uses ChaCha20 for that. (The fact that it directly uses part of the expansion as a keystream rather than to derive another key is a harmless optimisation that is rigorously justified in the AEAD_CHACHA20_POLY1305 security proof.) So that part of the HKDF design is redundant here and cannot possibly help. In fact it could hurt if it were done badly, although I've no reason to believe that is the case.
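To make the comparison concrete, here is a minimal RFC 5869 HKDF sketch in Python (stdlib only, illustrative values). The point being made above is visible in the last line: when only 256 bits of output are needed, the "expand" step collapses to a single HMAC call, so that part of HKDF buys nothing here.

```python
import hmac
import hashlib

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: iterate HMAC to produce `length` bytes."""
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

# Placeholder inputs, just to show the shape of the computation:
prk = hkdf_extract(b"salt", b"dh-secret-bytes")
key = hkdf_expand(prk, b"context", 32)

# For a single 32-byte AEAD key, "expand" is exactly one HMAC call:
assert key == hmac.new(prk, b"context" + b"\x01", hashlib.sha256).digest()
```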

daira commented 8 years ago

What might potentially help is the random salt input to the extractor. (Note however that we already include the ephemeral public key, which has ~251 bits of entropy, as an input to the KDF; the question here is whether it is useful to have an extra salt that is independent of the DH inputs. I'm reading more papers now that may shed light on this.)

daira commented 8 years ago

Adding a salt could also hurt, because it gives a chosen-ciphertext adversary control over part of the KDF input that does not affect the dhsecret. (If such an adversary tries to alter the ephemeral public key or the recipient public key in the ciphertext to an encoding of a point that is not equivalent, i.e. not the same when multiplied by the cofactor, they will change the dhsecret at the same time, and there are only a small number of pairs of equivalent point encodings for the two public keys.)

daira commented 8 years ago

If the KDF is collision resistant [edit: it only needs to be second-preimage resistant for this argument] then a chosen-ciphertext adversary can't force two different KDF inputs to produce the same ChaCha20 key, and so in practice the Poly1305 authenticator will fail for any modified ciphertext. This is difficult to use in a security proof, though, because a function that is secure as a PRF (the cipher in this case) can have equivalent keys. There's no reason that I know of to believe that ChaCha20 has equivalent keys, but if it did, and if SHA-256 had a kind of partial collision in its output between those equivalent keys for a partly-unknown input (that would be an impressive attack!) then there could be a failure of authentication security. So we need some kind of "pseudo-random output" property for the KDF unless we want to make a stronger-than-PRF assumption about ChaCha20.

(If we used HKDF, the same difficulty would just apply to the internal PRF* component rather than to ChaCha20.)

matthewdgreen commented 8 years ago

I think in practice raw SHA is fine, but there are proofs about HMAC as an extractor under the assumption that the compression function is pseudorandom. I assume there are similar proofs of HKDF.

Alternatively, NIST defines a set of recommended KDFs. I'm not saying that any of these are particularly better, but this seems like a problem that's been solved pretty well -- no?

Matt

On Thu, Mar 17, 2016 at 8:21 AM, Daira Hopwood notifications@github.com wrote:

If the KDF is collision resistant then a chosen-ciphertext adversary can't force two different KDF inputs to produce the same ChaCha20 key, and so in practice the Poly1305 authenticator will fail. This is difficult to use in a security proof, though, because a function that is secure as a PRF can have equivalent keys. There's no reason that I know of to believe that ChaCha20 has equivalent keys, but if it did, and if SHA-256 had a kind of partial collision in its output between those equivalent keys for a partly-unknown input (that would be an impressive attack!) then there could be a failure of authentication security. So we need some kind of PRF-like property for the KDF unless we want to make a stronger assumption about ChaCha20.


daira commented 8 years ago

There being proofs about HMAC as an extractor seems like a good reason to use it (depending on the details of the proofs, of course). I found these papers:

daira commented 8 years ago

I propose using 256 bits of the output of SHA-512(tag(i) || trunc246(hSig) || dhsecret || epk || pknewenc,i).

Rationale:

It would also be possible to use half of the SHA-512 output as a ChaCha20 key and half as the Poly1305 key, rather than taking the Poly1305 key from the ChaCha20 keystream. (Note that you can model this as using two different PRF families, so it can't reduce security.) I'm not sure whether to do this or to stick with the standard AEAD, but that's a secondary decision.
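A hypothetical Python sketch of this proposal, using placeholder field sizes rather than the spec's exact encodings (the tag, truncation width, and input lengths below are illustrative stand-ins, not normative):

```python
import hashlib

def kdf_sketch(tag_i: bytes, hsig_trunc: bytes, dhsecret: bytes,
               epk: bytes, pk_enc: bytes) -> bytes:
    """Sketch of the proposed KDF: take 256 bits of
    SHA-512(tag(i) || trunc(hSig) || dhsecret || epk || pk_enc).
    Field sizes here are illustrative placeholders, not the spec's."""
    preimage = tag_i + hsig_trunc + dhsecret + epk + pk_enc
    return hashlib.sha512(preimage).digest()[:32]  # keep 256 bits

# Illustrative placeholder inputs (32-byte dhsecret/epk/pk_enc):
key = kdf_sketch(b"\x00", b"h" * 31, b"d" * 32, b"e" * 32, b"p" * 32)
assert len(key) == 32
```

The variant mentioned above would instead split the full 64-byte SHA-512 digest into two halves, one for ChaCha20 and one for Poly1305, rather than truncating to 32 bytes.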

daira commented 8 years ago

So, as far as I can see, the results in those papers are mainly about overcoming difficulties introduced by the definition of HMAC (motivated purely by the fact that it is commonly used for this purpose in existing protocols) to show that the result is still as secure as having used a single hash with a large enough input block. They don't provide any evidence that HMAC actually helps, relative to the large-enough hash.

ebfull commented 8 years ago

I don't understand all of the tradeoffs involved here, but I would like us to hesitate before designing a scheme that depends on hSig's integrity for any kind of security property. In general, whatever scheme we come up with should be secure in isolation, so that it's difficult to misuse.

hSig is yet another thing we would have to place in an "encrypted note" (such as the kind we return from our RPC after zcrawpour) so that it can be decrypted without access to the transaction, in addition to the nonce and ephemeral key. Perhaps the design of our RPC needs to be revisited?

daira commented 8 years ago

This scheme doesn't depend on hSig's integrity. It just benefits from hSig's integrity to improve the security bound. The KDF can still be analysed independently from the rest of the protocol as an extractor that takes a seed/salt input, for which we happen to use hSig. The construction I suggested would still be secure even if, for the sake of argument, an attacker could choose that seed. (It would achieve a weaker security bound, but no worse than the existing proposal that doesn't use a seed, and no worse than HMAC.)

daira commented 8 years ago

Also, hSig, epk, and i should be considered part of the ciphertext. It's already the case that epk and i must be considered part of the ciphertext, so I don't see the harm in adding hSig.

daira commented 8 years ago

Here's a relevant paper: Computational Extractors and Pseudorandomness — in particular section 7 "Practical Computational Extractors from Weak PRF". It basically proves the security of my suggested extractor under a very reasonable assumption about SHA-512. [Edit: we used BLAKE2b-256, but the same proof applies.]

zookozcash commented 8 years ago

If we're going to add SHA-512, could we instead use BLAKE2b for that purpose?

Because BLAKE2b has even better assurance of security than SHA-512 (which already has very high assurance of security for this purpose): https://github.com/zcash/zcash/issues/706#issuecomment-187807410, and because I want to migrate to BLAKE2 exclusively: https://github.com/zcash/zcash/issues/706#issuecomment-187870587

daira commented 8 years ago

There is no security reason why we couldn't use BLAKE2b (it has the same block input length as SHA-512), but there is already an implementation of SHA-512 in Bitcoin Core, whereas we'd need to add an implementation of BLAKE2b. It also seems slightly odd to have a protocol that uses SHA-2-family hashes everywhere but in one place.

As for migrating to BLAKE2 exclusively, see https://github.com/zcash/zcash/issues/706#issuecomment-188273057 . Edit: In particular, if we use SHA-256 for the coin commitments and address derivation in 1.0 (as we've already decided), I am fairly certain that we will never change that — we won't want to invalidate all existing coins and addresses, and we won't want to pay the cost in the circuit of supporting two hashes, unless SHA-256 were to be unexpectedly and significantly broken.

However, there's nothing particularly wrong from a security point of view about using BLAKE2b for just the KDF; that argument is only a matter of aesthetics.

daira commented 8 years ago

On the subject of HMAC vs a plain hash:

When HMAC is used with a key input greater than the input block size of the hash, the key is first hashed. This would be the case for any input that includes dhsecret, epk, and pknewenc,i. Therefore, HMAC cannot possibly preserve more entropy than the hash alone. Nor does it do any better job at extracting the entropy into a uniformly random output — either the hash alone is sufficient to do that (as suggested by the result in section 7 of 'Computational Extractors and Pseudorandomness'), or there must be some flaw in the hash as a weak-PRF that would also affect its use in HMAC.
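The long-key behaviour described above can be checked directly with Python's stdlib `hmac`: per RFC 2104, a key longer than the hash's input block is first replaced by its hash, so HMAC with the long key equals HMAC with the hashed key.

```python
import hmac
import hashlib

block_size = hashlib.sha256().block_size   # 64 bytes for SHA-256
long_key = b"K" * (block_size + 40)        # longer than one input block
msg = b"some message"

# HMAC replaces an over-long key with its hash before keying, so these match:
direct = hmac.new(long_key, msg, hashlib.sha256).digest()
prehashed = hmac.new(hashlib.sha256(long_key).digest(), msg, hashlib.sha256).digest()
assert direct == prehashed  # HMAC cannot preserve more entropy than Hash(key)
```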

daira commented 8 years ago

I wrote:

There is no security reason why we couldn't use BLAKE2b (it has the same block input length as SHA-512), but there is already an implementation of SHA-512 in Bitcoin Core, whereas we'd need to add an implementation of BLAKE2b. It also seems slightly odd to have a protocol that uses SHA-2-family hashes everywhere but in one place.

On the other hand, using BLAKE2b would avoid the need to truncate hSig. (You'd put i in the personalisation field, and then everything still fits in one block, because BLAKE2b does not require padding.) [Edit: this is what we actually ended up doing.]
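The single-block layout can be sketched with Python's stdlib BLAKE2b support, which exposes the personalisation field (up to 16 bytes). The personalisation string and field layout below are hypothetical stand-ins, not the spec's actual values; the point is that four 32-byte inputs total exactly 128 bytes, one BLAKE2b block, with no truncation of hSig needed.

```python
import hashlib

def blake2b_kdf_sketch(i: int, hsig: bytes, dhsecret: bytes,
                       epk: bytes, pk_enc: bytes) -> bytes:
    """Sketch: put the output index i in the personalisation field and
    hash the four 32-byte inputs; 32*4 = 128 bytes = one BLAKE2b block.
    The personalisation string here is hypothetical, not the spec's."""
    person = b"ExampleK" + i.to_bytes(8, "little")  # 16 bytes, the maximum
    h = hashlib.blake2b(digest_size=32, person=person)
    h.update(hsig + dhsecret + epk + pk_enc)
    return h.digest()

key = blake2b_kdf_sketch(0, b"h" * 32, b"d" * 32, b"e" * 32, b"p" * 32)
assert len(key) == 32
```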

zookozcash commented 8 years ago

Daira: I want you to choose this. Either way, let's be careful not to conflate "SHA-512" with "the SHA-512 compression function" and likewise for SHA-256.

daira commented 8 years ago

I choose SHA-512 (the full hash, but there is only one input block). https://github.com/zcash/zips/blob/zips25.change-kdf.0/protocol/protocol.pdf

[Edit: we ended up switching to BLAKE2b.]

defuse commented 8 years ago

@daira wrote (in a comment above):

Edit: For the KDF, http://cseweb.ucsd.edu/~mihir/papers/hmac-new.html doesn't provide an argument for using HMAC rather than the plain hash with a single block input, because its proof is of PRFness of HMAC assuming PRFness of the compression function (which also trivially implies PRFness of the full hash for a single block).

That's right, but the computation of hSig isn't a plain hash with a single block input, and neither is the KDF; which hash with a single block input was that in reference to? [Edit: Oh, with SHA-512 it all fits in one block?]

@daira also wrote:

[Edit – moved from a comment below for clarity:] Yep, there's no need for hSig to be pseudorandom (it's public and never used as a secret, it just needs to be unique between transactions and to identify a particular pourPubKey). So the hash used to produce it does not need to be a PRF; it only needs to be collision-resistant — and collision resistance is in fact required to prevent the Faerie Gold attack.

But now we're talking about using hSig like a salt, so now does it (or its first 256 bits) need to be pseudorandom?

defuse commented 8 years ago

@daira wrote:

I choose SHA-512 (the full hash, but there is only one input block). https://github.com/zcash/zips/blob/zips25.change-kdf.0/protocol/protocol.pdf

Clarification: All of our protocol's stuff is in the first block, but there'll be two calls to the compression function because padding fills an additional block.
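This block count follows from the padding arithmetic alone: SHA-512 appends a 0x80 byte, zero padding, and a 16-byte length field, rounded up to 128-byte blocks, so a message that already fills one block forces a second. A small sketch to check it:

```python
def sha512_blocks(msg_len_bytes: int) -> int:
    """Number of 128-byte SHA-512 blocks after padding:
    message || 0x80 || zeros || 16-byte length field."""
    min_padded = msg_len_bytes + 1 + 16    # minimum bytes after padding
    return (min_padded + 127) // 128       # round up to whole blocks

assert sha512_blocks(111) == 1   # 111 + 17 = 128: padding still fits
assert sha512_blocks(112) == 2   # one byte more and it spills over
assert sha512_blocks(128) == 2   # a full single-block input needs two compressions
```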

defuse commented 8 years ago

I still like the idea of using HMAC for both, because then we only need to write and test one standardized function (which has published test vectors), and our protocol wouldn't look as crazy as the Telegram protocol, with a wire coming off of hSig to lots of other things. It could be a PR issue if someone looks at our protocol and says "What the heck is all of this crap for?" They may not immediately understand that half of it isn't really necessary for security and is just extra.

But as long as we document our choices then I'm totally fine with using SHA512 for the KDF.

Are we leaving the computation of hSig the same, or do we need to make it something (provably) pseudorandom to get the stronger benefits of including it in the KDF?

defuse commented 8 years ago

Actually I like SHA512 more than HMAC now. SHA512 has test vectors too ;)

We are reusing hSig for a lot of things, though. We should make sure they don't interact.

daira commented 8 years ago

I intended the input to fit in one SHA-512 block, but neglected to account for the 128-bit length field in the SHA-512 padding. That's OK; I'll change it to use the compression function instead, or reconsider whether to use BLAKE2b (which would allow us to use the full hash with no truncation).

daira commented 8 years ago

@defuse wrote:

But now we're talking about using hSig like a salt, so now does it (or its first 256 bits) need to be pseudorandom?

That's a good point. Technically it does. This is in practice fine because a variation of the same argument we make for the commitment applies also to the hash used to compute hSig.

Edit: actually it doesn't need to be pseudorandom. The proof in section 7 of https://eprint.iacr.org/2011/708 models the seed as being the input, not the key, to the keyed hash.

daira commented 8 years ago

We have now switched to BLAKE2b for the KDF, and are not going to use HMAC. I believe this can be justified with a rigorous security proof, so I'm closing this ticket. The computation of hSig, for which we only rely on collision resistance, also uses BLAKE2b. (As it happens, that input also fits into a single BLAKE2b block, but that wasn't essential.)

daira commented 8 years ago

A subtle point: the section 7 proof assumes m ≥ n, where n is the size of the key to the wPRF and m is the output size. Here m is 256 bits, and the relevant n is the size of the DH secret, which is 256 bits for Curve25519 (it does not have full entropy; that's fine), even though the KDF has other (public) inputs.