Specify cross-HRR key rotation issue(s)

chris-wood commented 4 years ago

If a server implements ECH decryption via an RPC call, it's possible for the client's ECH keys to be valid for CH1 but invalid for CH2, perhaps due to a race condition that updates the server's keys across these messages. This should be exceedingly rare in practice, so failing the connection with an alert is probably fine, but we should probably be clear about this.

cc @cjpatton

cjpatton commented 4 years ago

There are two ways a accept-then-HRR-then-reject can happen:

The client sends a config_id after HRR that differs from the config_id it sent before HRR.
The config_id is the same, but the server no longer recognizes it.

The latter is the server's fault, and I think the only thing to do is abort with "internal_error". However, I think requiring that the config_id doesn't change is a good idea. I wonder if this is implied by the client behavior?

If the server sends a HelloRetryRequest in response to the ClientHello, the client sends a second updated ClientHello per the rules in [RFC8446]. However, at this point, the client does not know whether the server processed ClientHelloOuter or ClientHelloInner, and MUST regenerate both values to be acceptable. Note: if the inner and outer ClientHellos use different groups for their key shares or differ in some other way, then the HelloRetryRequest may actually be invalid for one or the other ClientHello, in which case a fresh ClientHello MUST be generated, ignoring the instructions in HelloRetryRequest. Otherwise, the usual rules for HelloRetryRequest processing apply.

In any case, I think we should have the server abort with "illegal_parameter" if the config_id changes over HRR.

davidben commented 4 years ago

Is this any different from, say, the server's cipher suite capabilities changing in between CH1 and CH2 and thus the server not being able to meet what they said with HRR?

I guess I've always assumed that, within the context of a single "handshake" from the perspective of the client, we expect the server to behave self-consistently and, if it can't, I guess the server can internal_error and live with failing one or two connections across server restarts.

cjpatton commented 4 years ago

Is this any different from, say, the server's cipher suite capabilities changing in between CH1 and CH2 and thus the server not being able to meet what they said with HRR?

I think this is a little different, since it can be triggered by changing the set of ECHConfigs the server knows, and not necessarily the set of cipher suites.

I guess I've always assumed that, within the context of a single "handshake" from the perspective of the client, we expect the server to behave self-consistently and, if it can't, I guess the server can internal_error and live with failing one or two connections across server restarts.

I agree. And even though the situation is a bit different, I think "internal_error" is appropriate here as well. However, I also think the client SHOULD NOT change the ECHConfig across HRR. If it does, I think the server SHOULD abort with "illegal_paramter".

Do you think this would be hard for the server to enforce? In particular, a QUIC server might want to handle the HRR path statelessly, and this check requires remembering the previous config_id.

davidben commented 4 years ago

However, I also think the client SHOULD NOT change the ECHConfig across HRR.

I think that's already true. The client can't change things across HRR unless explicitly allowed. (https://tools.ietf.org/html/rfc8446#section-4.1.2)

If it does, I think the server SHOULD abort with "illegal_parameter".

Exactly what the server enforces on HRR is... all over the place. I'm normally in favor of maximal enforcement, but enforcing all the rules of HRR is such a mess. It's especially messy for folks who do stateless HRR, because any state carried over across the two increases the size of the cookie. So I think we need to be a little careful here. Honestly, I think we didn't really get HRR right.

In particular, a QUIC server might want to handle the HRR path statelessly, and this check requires remembering the previous config_id.

QUIC does not care about stateless HRR. It did in the original design, but that's been dropped. DTLS does, though. And I believe NSS always implements it statelessly? (BoringSSL currently only does stateful.)

cjpatton commented 4 years ago

It sounds like a weak "MAY" would be most appropriate.

chris-wood commented 4 years ago

It sounds like a weak "MAY" would be most appropriate.

Can you elaborate?

davidben commented 4 years ago

Looking at this again, I suspect the minimal thing that needs to be checked across CH1 and CH2 is that the server cannot change its mind on whether it used ECH in CH2 and, if using ECH, it needs to have remembered ech_hrr_key (and HPKE cipher suites?) to bind the two. Dunno if it makes sense to just also remember the config_id at that point.

cjpatton commented 4 years ago

, I suspect the minimal thing that needs to be checked across CH1 and CH2 is that the server cannot change its mind on whether it used ECH in CH2 and, if using ECH, it needs to have remembered ech_hrr_key (and HPKE cipher suites?) to bind the two.

I agree, but I don't think tracking the ech_hrr_key is sufficient. Suppose the client sends an acceptable config_id in CH1 but a rejecting config_id in CH2. In this case ech_hrr_key doesn't provide the binding we need, since the client-facing server won't attempt to decrypt the payload in CH2.

A conservative way to ensure the server doesn't change its mind is as follows:

Require that the cipher_suite does not change across HRR. If it does, then abort with "illegal_parameter" alert.**
Require that the config_id does not change across HRR. If it does, then abort with "illegal_parameter" alert.
If config_id isn't recognized after HRR --- say because of key rotation -- then abort with "internal_error" alert.

** I'm not sure 1. is strictly necessary. If the cipher_suite isn't supported by the ECH configuration, then the server would abort with "illegal_parameter". (This is already specified.) Moreover, I don't think we have any agility issues here, given that the derived AEAD key is bound to the cipher suite. However, this does ensure that the server chooses the HPKE cipher suite and KEM key in the same way before and after HRR. This may turn out to be useful.

davidben commented 4 years ago

Suppose the client sends an acceptable config_id in CH1 but a rejecting config_id in CH2. In this case ech_hrr_key doesn't provide the binding we need, since the client-facing server won't attempt to decrypt the payload in CH2.

Right, this is what I mean by "the server cannot change its mind on whether it used ECH in CH2". If you declined ECH in CH1, you ignore ECH in CH2. If you accepted ECH in CH1, failure to handle ECH in CH2 is fatal. But, yeah, if you remember the config used, you can check a bit more accurately, so maybe we should just do that and ask stateless HRRs make an even larger cookie. (Need to be a bit careful about config_id itself because we allow a trial decryption flow too.)

davidben commented 4 years ago

For what it's worth, I'm not a huge fan of CH2 having a fresh HPKE exchange in the first place. That seems an unnecessary asymmetric operation. Were it not for stateless HRR, I think I'd advocate we just make CH2's ECH payload a second pop at the HPKE context's Seal() function (next sequence number), with the rest ignored. But with stateless HRR, we need to serialize the HPKE context into the cookie, which is mildly annoying.

martinthomson commented 4 years ago

I don't see how you can avoid CH2 having a new payload. Unless you are looking to just perform address validation (DTLS), you only send HRR to correct something in the ClientHello.

cjpatton commented 4 years ago

@martinthomson: I don't see how you can avoid CH2 having a new payload. Unless you are looking to just perform address validation (DTLS), you only send HRR to correct something in the ClientHello.

I think what @davidben means is that the AEAD ciphertext could change without changing the encapsulating key. Hence, the ClientHelloInner can be changed without updating the HPKE shared secret.

@davidben: But with stateless HRR, we need to serialize the HPKE context into the cookie, which is mildly annoying.

Wouldn't this leak the HPKE encryption key? The cookie extension is transmitted in the HRR in plaintext, right?

davidben commented 4 years ago

@martinthomson: I don't see how you can avoid CH2 having a new payload. Unless you are looking to just perform address validation (DTLS), you only send HRR to correct something in the ClientHello.

I think what @davidben means is that the AEAD ciphertext could change without changing the encapsulating key. Hence, the ClientHelloInner can be changed without updating the HPKE shared secret.

Specifically, HPKE supports multiple sequential encryption calls on a context with an internal sequence number. https://cfrg.github.io/draft-irtf-cfrg-hpke/draft-irtf-cfrg-hpke.html#name-encryption-and-decryption

@davidben: But with stateless HRR, we need to serialize the HPKE context into the cookie, which is mildly annoying.

Wouldn't this leak the HPKE encryption key? The cookie extension is transmitted in the HRR in plaintext, right?

The cookie already needs to encrypted and authenticated by the server like tickets. It certainly needs to be authenticated independent of ECH, otherwise the attacker can manipulate it anyway. And then the existing ech_hrr_key construction requires encryption too.

(If you didn't encrypt the cookie for some reason and only MAC'd it, this design would allow you to, rather than serializing the HPKE encryption key, remember the expected HPKE config, redo the KEM operation, and bump the sequence number up. But if you're remembering some secret for the MAC, may as well encrypt it...)

cjpatton commented 4 years ago

The cookie already needs to encrypted and authenticated by the server like tickets.

Right, thanks for again clarifying a misunderstanding of mine about the TLS spec :)

Incidentally, I brought up specifying HPKE context serialization here: https://github.com/cfrg/draft-irtf-cfrg-hpke/issues/161. There doesn't seem to be a lot of appetite for it. I suppose we could specify it in ECH, but there are so many use cases for it that I think it would be worth specifying it in the HPKE draft.

cjpatton commented 4 years ago

Given the uncertainty about stateless HRR pointed out in #333, let's permit the client-facing server to abort if ECH usage changes after HRR. This allows stateful-HRR servers to enforce this behavior, but leaves it as an "OPEN ISSUE" about how stateful-hRR servers will enforce this.

I'll prepare a PR tomorrow or Thursday.

tlswg / draft-ietf-tls-esni

Specify cross-HRR key rotation issue(s) #325