Usage indication: alternatives to trial decryption

cjpatton commented 4 years ago

In the current spec, the server provides no indication of whether the inner or outer ClientHello (CH) was used. This means the client must do trial decryption to make this determination, which creates complexity and potentially raises security concerns. As such, it would be useful to explore possible alternatives. In order to drive the discussion, I'll provide a few simple alternatives below, which we can refine as folks provide feedback. (The current spec, draft-07, is listed as option (0) for comparison.)

Besides implementation complexity, one of our design considerations is ensuring that middleboxes don't ossify on ECH. As such, indication of ECH usage should "stick out" (see draft-ietf-tls-sni-encryption, Sec 3.4) as little as possible.

For our purposes, "do not stick out" means a middlebox that observes connections between the client and the client-facing server can't distinguish between real ECH and "dummy" ECH (i.e., a "GREASEd" extension, as described in Section 7.4). We assume the middlebox doesn't know the ECH configuration or the public-facing name. (Note that this rules out adversaries such as the GFW, which can actively probe to discover this information.)

Option (0): Do not indicate usage

Protocol flow:

On input of the client's outer CH. If the server accepts ech (i.e., encrypted_client_hello), it uses the inner CH; and if the server rejects or does not support ECH, then it uses the outer CH. It proceeds with the handshake as normal, except that in case of rejection, it sends an ech extension in its EE with the updated ech configuration.
On input of the server's SH, EE, …, Finished. The client determines whether the inner CH or outer CH was used by computing the decryption key for each scenario and attempting to decrypt EE. It then proceeds with the handshake as usual, updating its ech configuration if applicable.
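
For illustration, here is a rough Python sketch of the client-side trial decryption described in this flow. The record protection is a toy stand-in built from HMAC and a SHA-256 keystream, not real TLS record protection, and the names `seal`, `open_record`, and `trial_decrypt` are hypothetical. It only shows the control flow: derive one key per scenario and see which one opens EE.

```python
import hashlib
import hmac
from typing import Optional, Tuple

TAG_LEN = 32  # length of the toy authentication tag (HMAC-SHA256)


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption standing in for the server's EE record."""
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))
    return ct + hmac.new(key, ct, hashlib.sha256).digest()


def open_record(key: bytes, record: bytes) -> Optional[bytes]:
    """Return the plaintext, or None if the tag does not verify under `key`."""
    if len(record) < TAG_LEN:
        return None
    ct, tag = record[:-TAG_LEN], record[-TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()):
        return None
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))


def trial_decrypt(key_inner: bytes, key_outer: bytes, ee: bytes) -> Tuple[str, bytes]:
    """Try the key derived for each scenario; report which CH the server used."""
    for which, key in (("inner", key_inner), ("outer", key_outer)):
        plaintext = open_record(key, ee)
        if plaintext is not None:
            return which, plaintext
    raise ValueError("EE does not decrypt under either key")
```

The cost the thread worries about is visible here: the client has to carry two candidate key schedules until the first encrypted record arrives.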

Pros

Sticks out the least.
Is the least complex for servers to implement (same for Option (2)).

Cons

Is the most complex for clients to implement.

Spec changes: None.

Option (1): Publicly indicate acceptance

Protocol flow:

On input of the client's outer CH. If the server accepts ech, it uses the inner CH; and if the server rejects or does not support ech, then it uses the outer CH. If the server accepts, then it adds an empty ech extension to its SH; if the server rejects, then it adds an ech extension to its EE with the updated ech configuration; and if the server doesn't support ech, then it proceeds as normal.
On input of the server's SH, EE, …, Finished. If the SH has the ech extension, then the client proceeds as normal, assuming the inner CH was used; otherwise, the client proceeds as if the outer CH was used, updating its ech configuration if applicable.
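
A minimal sketch of the client decision under Option (1), in Python. The `ServerFlight` type and `option1_client` function are hypothetical names for illustration; extensions are modeled as plain dictionaries rather than real TLS structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

ECH = "encrypted_client_hello"


@dataclass
class ServerFlight:
    """Hypothetical view of the extensions the client sees in SH and EE."""
    sh_extensions: Dict[str, bytes] = field(default_factory=dict)
    ee_extensions: Dict[str, bytes] = field(default_factory=dict)


def option1_client(flight: ServerFlight) -> Tuple[str, Optional[bytes]]:
    """Return which CH the server used and any updated ech configuration."""
    if ECH in flight.sh_extensions:
        # Empty ech extension in the SH: the server accepted the inner CH.
        return "inner", None
    # No signal in the SH: rejected (config update in EE) or unsupported.
    return "outer", flight.ee_extensions.get(ECH)
```

No trial decryption is needed, but the SH extension is visible to any on-path observer.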

Pros

Is the least complex for clients to implement.

Cons

Breaks Split Mode: the backend server must indicate acceptance in its SH but does not know whether the client-facing server accepted or not. (We could ameliorate this problem by adding an indication of acceptance to the inner CH.)
Sticks out the most. (See Option (3).)

Spec changes: Semantics of the ech extension changes; changes are needed to accommodate "Split Mode".

Option (2): Publicly indicate rejection

Protocol flow:

On input of the client's outer CH. If the server accepts ech, it uses the inner CH; and if the server rejects or does not support ech, then it uses the outer CH. If the server accepts or does not support ech, then it proceeds as usual; and if the server rejects, then it adds an ech extension to its SH with the updated ech configuration.
On input of the server's SH, EE, …, Finished. If the SH has the ech extension, then the client proceeds as if the outer CH was used and updates its ech configuration; otherwise, the client proceeds as if the inner CH was used. Decryption failure indicates either that the server does not support ech (i.e., outer CH was used) or the connection is under attack.
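
A comparable sketch for Option (2), again with hypothetical names (`option2_client`, and a boolean standing in for whether EE decrypts under the inner-CH keys). It assumes extensions are modeled as a dictionary.

```python
from typing import Dict, Optional, Tuple

ECH = "encrypted_client_hello"


def option2_client(
    sh_extensions: Dict[str, bytes],
    ee_decrypts_with_inner_keys: bool,
) -> Tuple[str, Optional[bytes]]:
    """Return which CH the server used and any updated ech configuration."""
    if ECH in sh_extensions:
        # Explicit rejection: fall back to the outer CH and take the update.
        return "outer", sh_extensions[ECH]
    if ee_decrypts_with_inner_keys:
        # No signal means the client assumes acceptance.
        return "inner", None
    # Lack of signal plus decryption failure is ambiguous and fails hard.
    raise ConnectionError("server does not support ech, or connection is under attack")
```

The last branch is the hard failure that shows up when a server has turned off ech support entirely.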

Pros

Is the least complex for the server to implement (same as Option (0)).

Cons

Sticks out, but only on rejection.
Complicates deployment: if the client offers ech to a server that has turned off support for the extension, then the connection will fail hard, as the client assumes lack of signal means that ech was accepted. (We could ameliorate this problem, at the cost of added complexity on the client side implementation.)

Spec changes: Semantics of the ech extension changes; ech configuration update is sent in the clear. (We could avoid this by sending the new configuration in a new extension in the EE.)

Option (3): Privately indicate acceptance

It may be worth considering an alternative to Option (1) that doesn't stick out as much. Namely, it's possible to make ech acceptance in the SH indistinguishable from ech rejection.

davidben commented 4 years ago

Even when they are the same entity, I don't think synchronizing the DNS response with the rollback is meaningful. As you note, the OS has a cache. The client may have a cache in front of that. The recursive resolver may have a cache. There are probably other caches in various DNS middleware. There is also no guarantee that the same server instance will serve the client's DNS query and the client's TLS connection, which means that during the course of a rollout or rollback, there will be mismatches.

chris-wood commented 4 years ago

~~Turtles~~ caches all the way down!

davidben commented 4 years ago

There is also no guarantee that the same server instance will serve the client's DNS query and the client's TLS connection, which means that during the course of a rollout or rollback, there will be mismatches.

Er, by "mismatch" here, I mean that the client will see DNS and TLS configs from different generations. If a very careful server operator carefully controls changes based on TTLs and deployment times, they may be able to arrange for all observed cross-generation configs to be compatible. (And indeed they should arrange for TLS servers to know about ECH keys before advertising them, etc.)

But this is the careful, happy case, not a failure recovery case.

cjpatton commented 4 years ago

That's a simplifying assumption and doesn't always hold. Even within an enterprise, it's not uncommon for the DNS folks to be a separate group from those running the webservers.

Very true, and I don't think we should assume this is the case. I think (3) is the best option for many deployments, but there are use cases for which (2) is much better, assuming the server can manage the DNS/TLS synchronization complexity. It might be worth exploring a hybrid approach: in its ECHConfig, the server might indicate what it confirms: acceptance a la (3), or rejection a la (2).

The main concern I have with (2) is that the client needs to evict its cache before making the DNS request, and I'm not sure how platform-dependent this behavior is.

davidben commented 4 years ago

The client can't evict the recursive resolver's cache. I'm not sure if even the OSes provide APIs to clear caches. (I don't see an obvious flag to pass into getaddrinfo.)

We also should not have two different spellings of the same thing in the protocol. It is complex enough as it is.

cjpatton commented 4 years ago

So the only way to safely rollback for option (2) is to wait until the DNS record expires. Does anyone have a sense of the degree to which clients respect the record's TTL? Google measured clock skew among Chrome clients years ago and found it was pretty dismal. Is the state of affairs any better today?

In any case, counting on clients to get DNS right appears to be risky. If we go with (2), then it seems the best option on the table so far is to use trial decryption to distinguish between ECH acceptance and (the unlikely case of) ECH rollback.

Let me make one more pitch for (something like) option (2). As @grittygrease pointed out, we have largely ignored a potentially important "don't stick out" consideration. The goal of (3) is to make connections from a real ECH client to an ECH server look like connections from a dummy ECH client (i.e., one that sends a GREASEd extension) to an ECH server. A property that (0,2) have that (1,3) don't is that connections from a real ECH client to an ECH server look like connections from a dummy ECH client to a non-ECH server. In other words, options (1,3) don't provide covertext for non-ECH servers, whereas (0,2) do. (ECH rejection sticks out for (2), but the happy path doesn't.) Do we regard this as a risk to deployment?

davidben commented 4 years ago

In any case, counting on clients to get DNS right appears to be risky. If we go with (2), then it seems the best option on the table so far is to use trial decryption to distinguish between ECH acceptance and (the unlikely case of) ECH rollback.

If we do that, we haven't addressed this issue. If clients still need to implement trial decryption for one case, however unlikely, we're still paying for it and there's no point in building a separate thing. How common a codepath is affects performance considerations, but not complexity considerations. The issue with trial decryption is complexity, not performance. (Trial decryption also breaks some in-place decryption strategies, so there can be a performance concern too, but it's just one record so I'm not concerned about that.)

A property that (0,2) have that (1,3) don't is that connections from a real ECH client to an ECH server look like connections from a dummy ECH client to a non-ECH server. In other words, options (1,3) don't provide covertext for non-ECH servers, whereas (0,2) do. (ECH rejection sticks out for (2), but the happy path doesn't.) Do we regard this as a risk to deployment?

Right, I think this is the ServerHello.random vs. new extension question for (3). Sticking the indicator in ServerHello.random makes the full cross product of {ECH-client, GREASE-client} x {ECH-server, non-ECH-server} look the same, provided the server supports TLS 1.3. This is nice, but it's a weird one-off trick we can't do again. Sticking the indicator in a new extension also makes the same cross product look the same, provided the server supports TLS 1.3 and has been updated to send this extension. It can send this extension independent of ECH support, but it's not a thing anyone does today because the extension doesn't exist, so the deployment curve will be different.

In contrast, (2) is missing coverage. It makes the following tuples look the same: (ECH-client, ECH-server), (ECH-client, non-ECH-server), (GREASE-client, non-ECH-server). It misses (GREASE-client, ECH-server). In particular, clients may be ECH-capable (and thus know to send GREASE extensions) but not configured with a DoH resolver and unable to get HTTPS records over Do53 (either due to cleartext problems or ossification).
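
The coverage gap can be checked mechanically. The sketch below models only what an observer sees in the SH under Option (2), assuming a real ECH client is accepted on the happy path and a GREASE payload is always rejected by an ECH server. The function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

CLIENTS = ["ECH-client", "GREASE-client"]
SERVERS = ["ECH-server", "non-ECH-server"]


def option2_sh_has_ech(client: str, server: str) -> bool:
    """True if the SH carries a visible ech extension under Option (2)."""
    if server == "non-ECH-server":
        return False  # the server ignores the extension entirely
    accepted = client == "ECH-client"  # happy path; GREASE cannot be decrypted
    return not accepted  # Option (2) only signals rejection


def odd_ones_out() -> List[Tuple[str, str]]:
    """Pairs whose SH looks different from the rest of the cross product."""
    return [pair for pair in product(CLIENTS, SERVERS) if option2_sh_has_ech(*pair)]
```

Under these assumptions `odd_ones_out()` returns only the (GREASE-client, ECH-server) pair.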

cjpatton commented 4 years ago

The issue with trial decryption is complexity, not performance.

Agreed, I'm just reiterating that we haven't solved the problem with (2) if we can't solve the problem with client-side DNS.

grittygrease commented 4 years ago

@davidben, do you expect clients to send GREASE on all connections or only connections for which DoH is available? If you expect clients to send a dummy ECH in situations where the ECHConfig is potentially unavailable, do you expect the server to send ECHConfig back in the handshake and the client to restart the handshake? That seems like a pretty big performance hit.

davidben commented 4 years ago

do you expect clients to send GREASE on all connections or only connections for which DoH is available?

I think they should send it for all connections. That was a big part of the motivation.

If you expect clients to send a dummy ECH in situations where the ECHConfig is potentially unavailable, do you expect the server to send ECHConfig back in the handshake and the client to restart the handshake? That seems like a pretty big performance hit.

No, clients don't process retry configs on GREASE connections.

Offering a GREASE extension is not considered offering an encrypted ClientHello for purposes of requirements in {{client-behavior}}.

Possibly the spec should be clearer here. The intent is that this is a different mode altogether. (Probably the business around sessions remembering whether ECH was negotiated can be dropped too now that we encrypted the whole ClientHello. That was originally added to work around some goofiness between the public and private names. Edit: filed https://github.com/tlswg/draft-ietf-tls-esni/issues/285)

That was an intentional limitation in at least the first iteration of the retry flow. Picking up a retry config without a DNS lookup is odd for several reasons. As you note, there is a performance penalty to the retry. More importantly, the client has already leaked the name at that point. It'd really only be useful for subsequent connections and the text intentionally only applies the retry to one connection attempt. Trying to solve it for subsequent connections would be interesting, but there are several nuisances to resolve:

State is a tracking vector. This is easy enough to address—adjust the scope of the state to meet your privacy goals, same as resumption itself—but we'd need to discuss it and reducing scope also reduces effectiveness.
Remembering TLS-level state, rather than SVCB- or Alt-Svc-level state, breaks the multi-CDN story. (At the time this text was written, we hadn't even figured out the multi-CDN story.)
Even if the retry keys contained a full SVCB record, one CDN won't know the config of the other CDN, so it ends up being a CDN pin, which seems awkward.
Any cache for subsequent connections needs to deal with the larger lifetimes necessary for effectiveness (see https://github.com/MikeBishop/dns-alt-svc/issues/105)

Given all that mess, I omitted it from the PR when proposing this mechanism and figured we'd think about these issues later if the WG wanted to pursue a non-DNS flow.

grittygrease commented 4 years ago

The DNS expiration complaint seems like overthinking a bit.

Step 1: Update Registry to remove DS
Step 2: Wait until DNS caches expire
Step 3: Remove zone keys (KSK, ZSK, RRSIG, etc.)

This is done pretty frequently, and the servers take the risk of the site having an outage if the client has record synchronization issues.

In fact, RRSIG records have explicit expiration times, which makes them less flimsy with respect to expiration. If we follow the lead of RRSIG and add an expiration time to ECHConfig, then we're only relying on clock synchronization during rollover rather than DNS cache expiration.

How about: 1) add a time box to the ECHConfig record 2) recommend only sending GREASE in the same situations as 10.2. describes: when you expect to reliably get the ECHConfig record if it exists (i.e. DNSSEC or DoH)
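
A sketch of what the time box might look like on the client side. The `not_before` and `not_after` fields are hypothetical (modeled loosely on RRSIG inception and expiration) and are not part of any ECHConfig definition in the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeBoxedECHConfig:
    """Hypothetical ECHConfig wrapper carrying an explicit validity window."""
    config: bytes
    not_before: int  # Unix time, seconds
    not_after: int   # Unix time, seconds


def usable(cfg: TimeBoxedECHConfig, now: int) -> bool:
    """The client offers real ECH only inside the validity window."""
    return cfg.not_before <= now <= cfg.not_after
```

This trades reliance on DNS cache expiry for reliance on the client's clock, which is the clock-synchronization dependency mentioned above.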

bemasc commented 4 years ago

In fact, RRSIG records have explicit expiration times, which makes them less flimsy with respect to expiration.

I think this is not quite right. When all RRSIGs in the zone are expired, the status is 'Bogus', not 'Insecure'. In other words, DNSSEC fails hard when the validation expires, and relies on caches to respect TTL. This is a security feature to prevent an attacker from resurrecting expired data. This arguably supports your overall argument, but not your proposed mitigation.

From this discussion, it sounds like trial decryption (Option 0) is only modestly inconvenient for TLS/TCP. If so, that makes me think that we should focus on a simple, separate Option 3 extension only for QUIC, and keep TLS/TCP at Option 0.

cjpatton commented 4 years ago

Hi folks, in order to help drive the discussion, I've created PRs for the options currently being discussed.

#283: option (3), but incorporates various changes and improvements.
#286: based on #283, but with fallback to option (2). In case option (3) sticks out too much and gets blocked, then we can fall back to option (2) at the cost of deployment complexity.
#287: based on #283, but the indication of acceptance appears in the SH.random instead of a new SH extension. This sticks out less than (3), but requires security considerations.

From this discussion, it sounds like trial decryption (Option 0) is only modestly inconvenient for TLS/TCP. If so, that makes me think that we should focus on a simple, separate Option 3 extension only for QUIC, and keep TLS/TCP at Option 0.

Based on the discussion on #283, most people seem to not favor supporting this behavior.

cjpatton commented 4 years ago

Hi all, a quick update for those who haven't been following the proposals:

#283 adds an explicit confirmation of acceptance as an extension to the SH.
#286 is the same as #283, except the server can opt to only explicitly confirm rejection.
#287 adds an explicit confirmation of acceptance by hijacking SH.random.

It seems that consensus is coalescing around #287 because it minimizes deployment complexity and sticks out less than #283. The open issue for this change is security.

@chris-wood and I reached out to a variety of people who have worked on security proofs of TLS 1.3 to see how this change might impact their analysis. While this change is significant enough to require generating fresh proofs, no one expects it to lead to an attack if the confirmation string is sufficiently short. The current proposal uses the last 8 bytes of the SH.random, which leaves 24 bytes of entropy to ensure uniqueness of the session id. I added discussion of this point to the PR... it would be helpful to get more eyes on this.

richsalz commented 4 years ago

2 and 3 talk to the same PR?

(edited to remove the email cruft)

cjpatton commented 4 years ago

Oops, fixed!

ekr commented 4 years ago

To follow up on the comment I made at the mic and then decided didn't work.

Assuming we accept PR#292, and decide the CHInner.Random is secret then can we just say that the ESNI accepted signal is to have the low order bytes of SH.Random be derived from CHInner.Random (copied might work, but hashed would make me feel better). I haven't done any real analysis of this, but it seems like it would not permit an attacker who does not know CHInner.Random to determine whether ECH was accepted.

bemasc commented 4 years ago

I think we should consider a construction like Expand(Extract(ServerHello.random[0:24], CHInner.random), "ech-tag", 8), i.e. to make the tag dependent on the rest of ServerHello.random. This would at least partly address @huitema's concern about replays.
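
To make the construction concrete, here is a minimal Python sketch using HKDF as defined in RFC 5869. SHA-256 is an assumption (the comment above does not fix a hash), and `ech_tag` is a hypothetical name.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand (RFC 5869): T(i) = HMAC-Hash(PRK, T(i-1) | info | i)."""
    if length > 255 * hashlib.new(hash_name).digest_size:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def ech_tag(sh_random: bytes, ch_inner_random: bytes) -> bytes:
    """Expand(Extract(ServerHello.random[0:24], CHInner.random), "ech-tag", 8)."""
    prk = hkdf_extract(sh_random[:24], ch_inner_random)
    return hkdf_expand(prk, b"ech-tag", 8)
```

Because the first 24 bytes of ServerHello.random act as the salt, two ServerHellos with fresh randoms yield different tags for the same CHInner.random.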

cjpatton commented 4 years ago

I don't see the replay concern as important, since all it does is reveal if ECH was used. There are easier ways for an attacker to learn this information. In fact, it doesn't need to interfere with the connection at all: all it needs to do is learn the ECH configuration.

I think it's best to keep the mechanism as simple as possible. In particular, I'd like to do everything we can to not increase the requirements for the backend server in Split Mode.

Of course, if there is an attack that violates the intended security goal of ECH (confidentiality of CH extensions), then we should take that seriously. But I don't think this change (i.e., #287) increases this risk compared to the status quo.

bemasc commented 4 years ago

These attacks aren't part of our core threat model, but it seems like we have an opportunity to defeat some or all of them at low cost, so I think we should consider doing so.

There are easier ways for an attacker to learn this information. In fact, it doesn't need to interfere with the connection at all: all it needs to do is learn the ECH configuration.

This is true in the main deployment models we're discussing, but I can also imagine use cases where the ECHConfig is not available to the attacker.

huitema commented 4 years ago

I think we should consider a construction like Expand(Extract(ServerHello.random[0:24], CHInner.random), "ech-tag", 8), i.e. to make the tag dependent on the rest of ServerHello.random. This would at least partly address @huitema's concern about replays.

That doesn't work. The attacker just needs to replay the entire server random. If you want protection you need to mix in the server's key share.

cjpatton commented 4 years ago

@bemasc

These attacks aren't part of our core threat model, but it seems like we have an opportunity to defeat some or all of them at low cost, so I think we should consider doing so.

If we're going to go down this road, then I think we need to take a step back and think about our "don't stick out" threat model in more detail. Currently our requirement is that a passive observer, who doesn't know the configuration, is unable to distinguish real ECH usage from the "cover traffic" provided by clients who "GREASE" the ECH extension. The attackers mentioned so far are active and may know the config. So let's start here: do we anticipate an attacker this powerful? So far we've mostly been talking about "don't stick out" in terms of dumb middleboxes that we don't want ossifying on our extension. The current threat model captures this pretty well, I think. If we want to go for something stronger, then we clearly need to re-think the design of #287 (or decide we shouldn't do it).

Something to keep in mind is that indistinguishability of the "real" protocol from some "cover" protocol is a property that TLS was never designed to have. It seems to me that the task of endowing TLS with some sort of steganographic security property goes way beyond this one extension. It's an interesting and valuable goal, but one that should be addressed in a more general way.

For ECH, I think we should focus our efforts on coming up with a design that we feel we can deploy today, and iterate and re-deploy as needed.

huitema commented 4 years ago

I agree with @cjpatton that there is value in simplicity. A really stealthy ESNI would be a different design than ECH.

chris-wood commented 4 years ago

+1 -- ECH is not about censorship circumvention, or being stealthy.

huitema commented 4 years ago

@chris-wood you should maybe expand a bit on that. If you are not trying to defeat some form of censorship, then why are you hiding the SNI in the first place?

bemasc commented 4 years ago

The problem I'm focusing on is an "attack" so trivial it could almost happen by accident. If a ClientHello is issued twice, verbatim, and elicits two independent ServerHellos, an observer can see whether the last 8 bytes of .Random are the same in both responses. This might happen sporadically to DTLS or QUIC in some configurations, even without an active attacker.

The formula I proposed above avoids this repetition. If we're going to use a hash, as EKR suggested, this calculation seems like a pretty natural way to do it.

I definitely don't want to slow down progress, and I'm not proposing that we substantially expand our threat model. I do think closing trivial attacks has some value even if more advanced ones still exist. For example, the other attacks may be more difficult or less deniable.

chris-wood commented 4 years ago

@chris-wood you should maybe expand a bit on that. If you are not trying to defeat some form of censorship, then why are you hiding the SNI in the first place?

Censorship is, for example, active blocking of a connection based on the name, whereas ECH hides SNI (and other things) from those that just passively snoop and try to learn about clients.

cjpatton commented 4 years ago

EDITED TO FIX PROPOSAL 3.

@bemasc

The problem I'm focusing on is an "attack" so trivial it could almost happen by accident.

I agree that it would be worth mitigating this attack, as long as the mechanism isn't too complicated. Let's consider the "replay protection" properties of the current proposals. Suppose the attacker wants to learn if a client offered ECH, so it replays the ClientHelloOuter to the server. Here are the proposals (please chime in if I got this wrong!)

1. current proposal: accept_confirmation = getrandom(8)
2. @ekr proposes: accept_confirmation = Hash(ClientHelloInner.random)
3. @bemasc proposes: accept_confirmation = Hash(ServerHello.random[0:24] + ClientHelloInner.random)

where Hash is something like Expand(Extract( . , some_salt), some_info, 8). (Though since the ikm is a random string, I think it would suffice to just call Extract( . , some_info, 8).)

Neither 1 nor 2 mitigates the attack, but 3 does. All options "stick out" the same if the ClientHelloInner is known, e.g., if the adversary is on-path from the client-facing server to the backend server.
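
The replay observation can be simulated in a few lines. This sketch uses truncated SHA-256 as a stand-in for Hash, and assumes (as the thread implies) that under proposal 1 the 8-byte value is carried in the ClientHelloInner and echoed by the server. All function names are hypothetical.

```python
import hashlib
import os


def H(*parts: bytes) -> bytes:
    """Stand-in for Hash: SHA-256 truncated to 8 bytes (illustration only)."""
    return hashlib.sha256(b"".join(parts)).digest()[:8]


def server_hello_random(proposal: int, ch_inner_random: bytes, client_value: bytes) -> bytes:
    """Build ServerHello.random: 24 fresh bytes plus the 8-byte confirmation."""
    prefix = os.urandom(24)
    if proposal == 1:
        tag = client_value              # value chosen by the client, echoed back
    elif proposal == 2:
        tag = H(ch_inner_random)        # depends only on the (replayed) CH
    elif proposal == 3:
        tag = H(prefix, ch_inner_random)  # also depends on fresh server bytes
    else:
        raise ValueError("unknown proposal")
    return prefix + tag


def replay_reveals_ech(proposal: int) -> bool:
    """Replay one CH twice; True if the last 8 bytes of SH.random repeat."""
    ch_inner_random = os.urandom(32)
    client_value = os.urandom(8)
    first = server_hello_random(proposal, ch_inner_random, client_value)
    second = server_hello_random(proposal, ch_inner_random, client_value)
    return first[24:] == second[24:]
```

A server that did not accept ECH would produce fresh random bytes in both responses, so a repeated suffix is what gives acceptance away.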

Incidentally, proposals 2 and 3 are an improvement over 1 since we don't have to send an extension in the ClientHelloInner. On the down side, the backend server needs to know how to instantiate Hash, i.e., it needs to know the HPKE cipher suite. We could get around this by using the hash from the TLS cipher suite.

bemasc commented 4 years ago

@cjpatton In this shorthand, my proposal is more like Hash(ServerHello.random[0:24], ClientHelloInner.random). This avoids the leak that you identified.

cjpatton commented 4 years ago

Ah, you're right. My apologies! Fixing above.

cjpatton commented 4 years ago

I'd be fine with 2 or 3, though we should use the TLS cipher suite instead of the HPKE cipher suite so that the backend server doesn't need to know the latter.

huitema commented 4 years ago

@bemasc proposes accept_confirmation = Hash(ServerHello.random[0:24] + ClientHelloInner.random)

My proposal would be: accept_confirmation = Hash(ServerHello.KeyShare + ClientHelloInner.random)

The rationale is that merely hashing the remainder of the server random is insufficient. The attacker could just do the attack I delineated in issue #287 by copying the whole ServerHello.random[0:32] instead of just copying ServerHello.random[24:32]. But if you mix the server key share in the hash, then the attacker cannot do that without also copying a key share for which the private key is unknown.

cjpatton commented 4 years ago

EDITED AFTER DISCUSSION WITH @chris-wood

Roger that. Here's what we have on the table:

1. current: accept_confirmation = getrandom(8)
2. @ekr: accept_confirmation = PRF(ClientHelloInner.random, "")
3. @bemasc: accept_confirmation = PRF(ClientHelloInner.random, ServerHello.random[0:24])
4. @huitema: accept_confirmation = PRF(ClientHelloInner.random, ServerHello.KeyShare)

Let's instantiate PRF( . , . ) with Expand( . , . , 8), where Expand is for the TLS cipher suite (and not HPKE). Proposal 1 and 2 are for the status-quo threat model, i.e., the "don't stick out" distinguisher is passive; proposal 3 provides additional "don't stick out" protection in case the CH is replayed; and proposal 4 improves on 3 by providing some protection against manipulation of the SH.
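
A sketch of proposals 2 through 4 with PRF instantiated as HKDF-Expand, assuming SHA-256 as the TLS cipher suite hash. For an 8-byte output HKDF-Expand reduces to a single HMAC block. The helper `forged_hello_passes` is hypothetical and models an attacker who copies the whole ServerHello.random but substitutes its own key share.

```python
import hashlib
import hmac
import os


def prf(key: bytes, context: bytes) -> bytes:
    """HKDF-Expand(key, context, 8) with SHA-256: one HMAC block, truncated."""
    return hmac.new(key, context + b"\x01", hashlib.sha256).digest()[:8]


def confirmation(proposal: int, ch_inner_random: bytes,
                 sh_random_prefix: bytes, key_share: bytes) -> bytes:
    """accept_confirmation for proposals 2, 3, and 4."""
    context = {2: b"", 3: sh_random_prefix, 4: key_share}[proposal]
    return prf(ch_inner_random, context)


def client_check(proposal: int, ch_inner_random: bytes,
                 sh_random: bytes, key_share: bytes) -> bool:
    """Client recomputes the confirmation from what it received."""
    expected = confirmation(proposal, ch_inner_random, sh_random[:24], key_share)
    return hmac.compare_digest(sh_random[24:], expected)


def forged_hello_passes(proposal: int) -> bool:
    """Attacker copies an honest SH.random but swaps in its own key share."""
    ch_inner_random = os.urandom(32)
    honest_share = os.urandom(32)    # stand-in for the server's KeyShare bytes
    prefix = os.urandom(24)
    sh_random = prefix + confirmation(proposal, ch_inner_random, prefix, honest_share)
    attacker_share = os.urandom(32)
    return client_check(proposal, ch_inner_random, sh_random, attacker_share)
```

In this model the copied confirmation still verifies under proposals 2 and 3, and fails under proposal 4 because the key share is bound into the PRF input.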

My preference is proposal 2, since it simplifies the extension. I would be fine with 3 or 4, though I'm not convinced that either fully addresses the stronger threat model.

cjpatton commented 4 years ago

Hmm, on second thought I'm not so sure how much simpler 2 is than 1. The ClientHelloInner would still have to carry some sort of indication of ECH acceptance so that the backend server knows to confirm. But an empty "encrypted_client_hello" extension (or maybe a new code point?) would do just fine.

Something weird about 4 is that the backend server has to wait to finish the ServerHello.random until it generates a key share. This might add a bit of complexity, though it depends on the code base.

huitema commented 4 years ago

@cjpatton Yes, incorporating the key share is more complex. But let's look at what we are doing, replacing trial decryption by a hint. Trial decryption generates complexity, especially in the QUIC mapping, but the result is unambiguous and hard to fool. The client knows for sure whether the key was generated from the inner CH or the outer CH, and it is very hard for third parties to partially fool the client. The hint introduces another failure mode, i.e. wrong hint value, and I believe it can be exploited. For protection, the code has to be almost as hard to fool as trial decryption. That's what I am trying to achieve by incorporating the server key share in the mix.

There are of course implementation issues. The server has to know what key share it will use before generating Server.Random. That may or may not be easy to do, depending on implementation. The KeyShareEntry value does not depend on the Server.Random value, so this is definitely possible. But the code path depends on the implementation, and it may be more difficult for some stacks than for others.

cjpatton commented 4 years ago

Hi all, I added a commit to #287 that implements @bemasc's suggestion. Specifically, accept_confirmation (i.e., the last 8 bytes of ServerHello.random) is computed as

    HKDF-Expand-Label(
        HKDF-Extract(0, ClientHello.random),
        "tls13-ech-accept-confirm",
        ServerHello.random[0:24],
        8
    )

where HKDF-Extract and HKDF-Expand-Label are as defined in RFC8446. Doing Extract-then-Expand ensures that we don't run into any issues with the length of the ClientHello.random not matching the Hash.length in the TLS stack.
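
A Python sketch of this computation, for anyone who wants to check the spelling against code. It follows the HkdfLabel encoding from RFC 8446 Section 7.1 and treats the salt "0" as Hash.length zero bytes, as that section does. SHA-256 as the default hash and the function name `accept_confirmation` are assumptions for illustration, and the label is passed exactly as written above.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand (RFC 5869)."""
    if length > 255 * hashlib.new(hash_name).digest_size:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes,
                      length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand-Label (RFC 8446, Section 7.1)."""
    full_label = b"tls13 " + label
    hkdf_label = (
        length.to_bytes(2, "big")                 # uint16 length
        + bytes([len(full_label)]) + full_label   # opaque label<7..255>
        + bytes([len(context)]) + context         # opaque context<0..255>
    )
    return hkdf_expand(secret, hkdf_label, length, hash_name)


def accept_confirmation(ch_random: bytes, sh_random: bytes,
                        hash_name: str = "sha256") -> bytes:
    """Last 8 bytes of ServerHello.random, per the construction above."""
    zero_salt = bytes(hashlib.new(hash_name).digest_size)
    prk = hkdf_extract(zero_salt, ch_random, hash_name)
    return hkdf_expand_label(prk, b"tls13-ech-accept-confirm",
                             sh_random[:24], 8, hash_name)
```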

Please have a look to make sure it's spelled correctly.

bemasc commented 4 years ago

HKDF-Expand-Label adds a "tls13 " prefix to the label, so I think you can shorten the label.

I agree, we need HKDF-Extract() for Hash.length > 32 (e.g. SHA-512). Given the need for HKDF-Extract(), it would seem more natural to me to put ServerHello.random[0:24] in the extraction salt, and use HKDF-Expand instead of HKDF-Expand-Label.

cjpatton commented 4 years ago

@bemasc

HKDF-Expand-Label adds a "tls13 " prefix to the label, so I think you can shorten the label.

Good call! Fixing. This reminds me that we need to do a pass of the spec to ensure all the constants have the same structure.

Given the need for HKDF-Extract(), it would seem more natural to me to put ServerHello.random[0:24] in the extraction salt, ...

I disagree. In any case, the salt being Hash.length bytes long avoids indifferentiability issues [1].

... and use HKDF-Expand instead of HKDF-Expand-Label.

What does this buy us?

[1] https://ieeexplore.ieee.org/document/8806752

bemasc commented 4 years ago

I'm not familiar with that paper, but Section 4.3 seems to say that HKDF is suitably indifferentiable without any such restriction on the salt length.

Using HKDF-Expand instead of HKDF-Expand-Label would seem to make use of fewer, better-analyzed constructions, but I'm not aware of a practical difference, so if HKDF-Expand-Label is more convenient to implement for some reason then that seems like enough justification.

cjpatton commented 4 years ago

I'm not familiar with that paper, but Section 4.3 seems to say that HKDF is suitably indifferentiable without any such restriction on the salt length.

There are many "safe" salt lengths. I'm not sure 24 is "safe", but I know Hash.length is.

Using HKDF-Expand instead of HKDF-Expand-Label would seem to make use of fewer, better-analyzed constructions, but I'm not aware of a practical difference, so if HKDF-Expand-Label is more convenient to implement for some reason then that seems like enough justification.

I don't think one is any harder than the other. The only difference between them is that HKDF-Expand-Label exposes an additional context parameter, which I think aligns a bit better with what we're doing here.

If you'd like to keep pushing for these changes, then please follow up by making a comment on the PR.

huitema commented 4 years ago

I appreciate the safety concerns, but you are going to extract an 8-byte hint from the hash. That's a serious step down from 32 or even 16 bytes, and with such a short length I would be really surprised if two different hash constructs resulted in any security difference!

cjpatton commented 4 years ago

Hahaha, yeah. We need the 8 bytes to be pseudorandom, and I think the current design is defensible from a provable security perspective. We may be able to do a bit better. What do you think of this, @bemasc?

    accept_confirmation = HKDF-Extract(ServerHello.random[0:24] + 0^{Hash.len-24}, ClientHello.random)[0:8]

This is valid as long as Hash.len >= 24, which I believe is guaranteed by RFC8446.

bemasc commented 4 years ago

That's fine with me, although I'm not sure why you need to pad the salt. (HKDF-Extract will pad it for you.)
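
One way to check the padding claim empirically: HKDF-Extract is HMAC keyed with the salt, and HMAC (RFC 2104) zero-pads keys shorter than the hash block size. The sketch below compares a 24-byte salt against the same salt padded with zeros to Hash.len. The function `padding_is_redundant` is a hypothetical name, and the test values are arbitrary.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hash_name).digest()


def padding_is_redundant(hash_name: str) -> bool:
    """Compare Extract with a 24-byte salt vs. the salt zero-padded to Hash.len."""
    hash_len = hashlib.new(hash_name).digest_size
    salt = bytes(range(1, 25))                       # arbitrary 24-byte salt
    ikm = bytes(range(32))                           # arbitrary 32-byte IKM
    padded = salt + bytes(hash_len - len(salt))      # salt + 0^{Hash.len-24}
    return hkdf_extract(salt, ikm, hash_name) == hkdf_extract(padded, ikm, hash_name)
```

This holds for SHA-256, SHA-384, and SHA-512 because the padded salt is still no longer than the HMAC block size in each case.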

cjpatton commented 4 years ago

(HKDF-Extract will pad it for you.)

Roger that.

Are you happy with this @huitema?

cjpatton commented 4 years ago

(HKDF-Extract will pad it for you.)

Hmm, looking at RFC5869, it's not clear to me that the salt is padded by this function. I think I prefer the following:

     accept_confirmation = HKDF-Extract(0, ClientHelloInner.random + ServerHello.random[0:24])[0:8]
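
A sketch of this final form together with the matching client-side check, assuming SHA-256 and reading the salt "0" as Hash.length zero bytes (the RFC 8446 convention). The helper names `make_server_hello_random` and `client_sees_acceptance` are hypothetical.

```python
import hashlib
import hmac
import os


def accept_confirmation(ch_inner_random: bytes, sh_random_prefix: bytes,
                        hash_name: str = "sha256") -> bytes:
    """HKDF-Extract(0, ClientHelloInner.random + ServerHello.random[0:24])[0:8]."""
    zero_salt = bytes(hashlib.new(hash_name).digest_size)
    ikm = ch_inner_random + sh_random_prefix
    return hmac.new(zero_salt, ikm, hash_name).digest()[:8]


def make_server_hello_random(ch_inner_random: bytes) -> bytes:
    """Server side on acceptance: 24 random bytes followed by the confirmation."""
    prefix = os.urandom(24)
    return prefix + accept_confirmation(ch_inner_random, prefix)


def client_sees_acceptance(ch_inner_random: bytes, sh_random: bytes) -> bool:
    """Client side: recompute the confirmation and compare in constant time."""
    expected = accept_confirmation(ch_inner_random, sh_random[:24])
    return hmac.compare_digest(sh_random[24:], expected)
```

A server that used the outer CH sends 32 random bytes, which match the expected confirmation only with probability 2^-64.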

cjpatton commented 4 years ago

Updated #287 with this change.

huitema commented 4 years ago

I think that's a fine implementation of the suggestion made by @bemasc . I am waiting for the resolution of the "don't stick out" issue on the TLS mailing list.

cjpatton commented 4 years ago

The decision in today's interim meeting is to merge #287 as-is and reconsider the "don't stick out" threat model later on. In particular, we won't be adopting Karthik's suggestion from the mailing list for this PR. @ekr also pointed out that it could be done as an ECH extension.

chris-wood commented 4 years ago

Closing now that #287 landed.
