Relax requirements around the outer SNI value

chris-wood commented 8 months ago

Closes #572

cc @dennisjackson, @davidben, @martinthomson, @cjpatton, @sftcd

ekr commented 8 months ago

I don't think this quite works. As I understand it, we:

Are silent on what clients have to do, but just tell them "this may not work" if you don't make the names match, but only for the retry mechanism.
We tell the servers that they SHOULD check that the names match.

The result here is that a conformant client can fail to interoperate with a server that behaves as we RECOMMEND. That seems bad.

chris-wood commented 8 months ago

I thought the silence was to be interpreted as a SHOULD, but I would be fine to explicitly say that.

sftcd commented 8 months ago

I'm fine with the change from MUST to SHOULD for the client, and I think calling out the SHOULD for clients as in the most recent commit is better than omitting 2119 terms for that.

The addition of a SHOULD for servers is wrong. (I would prefer a SHOULD NOT for servers but am ok with less.) The ECH protocol works fine without that, (unlike the reasonable SHOULD for clients related to fallback), so I'd at most be ok with a MAY for servers. Noting that servers that do not enforce that that check (the behaviour or all servers I've tested so far) has consequences is fine, but there's no need to add a SHOULD recommending making that check.

chris-wood commented 8 months ago

The addition of a SHOULD for servers is wrong. (I would prefer a SHOULD NOT for servers but am ok with less.) The ECH protocol works fine without that, (unlike the reasonable SHOULD for clients related to fallback), so I'd at most be ok with a MAY for servers. Noting that servers that do not enforce that that check (the behaviour or all servers I've tested so far) has consequences is fine, but there's no need to add a SHOULD recommending making that check.

It's not wrong -- it just doesn't match what you'd like. As has been clearly stated, failure to implement this check admits domain fronting. The normative language is here to discourage that behavior for servers.

sftcd commented 8 months ago

What's "wrong" is certainly debatable. However, while there may be good or bad reasons to (dis)like domain fronting, making this check a SHOULD is not necessary to make the ECH protocol to work. That's (I claim, not that strongly) demonstrated as current servers do not make that check.

chris-wood commented 8 months ago

It's not just about making things work. It's incumbent on us to write the spec in a way that discourages undesirable outcomes.

sftcd commented 8 months ago

OK, so we seem to agree that server-side check is not necessary to make the protocol work.

It's not clear to me we that we on here (or the TLS WG) have rough consensus as to the desirability of encouraging implementations to make domain fronting less or more possible. Deployments can of course validly make such choices, but that's different from a SHOULD that recommends that implementations make that choice for those deploying the protocol.

So overall it seems to me the spec here is better saying that servers MAY make such checks, calling out the consequences, but no more.

chris-wood commented 8 months ago

Implementations can of course make a decision as they see fit, which is precisely why this is phrased in the way it is. That said, I don't have any interest in debating what is the right verb to use here.

sftcd commented 8 months ago

I do have such a verbal interest:-) That;s a 2119 verb, so a SHOULD implies (to me) that servers really ought have code to make that check and send that error, which I think is a less good outcome.

sftcd commented 8 months ago

So here's a different argument against such a SHOULD for the server-side check. Not hugely strong, but worth thinking about.

Say a server supports thousands of public_names but very few ECH private keys. That will create config_id collisions and likely lead to erroneous protocol failures if the server-side SHOULD check we're discussing is enforced, especially if/as public_name values change and servers and clients are or are not up to date with the latest settings.

I'm not claiming that's a winning argument, but it's worth pondering as the (un)desirability of domain fronting may not be the only thing in play here.

ekr commented 8 months ago

Some further thoughts.

I don't find the domain fronting argument for requiring this check on the server to be very persuasive. Why is ECH the right place for the TLS WG to take a position on this, even assuming we have consensus to do so?
ISTM that the stronger rationale for this check is that (1) clients which don't out the correct value in ClientHelloOuter will not interoperate properly in case of the need for fallback (2) fallback will be uncommon and so this is unlikely to be detected in normal operation (3) the check ensures that clients do the right thing (this is the argument of RFC 9413). From this perspective, @sftcd's observation that nothing untoward has happened from existing server's failure to check doesn't tell us much, because it's in the nature of this kind of issue that nonconformance is mostly not a problem until it is, at which point it is hard to repair.
I still don't think that the SHOULD/SHOULD combination works because, as noted before, RFC 2119 allows for implementations to violate SHOULDs and we don't really provide reasons why this might be OK in this case. I think the following combinations are reasonable (I.e., they provide interoperability by conformant implementations):

MUST send and any behavior by the server
MUST send unless you have some out of band reason to know that the server doesn't check and that it won't do fallback, combined with any behavior on the server.
Any behavior by the client combined with MUST ignore on the server

With that said, for the RFC 9413 reasons above, I believe we should require the client to send and encourage the server to check. I could be persuaded to have an exception for clients who have some out of band reason to believe that the server doesn't want it, as in my second option above.

chris-wood commented 8 months ago

With that said, for the RFC 9413 reasons above, I believe we should require the client to send and encourage the server to check. I could be persuaded to have an exception for clients who have some out of band reason to believe that the server doesn't want it, as in my second option above.

The intent of the PR is exactly this. It requires clients to do this, with exceptions (hence the SHOULD), and encourages servers to check (hence the SHOULD). It's possible there are better ways to phrase it, but that's the goal here.

ekr commented 8 months ago

Thanks. In that case, I don't think that the "if you don't do this these are the consequences" works. Rather it actually needs to be a MUST and lay out the specific conditions under which clients can violate that MUST. E.g.,

"It MUST place the value of ECHConfig.contents.public_name in the "server_name" extension unless it has a specific reason to know that the server in question does not implement the retry mechanism described in {{rejected-ech}} and does not enforce the above requirement {{client-facing-server}}."

sftcd commented 8 months ago

"It MUST place the value of ECHConfig.contents.public_name in the "server_name" extension unless it has a specific reason to know that the server in question does not implement the retry mechanism described in {{rejected-ech}} and does not enforce the above requirement {{client-facing-server}}."

I don't think the above quite makes sense, because the client (as a TLS library) won't know things, in particular cannot know what the server may or may not check and, once a correct ECHConfig is used, and ECH decryption has succeeded, the retry scheme doesn't come into play so there's no benefit to the server in that case in making the check.

I'm fine with a SHOULD (or "MUST ... unless") for clients, but the "unless" clause above seems wrong. How would this be?

'Clients SHOULD place the the value of ECHConfig.contents.public_name in the (outer) "server_name" extension unless specifically directed otherwise by application layer code. Clients that do not include that value will not benefit from the retry mechanism described in {{rejected-ech}}, therefore applications need to exercise caution if sending any other value, and e.g. only use this approach with servers for which the application client can be confident a correct ECHConfig value has been used and where the server does not make a check comparing the outer "server_name" to an " ECHConfig.contents.public_name value.'

ekr commented 8 months ago

I don't think the above quite makes sense, because the client (as a TLS library) won't know things, in particular cannot know what the server may or may not check and, once a correct ECHConfig is used, and ECH decryption has succeeded, the retry scheme doesn't come into play so there's no benefit to the server in that case in making the check.

No, I don't think this is correct. The purpose of this mandate is to restrain the behavior of TLS clients on the wire, not the behavior of individual TLS libraries.

sftcd commented 8 months ago

My argument is not only only about libraries. The part of my argument about libraries referred only to the specific wording you proposed. (Maybe github issues aren't the best way to argue such things though, the quoting is hard to do and interpret IMO.)

ekr commented 8 months ago

Well, your proposed wording is about the interaction of the application and the library, and I think that this text should refer solely to the behavior of the client as a unit. And the only normative text is about the library's behavior and my point is that the text should refer to the unit.

sftcd commented 8 months ago

So you'd prefer this (if forced to choose)?

'Clients SHOULD place the the value of ECHConfig.contents.public_name in the (outer) "server_name" extension unless specifically directed otherwise. Clients that do not include that value will not benefit from the retry mechanism described in {{rejected-ech}}, therefore clients need to exercise caution if sending any other value, and e.g. only use this approach with servers for which the client can be confident a correct ECHConfig value has been used and where the server does not make a check comparing the outer "server_name" to an ECHConfig.contents.public_name value.'

I could live with something like that but think it's less useful than the first text I suggested myself.

ekr commented 8 months ago

So you'd prefer this (if forced to choose)?

'Clients SHOULD place the the value of ECHConfig.contents.public_name in the (outer) "server_name" extension unless specifically directed otherwise. Clients that do not include that value will not benefit from the retry mechanism described in {{rejected-ech}}, therefore clients need to exercise caution if sending any other value, and e.g. only use this approach with servers for which the client can be confident a correct ECHConfig value has been used and where the server does not make a check comparing the outer "server_name" to an ECHConfig.contents.public_name value.'

I could live with something like that but think it's less useful than the first text I suggested myself.

As I already provided text, I'm not really sure why I would be forced to choose between this and your prior suggestion.

sftcd commented 8 months ago

On 20/10/2023 21:16, Eric Rescorla wrote:

As I already provided text, I'm not really sure why I would be forced to choose between this and your prior suggestion.

Fair enough. I guess we can see if others chime in and/or take it to the list and/or Prague.

S.

martinthomson commented 8 months ago

I'm OK with the stricter language @ekr suggests. A stack can conform with this requirement if it has an option to disable the outer SNI. A stack can conform with this requirement without implementing the outer SNI if it is clearly stated that it can only be used in cases where servers don't need the outer SNI (for the listed reasons).

sftcd commented 8 months ago

On 23/10/2023 00:49, Martin Thomson wrote:

I'm OK with the stricter language @ekr suggests. A stack can conform with this requirement if it has an option to disable the outer SNI. A stack can conform with this requirement without implementing the outer SNI if it is clearly stated that it can only be used in cases where servers don't need the outer SNI (for the listed reasons).

I don't find my position altered tbh, but in case it helps, that ought not delay production of an updated I-D by the editors. We can argue whatever needs arguing before/during WGLC.

S.

cjpatton commented 8 months ago

If I understand @sftcd's point correctly (please let me know if I'm missing anything), it's that "SHOULD" is better because it permits clients to do something different "as long as they know what they're doing". But I'm worried that a careless reader might ignore the SHOULD and fill in the unspecified behavior in arbitrary ways. This could lead to bugs and/or increase the attack surface for applications built on top of TLS in ways we can't anticipate right now.

I'd also prefer @ekr's stricter language, as a future-proofing measure. Simpler semantics is always better. That said, I actually think this PR is a strict improvement over the current text, since it makes the consequences of ignoring the MUST (or SHOULD) clear.

dennisjackson commented 8 months ago

The current PR for the client reads:

[The Client] SHOULD place the value of ECHConfig.contents.public_name in the "server_name" extension. Clients that do not follow this step, or place a different value in the "server_name" extension, risk breaking the retry mechanism described in {{rejected-ech}} or failing to interoperate with servers that require this step to be done; see {{client-facing-server}}.

I think this is fine. It's very clear as to what the expected behaviour is and what the consequences might be. For the server:

Once the server has chosen the correct ECHConfig, it SHOULD verify that the value in the ClientHelloOuter "server_name" extension matches the value of ECHConfig.contents.public_name, and abort with an "illegal_parameter" alert if these do not match. Failure to enforce this check can allow clients to lie about the ClientHelloOuter SNI, possibly with the intent of bypassing network policies that might otherwise block ECH.

I don't think this text is great for two reasons: 1) It's not clear how to choose the correct ECHConfig, especially if multiple configs use the same keypair and distinct public names (as @sftcd pointed out). 2) I don't think there's going to be any kind of consensus for recommending this check (with a SHOULD) or the rationale for why. Implementers are going to be split between those doing the check and those refusing to.

I see we've already had plenty of suggested changes with little support, so one more can't hurt:

Once the server has chosen the correct ECHConfig, it MAY verify that the value in the ClientHelloOuter "server_name" extension matches the value of ECHConfig.contents.public_name, and abort with an "illegal_parameter" alert if these do not match. This optional check allows the server to limit ECH connections to only use the public SNI values advertised in its ECHConfigs. The server must be careful not to unnecessarily reject connections if the same ECHConfig id or keypair is used in multiple ECHConfigs with distinct public names.

The intent here is to permit the check without taking a stand on whether its desirable or necessary.

sftcd commented 8 months ago

On 26/10/2023 12:23, Dennis Jackson wrote:

Once the server has chosen the correct ECHConfig, it MAY verify that the value in the ClientHelloOuter "server_name" extension matches the value of ECHConfig.contents.public_name, and abort with an "illegal_parameter" alert if these do not match. This optional check allows the server to limit ECH connections to only use the public SNI values advertised in its ECHConfigs. The server must be careful not to unnecessarily reject connections if the same ECHConfig id or keypair is used in multiple ECHConfigs with distinct public names.

I'd be fine with the above. (Maybe with a friendly amendment of s/chosen the correct/successfully decrypted/ or similar.)

And as Dennis said, I think we're all on the same page wrt the client side.

S.

chris-wood commented 8 months ago

Took @dennisjackson's suggestion. I'm fine with this as well.

roelfdutoit commented 8 months ago

I believe this PR is incomplete without adding a corresponding warning to the "8.2 Middleboxes" section. The logic in that section is: Proxy does not understand ECH -> Proxy must act as conforming TLS server -> Proxy forges the disable signal by being authoritative for public name. If outer SNI != public_name then Proxy does not know public name (because it does not understand ECH) and can only be "authoritative" for the outer SNI. At a mimimum the following phrase in that section would have to change from the proxy will act as if connecting to the public name to the proxy will act as if connecting to the ClientHelloOuter server_name, which SHOULD match the public name

taddhar commented 8 months ago

Funny the timing coincidence @roelfdutoit I was about to come back on this section as well for a number of other reasons that for non educated people and non english native people this part is very very hard to parse. Am trying to edit it in my corner since yesterday to find the right words but in short a few paused definitions would help naive readers.

chris-wood commented 8 months ago

Good catch @roelfdutoit! I fixed that text.

ckcr4lyf commented 8 months ago

Is there any strong reason for a client to be SHOULD instead of MAY?

public_name may help the ECH server to determine which config to use, however, it may be that the server has only one config, and doesn't need additional public_name logic to determine which config to use.

I am coming from the angle of censorship-resistance. When SNI was introduced initially, many websites would work without it just fine, however browsers started sending it for any and every website, which lead to leaks where it wasn't even necessary.

I've talked about this on the mailing list: https://mailarchive.ietf.org/arch/msg/tls/HUG1CU0Q4PorZ7fD0yafVfj7VUY/ , and even in the original Github Issue for this PR: https://github.com/tlswg/draft-ietf-tls-esni/issues/572#issuecomment-1780859252

Since I didn't receive replies there and it seems this PR is active, I'd like to raise this point again. If this is not the appropriate place, please let me know where the WG group prefers to have such discussions.

ekr commented 8 months ago

The reason to have it be SHOULD is that, as listed, failure to do so causes compatibility issues.

I am coming from the angle of censorship-resistance. When SNI was introduced initially, many websites would work without it just fine, however browsers started sending it for any and every website, which lead to leaks where it wasn't even necessary.

I'm not following this argument. SNI is necessary except in cases where the required certificate is completely determined by the IP address. Similarly, when ECH is used, the same public SNI will be used but it's just enough to determine the required certificate, which doesn't tell you anything that you didn't know already from the IP address and the DNS records.

ckcr4lyf commented 8 months ago

Similarly, when ECH is used, the same public SNI will be used but it's just enough to determine the required certificate, which doesn't tell you anything that you didn't know already from the IP address and the DNS records.

Assuming you used DoH, then someone in the middle (e.g. an ISP) cannot know which domain you're connecting to. However passing the "public name" SNI in plaintext would reveal this.

In the initial ESNI draft / implementation (prior to ECH), this was not a problem, since ESNI replaces SNI completely (if ESNI was used).

chris-wood commented 8 months ago

Similarly, when ECH is used, the same public SNI will be used but it's just enough to determine the required certificate, which doesn't tell you anything that you didn't know already from the IP address and the DNS records.

Assuming you used DoH, then someone in the middle (e.g. an ISP) cannot know which domain you're connecting to. However passing the "public name" SNI in plaintext would reveal this.

In the initial ESNI draft / implementation (prior to ECH), this was not a problem, since ESNI replaces SNI completely (if ESNI was used).

@ekr's point is that the IP address reveals effectively the same information as the public name.

ckcr4lyf commented 8 months ago

An IP address however can be rotated much more easily (in the context of end users of a service), e.g. for a VPS using IPs owned(?) by a cloud provider, where the actual cloud customer behind it could change from one day to the next. Whereas if I am operating a service, the end-users would expect the domain to be static.

In such scenarios, the ECH public_name would still leak the domain to those in the middle

ckcr4lyf commented 8 months ago

Cloudflare has a blog post talking about something tangential: Identity is attached to names, never addresses

chris-wood commented 8 months ago

Merging after discussion in ECH.

tlswg / draft-ietf-tls-esni

Relax requirements around the outer SNI value #575