"Don't fallback to non-ECH" option

tlswg / draft-ietf-tls-esni

TLS Encrypted Client Hello

https://tlswg.github.io/draft-ietf-tls-esni/#go.draft-ietf-tls-esni.html



"Don't fallback to non-ECH" option #547

Closed: TheBlueMatt closed this 9 months ago

TheBlueMatt commented 1 year ago

Given we've now gone to all the effort to pipe a public key through from DNS to TLS clients and can use that to securely communicate with a server, it seems like an oversight not to have a "don't fall back to not using this cryptographic key" option. For clients which support it, this would prevent CA compromise from enabling MITM attacks by requiring both a DNS-"authenticated" key (at least insofar as DNS cache poisoning is not also done, or is otherwise prevented using DNSSEC) and a CA-authenticated key.

ekr commented 1 year ago

See https://datatracker.ietf.org/doc/rfc6698/

TheBlueMatt commented 1 year ago

Yes, I'm well aware of DANE, lol; sadly it's used nowhere outside of email. ECH, on the other hand, is likely to be adopted in ~every web browser, and (a) at the cost of one bit and an extra if statement we can substantially improve the security of the web under certain attack models (compare to DANE, which requires substantial additional implementation complexity), and (b) since when has "there's already some other protocol for that, we can't have any overlapping feature sets" been an IETF stance :p.

ekr commented 1 year ago

I don't really understand the semantics of what you're proposing here. Can you lay out your proposal in more detail?

TheBlueMatt commented 1 year ago

Sure, of course, and apologies if I've missed anything in how ECH works as currently defined.

If an ECH negotiation succeeds, the client knows that the inner ClientHello was decrypted with the key it fetched from DNS (I believe? At least it's trivial to accomplish as long as the client MUST use sufficiently distinct values in its inner and outer ClientHellos that the transcript hashes differ). The client has therefore authenticated the server against both the (optionally) CA-signed certificate and the ECH key fetched from DNS. This gives the client two "roots of trust" for authenticating its connection!
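To make the "client knows the inner ClientHello was decrypted" point concrete, here is a minimal Python sketch of the acceptance signal as I understand the ECH draft: the server derives an 8-byte accept_confirmation from ClientHelloInner.random and a transcript hash, places it in the last 8 bytes of ServerHello.random, and the client recomputes and compares it. This is a simplified, non-authoritative illustration: SHA-256 is assumed as the cipher-suite hash, the transcript hash is taken as an opaque input rather than built from real handshake messages, and the function names are hypothetical.

```python
import hashlib
import hmac

HASH = hashlib.sha256            # assumed cipher-suite hash
HASH_LEN = HASH().digest_size    # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 HKDF-Extract; an empty salt means HashLen zero bytes."""
    return hmac.new(salt or bytes(HASH_LEN), ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i)."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    """TLS 1.3 HKDF-Expand-Label (RFC 8446, Section 7.1)."""
    full_label = b"tls13 " + label
    hkdf_label = (length.to_bytes(2, "big")
                  + bytes([len(full_label)]) + full_label
                  + bytes([len(context)]) + context)
    return hkdf_expand(secret, hkdf_label, length)


def accept_confirmation(inner_random: bytes, transcript_ech_conf: bytes) -> bytes:
    """8-byte value the server writes into the last 8 bytes of ServerHello.random.

    transcript_ech_conf is assumed to be an already-computed transcript hash
    (a simplification of the draft's exact construction).
    """
    secret = hkdf_extract(b"", inner_random)
    return hkdf_expand_label(secret, b"ech accept confirmation", transcript_ech_conf, 8)


def client_sees_ech_accepted(server_random: bytes, inner_random: bytes,
                             transcript_ech_conf: bytes) -> bool:
    """Only a server that decrypted ClientHelloInner knows inner_random."""
    expected = accept_confirmation(inner_random, transcript_ech_conf)
    return hmac.compare_digest(server_random[-8:], expected)


inner_random = bytes(range(32))                          # stand-in ClientHelloInner.random
transcript = hashlib.sha256(b"toy transcript").digest()  # stand-in transcript hash
server_random = bytes(24) + accept_confirmation(inner_random, transcript)
print(client_sees_ech_accepted(server_random, inner_random, transcript))  # True
```

Because the confirmation is keyed on a value only present inside the encrypted inner hello, a matching signal is evidence that the server held the ECH private key advertised in DNS.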

However, currently, a server can reject ECH in a number of ways (negotiating TLS < 1.3, using ClientHelloOuter, etc.). In such a case we fall back to the CA as the only root of trust. The simplest way to let a server opt in to requiring this dual root of trust would be to add a bit in the HTTPS RR indicating that a client which supports ECH MUST return a failure to the calling application if the server rejects ECH (after all retry_configs, if any, have been exhausted).
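A minimal sketch of where such a bit would sit in a client's decision logic, assuming a simplified model of the draft's rejection handling (the exact retry rules live in the draft itself). `EchConfigView.no_fallback`, the enum names, and `decide` are all hypothetical and do not correspond to any real TLS library API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ServerResponse(Enum):
    ECH_ACCEPTED = auto()          # accept_confirmation matched
    REJECTED_WITH_RETRY = auto()   # outer handshake authenticated; retry_configs sent
    DECLINED = auto()              # outer handshake authenticated; no usable
                                   # retry_configs, or TLS < 1.3 negotiated
    OUTER_AUTH_FAILED = auto()     # certificate not valid for public_name


class Outcome(Enum):
    USE_CONNECTION = auto()
    RETRY_WITH_RETRY_CONFIGS = auto()
    RETRY_WITHOUT_ECH = auto()
    FAIL = auto()


@dataclass
class EchConfigView:
    no_fallback: bool = False      # hypothetical proposed bit; not in the draft


def decide(resp: ServerResponse, cfg: EchConfigView, retries_left: int) -> Outcome:
    """Simplified client reaction to the server's answer to an ECH ClientHello."""
    if resp is ServerResponse.ECH_ACCEPTED:
        return Outcome.USE_CONNECTION
    if resp is ServerResponse.OUTER_AUTH_FAILED:
        return Outcome.FAIL                        # unchanged: unauthenticated rejection
    if resp is ServerResponse.REJECTED_WITH_RETRY and retries_left > 0:
        return Outcome.RETRY_WITH_RETRY_CONFIGS    # unchanged: key rotation path
    # The server has authentically declined ECH and retries are exhausted.
    if cfg.no_fallback:
        return Outcome.FAIL                        # proposed: report failure to the app
    return Outcome.RETRY_WITHOUT_ECH               # current behaviour: CA-only trust
```

Only the final branch differs: with the bit clear the client falls back to non-ECH as today, and with it set an authenticated decline becomes a hard failure.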

This does end up with a similar result to DANE (where failure occurs if the server tries to authenticate with a cert other than the specified one). However, it has a number of advantages over DANE - (a) obviously the implementation path I mentioned above, (b) it doesn't require checking the DNSSEC AD flag on the client, which was a major sticking point for DANE integration in browsers (these days, with DoH, this could be revisited given that checking the AD flag is now trivial, but I'm not holding my breath), and (c) it doesn't require DNSSEC on the domain (which was also a major sticking point for DANE adoption, though things are shifting glacially for email). As described in Section 10.2 of the current ECH draft (draft-15), DNSSEC is an additional defense, and worry over this bit being "misused" to break handshakes via poisoning or similar isn't a big concern.

Alternatively, the server could be allowed to reject ECH with proof that it has decrypted the inner ClientHello, but that's likely substantially more complexity, especially at this late stage of the design.

dennisjackson commented 1 year ago

As far as I understand the proposal, you'd like to add a 1-bit field to ECHConfig which if set will require clients to fail with an error if they can't connect with ECH. If not set, clients follow the current behaviour of falling back to non-ECH if the server securely disables ECH, otherwise they fail with an error.

I don't agree with your analysis. If you're concerned with a CA MiTM, you can't rely on the DoH channel being secure but you're also explicitly saying clients don't need to check DNSSEC. Without at least checking DNSSEC, you aren't actually getting a second root of trust, but you are risking significant breakage from misconfiguration which ECH is designed to avoid.

If you want to mitigate CA MiTM, I think you need DANE (and DNSSEC) and we shouldn't try to glue a DANE-like mode on to ECH.

TheBlueMatt commented 1 year ago

I don't agree with your analysis. If you're concerned with a CA MiTM, you can't rely on the DoH channel being secure but you're also explicitly saying clients don't need to check DNSSEC.

I wasn't assuming a DoH channel to some third party DoH provider? If my DNS is local that isn't an issue. Further, if my DNS is my ISP and the server gets BGP hijacked or their ISP is compromised, adding a second root of trust outside CAs also adds value. There are, of course, many many potential attacks which this doesn't help with, but there are absolutely real setups where this provides a meaningful difference.

Without at least checking DNSSEC, you aren't actually getting a second root of trust

Sure you are, just not necessarily a particularly secure one! Luckily ~every DNS resolver software written in the past 20 years validates DNSSEC by default, so as long as your connection to your resolver is secure (which for many people it is, within reason, though definitely not all!) you've got a pretty reasonable second root of trust.

but you are risking significant breakage from misconfiguration which ECH is designed to avoid.

Which issues specifically are you concerned about? Presumably the largest issue here would be setting the bit and forgetting it while rotating the ECH key, so I'll assume you meant that.

Luckily, the retry_configs already offer good key rotation support, so it's very possible to do this safely. In general, making this opt-in largely addresses it - if you're not someone who can do careful key management (e.g. because this stuff isn't automated across the web server/DNS server boundary), you simply won't use it. Naming it something scary also usually helps :).
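The retry_configs argument can be illustrated with a toy model of server-side ECH key rotation: keep the old private key loaded while the new ECHConfig propagates through DNS, and answer clients that present an unknown config by completing the outer handshake and handing them the current config. The class and method names are hypothetical, keys are placeholder strings, and real servers may also attempt trial decryption rather than matching on config_id alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ServerEchKeys:
    """Toy model of ECH key rotation; keys are placeholder strings."""
    keys: Dict[int, str] = field(default_factory=dict)   # config_id -> private key
    current_id: int = 0                                   # config currently in DNS

    def rotate(self, new_id: int, new_key: str) -> None:
        # Start advertising the new config but keep older keys loaded,
        # since resolvers may serve the old HTTPS RR until its TTL expires.
        self.keys[new_id] = new_key
        self.current_id = new_id

    def retire(self, old_id: int) -> None:
        # Drop an old key once cached copies of its config should have expired.
        self.keys.pop(old_id, None)

    def handle(self, client_config_id: int) -> Tuple[str, object]:
        if client_config_id in self.keys:
            return ("decrypt_inner", self.keys[client_config_id])
        # Stale or unknown config: finish with ClientHelloOuter and send the
        # current config back as retry_configs.
        return ("reject_with_retry_configs", self.current_id)
```

In this model a client holding a stale config is handed the current one and can reconnect, so under a no-fallback bit it would retry with ECH rather than hit the failure path.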

DNSSEC suffers from a generally much worse version of this issue (losing your keys or mishandling a rotation leaves your zone inoperable, usually with a much, much, much higher TTL than most modern A/AAAA records for TLS servers), and yet we see incredibly few issues of this sort. This is partly because the idea of this kind of zone-bricking is scary, which prevents a lot of adoption, but it also makes deployments generally, and very justifiably, cautious. With reasonable communication of the potential outcome, engineers managing this stuff get very careful :)

If you want to mitigate CA MiTM, I think you need DANE (and DNSSEC) and we shouldn't try to glue a DANE-like mode on to ECH.

If web browsers were chomping at the bit to implement DNSSEC verification (or at least check the AD flag), check DANE records, and website admins were chomping at the bit to opt into DNSSEC, I'd probably agree with you. But for many reasons none of those are true, so for the purposes of this discussion I think we'd be much more realistic by assuming DANE didn't exist.

dennisjackson commented 1 year ago

but you are risking significant breakage from misconfiguration which ECH is designed to avoid.

Which issues specifically are you concerned about? Presumably the largest issue here would be setting the bit and forgetting it while rotating the ECH key, so I'll assume you meant that.

I'm concerned about clients in an enterprise or educational network which performs TLS MiTM. Right now, ECH will gracefully degrade for them as they'll get a secure fallback signal. This is critical to the adoption of ECH.

With your change, these networks will need to prevent clients from receiving an EchConfig with the no-fallback bit set otherwise the website will break. Note that in this case there is no security improvement for your proposal because these devices have already had a root installed by the user or administrator.

If you want to mitigate CA MiTM, I think you need DANE (and DNSSEC) and we shouldn't try to glue a DANE-like mode on to ECH.

If web browsers were chomping at the bit to implement DNSSEC verification (or at least check the AD flag), check DANE records, and website admins were chomping at the bit to opt into DNSSEC, I'd probably agree with you. But for many reasons none of those are true, so for the purposes of this discussion I think we'd be much more realistic by assuming DANE didn't exist.

Your position seems to be simultaneously that DNSSEC is widely used and that no one uses DNSSEC. If we're agreeing that website admins are reluctant to opt in to DNSSEC (let alone DANE), then why do you think opting in to noFallback ECH is going to be likely?

ekr commented 1 year ago

For some of the same brittleness reasons as Dennis, I'm skeptical of this idea. More importantly, however, this seems like something that's orthogonal to the purpose of ECH, so I don't think we should make it part of this spec.

Fortunately, ECH supports extensions so it would be straightforward to define an extension for this.
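As a rough illustration of the extension route, the sketch below encodes and parses an ECHConfig extensions list using the TLS-style layout the draft borrows (a 2-byte list length, then a 2-byte type, a 2-byte length, and the data for each extension). The code point 0x7A01 and the name NO_FALLBACK_EXT_TYPE are invented placeholders; no such extension exists. As I understand the draft, a set high-order bit in the type marks an extension as mandatory, meaning clients that don't understand it must ignore the whole ECHConfig.

```python
import struct
from typing import List, Tuple

# Hypothetical code point for illustration only; nothing is registered.
NO_FALLBACK_EXT_TYPE = 0x7A01
MANDATORY_BIT = 0x8000


def encode_extensions(exts: List[Tuple[int, bytes]]) -> bytes:
    """Encode (type, data) pairs as a 2-byte-length-prefixed extension list."""
    body = b"".join(struct.pack("!HH", ext_type, len(data)) + data
                    for ext_type, data in exts)
    return struct.pack("!H", len(body)) + body


def decode_extensions(buf: bytes) -> List[Tuple[int, bytes]]:
    """Parse the list back into (type, data) pairs, rejecting bad lengths."""
    (total,) = struct.unpack_from("!H", buf, 0)
    if total != len(buf) - 2:
        raise ValueError("extensions length mismatch")
    out, off = [], 2
    while off < len(buf):
        ext_type, n = struct.unpack_from("!HH", buf, off)
        off += 4
        data = buf[off:off + n]
        if len(data) != n:
            raise ValueError("truncated extension")
        out.append((ext_type, data))
        off += n
    return out


def is_mandatory(ext_type: int) -> bool:
    return bool(ext_type & MANDATORY_BIT)


def requests_no_fallback(buf: bytes) -> bool:
    return any(t == NO_FALLBACK_EXT_TYPE for t, _ in decode_extensions(buf))
```

Leaving the mandatory bit clear, as here, would let older clients keep using the config while ignoring the new signal; setting it would hide the config from them entirely.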

TheBlueMatt commented 1 year ago

I'm concerned about clients in an enterprise or educational network which performs TLS MiTM. Right now, ECH will gracefully degrade for them as they'll get a secure fallback signal. This is critical to the adoption of ECH.

Didn't we have this whole debate with TLS 1.3 to begin with? I understood the conclusion there to be a resounding "we don't degrade the security of the web for enterprises that want to do MiTM"? Why is that different here?

With your change, these networks will need to prevent clients from receiving an EchConfig with the no-fallback bit set otherwise the website will break. Note that in this case there is no security improvement for your proposal because these devices have already had a root installed by the user or administrator.

Indeed, that seems totally fine? In an enterprise environment where TLS MiTM is required, whatever existing scheme for MiTM they use which provides plaintext/decryption keys will continue to work fine with or without ECH. If management only wants to see the SNI field for monitoring, it would be rather trivial to simply unset any ECH-required bit at the DNS level or simply drop the HTTPS RR responses entirely, which such systems seem likely to do with or without an ECH-required bit.

Your position seems to be simultaneously that DNSSEC is widely used and that no one uses DNSSEC. If we're agreeing that website admins are reluctant to opt in to DNSSEC (let alone DANE), then why do you think opting in to noFallback ECH is going to be likely?

DNSSEC validation is widely deployed (since it is the default in ~every DNS resolver made in the past 20 years). DNSSEC signing of domains is relatively rare, for many reasons (including fear over bricking domains, outdated views of the security of DNSSEC, etc.), though it is seeing some moderate adoption in the email world.

DANE validation in the browser world has ~0 adoption, for many reasons (the last time we tried, DNSSEC did have weaker keys than folks would like; the DNSSEC-signing fears above prevented adoption; system resolvers don't expose the AD flag or TLSA records; it means trusting ISP DNS resolvers to provide authentication keys; etc., etc.). With DoH becoming more popular, the AD flag carries even less meaning than it used to (now you're explicitly trusting a third party rather than a server presumably operated by your ISP), so I'm highly doubtful this will change, and not without reason.

I'd hope we don't disagree on any of that?

Many of those issues do not apply to noFallback ECH - while it's conceptually similar to DANE (at least when still requiring a CA-signed trusted cert), the avoidance of DNSSEC here provides for a rather huge potential difference in adoption - instead of being forced to opt your entire zone into a potentially-bricking scheme where you have no control over the TTL and a whole new set of key material to worry about, this reuses key material you were already going to be adding, allows you to opt in at the single-hostname level, and avoids any of the legacy DNSSEC key insecurity fears, not to mention the debate around DNSSEC operational concerns.

More importantly, however, this seems like something that's orthogonal to the purpose of ECH, so I don't think we should make it part of this spec.

In the general "the goal of ECH is to encrypt the client hello" view, I'd say that's fair. But in the more practical sense that we have now added a DNS-provided key exchange, it seems entirely within the purpose. Not to mention that, given how simple it is to implement, it seems more than worth it? It gives us an opportunity not just to improve privacy on the web with SNI encryption, but also to deliver a genuine, substantial security improvement with an extra bit and a few conditionals - that seems like it should be more than worth it!

Fortunately, ECH supports extensions so it would be straightforward to define an extension for this.

Sadly I think it's pretty clear the adoption of such an extension would be a substantial uphill battle. Instead of this security improvement being there on day one, we now have to go through a process of convincing browsers to add an extension, defining it as a spec, getting that through the IETF process (which in practice some/most? TLS vendors would want to see), etc., etc. Sadly I don't have the substantial time commitment available for such a thing, and given this conversation it sounds like no one else is going to do it :)

dennisjackson commented 1 year ago

I'm concerned about clients in an enterprise or educational network which performs TLS MiTM. Right now, ECH will gracefully degrade for them as they'll get a secure fallback signal. This is critical to the adoption of ECH.

Didn't we have this whole debate with TLS 1.3 to begin with? I understood the conclusion there to be a resounding "we don't degrade the security of the web for enterprises that want to do MiTM"? Why is that different here?

TLS1.3 gracefully falls back to TLS1.2 (with correctly implemented devices). Similarly, ECH gracefully falls back to non-ECH. In both cases the fallback mechanism is authenticated the same way (to a valid cert for the domain).

With your change, these networks will need to prevent clients from receiving an EchConfig with the no-fallback bit set otherwise the website will break. Note that in this case there is no security improvement for your proposal because these devices have already had a root installed by the user or administrator.

Indeed, that seems totally fine? In an enterprise environment where TLS MiTM is required, whatever existing scheme for MiTM they use which provides plaintext/decryption keys will continue to work fine with or without ECH.

It won't work though, because their MiTM with a locally installed root on a server that doesn't support ECH looks just like a CA MiTM to the client. Your noFallback bit will cause these connections to fail.

Many of those issues do not apply to noFallback ECH - while it's conceptually similar to DANE (at least when still requiring a CA-signed trusted cert), the avoidance of DNSSEC here provides for a rather huge potential difference in adoption [...]

My main concern here is the breakage I described above, which I don't see any way to address.

TheBlueMatt commented 1 year ago

It won't work though, because their MiTM with a locally installed root on a server that doesn't support ECH looks just like a CA MiTM to the client. Your noFallback bit will cause these connections to fail. My main concern here is the breakage I described above, which I don't see any way to address.

"Clients MAY ignore the noFallback bit if the certificate authority which is ultimately trusted in the certificate path provided by the server was installed through administrator intervention." :)

(Okay I know it's not that simple and a pretty awkward carve-out, but on most platforms, or at least in most browser contexts, you can determine if a CA was system-default or not, and it does solve this case. If the goal is narrowly the MITM-CA case, this seems like a reasonable tradeoff.)

Alternatively, clients could simply ignore the bit if any modification to the system CA store has been made.
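Both carve-outs can be expressed as a small client-side policy check. This is a hypothetical sketch: `TrustAnchor`, its `system_default` flag, and `enforce_no_fallback` are invented for illustration, and real platforms expose root-store provenance (if at all) through their own APIs.

```python
from dataclasses import dataclass


@dataclass
class TrustAnchor:
    name: str
    system_default: bool   # shipped in the default OS/browser root store


def enforce_no_fallback(bit_set: bool, anchor: TrustAnchor,
                        store_modified: bool, ignore_if_store_modified: bool) -> bool:
    """Return True if a declined ECH handshake should become a hard failure."""
    if not bit_set:
        return False
    if ignore_if_store_modified:
        # Variant 2: any change to the system CA store disables enforcement.
        return not store_modified
    # Variant 1: skip enforcement when the chain ends at a non-default root.
    return anchor.system_default
```

Either variant exempts every root that was not shipped by default, including one a user was required to install, so the check cannot tell an enterprise proxy root from any other added root.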

TLS1.3 gracefully falls back to TLS1.2 (with correctly implemented devices). Similarly, ECH gracefully falls back to non-ECH. In both cases the fallback mechanism is authenticated the same way (to a valid cert for the domain).

Apologies for the confusion, my comment was in reference to the more general philosophy here of not allowing middlebox upgrade concerns to dominate security decisions and not the specific fallback argument.

Can you more clearly define the fallback design goal? While clients & servers may fall back from TLS1.3 to TLS1.2, they certainly aren't required to. Increasingly few sites support TLS prior to 1.2, and 1.3-required isn't all that wild to see on the open internet.

Monitoring systems which relied on null cipher options are increasingly breaking if they were intended to work with general TLS, as are any MITM appliances intended to work with all real-world TLS traffic that don't yet support TLS 1.3.

Backwards compatibility on the web is obviously absolutely critical, including through already deployed middleboxes. However, we also move forward. Real-world applications drop support for old protocols - there are several popular TLS libraries that only support TLS 1.3. If middleboxes expect to MITM TLS and work with general internet TLS traffic, over the course of protocol adoption lifetimes they have to adapt. Nothing any IETF WG says or does can change that.

Luckily, in this case, adaptation is trivial. If the client is using DoH, the same appliance MITMing all TLS is already MITMing DNS, and if not, DNS is likely the simplest internet protocol to MITM (if they aren't already). Alternatively, any client with software which installed a CA can also trivially disable enforcement of such a noFallback option - if browser or other TLS client vendors are concerned about adoption with this feature, allowing it to be disabled should address that.

More generally, your argument applies to any change in browser or other client behavior that enforces new rules or otherwise behaves differently at all. If such concerns were absolute, DANE would be entirely off the table, as would browsers moving automatically to DoH, or even ECH itself. After all, there are many monitoring middleboxes (including on my network!) which read SNI data and can react to it, not to mention that nation-state firewalls blocked ESNI when it originally shipped (they may not this time around thanks to GREASE, but there's no guarantee!). I hope we aren't in a world where we can't have nice things :).

As browsers did with the automated DoH rollout, checking if enforcement breaks connectivity to a sentinel host would allow for seamless fallback on networks on which it results in connection failure. It wouldn't be unreasonable to tell clients they SHOULD check for such breakage prior to enforcing a new noFallback option.
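A minimal sketch of the sentinel idea, modeled loosely on how browsers gated automatic DoH: probe once per network with enforcement on and cache the result. `SentinelGate` and the `probe` callable are hypothetical; a real probe would attempt an actual ECH handshake with a known sentinel host.

```python
from typing import Callable, Dict


class SentinelGate:
    """Enforce noFallback only on networks where an enforced test
    connection to a sentinel host succeeded."""

    def __init__(self, probe: Callable[[str], bool]):
        self._probe = probe                  # probe(network_id) -> handshake OK?
        self._cache: Dict[str, bool] = {}    # network_id -> probe result

    def enforce(self, network_id: str, no_fallback_bit: bool) -> bool:
        if not no_fallback_bit:
            return False
        if network_id not in self._cache:
            self._cache[network_id] = self._probe(network_id)
        return self._cache[network_id]
```

Because the probe result alone decides enforcement, anyone able to make the probe fail on a given network also switches enforcement off there.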

dennisjackson commented 1 year ago

"Clients MAY ignore the noFallback bit if the certificate authority which is ultimately trusted in the certificate path provided by the server was installed through administrator intervention." :)

(Okay I know it's not that simple and a pretty awkward carve-out, but on most platforms, or at least in most browser contexts, you can determine if a CA was system-default or not, and it does solve this case. If the goal is narrowly the MITM-CA case, this seems like a reasonable tradeoff.)

Unfortunately some of those installed CAs are also the ones we're most concerned by. E.g. the current situation with the Russian Domestic CA. Definitely something this fallback bit should prevent, but it appears just like a user installed CA and so with your proposal would not be protected against...

TLS1.3 gracefully falls back to TLS1.2 (with correctly implemented devices). Similarly, ECH gracefully falls back to non-ECH. In both cases the fallback mechanism is authenticated the same way (to a valid cert for the domain).

Apologies for the confusion, my comment was in reference to the more general philosophy here of not allowing middlebox upgrade concerns to dominate security decisions and not the specific fallback argument.

Can you more clearly define the fallback design goal? While clients & servers may fall back from TLS1.3 to TLS1.2, they certainly aren't required to. Increasingly few sites support TLS prior to 1.2, and 1.3-required isn't all that wild to see on the open internet.

Whoever owns a valid TLS certificate for the domain (validity as determined by the client) controls the fallback behaviour, whether that's to non-ECH or non-TLS1.3 or whatever. This is the property that TLS and its extensions ensure.

Luckily, in this case, adaptation is trivial. If the client is using DoH, the same appliance MITMing all TLS is already MITMing DNS, and if not, DNS is likely the simplest internet protocol to MITM (if they aren't already). Alternatively, any client with software which installed a CA can also trivially disable enforcement of such a noFallback option - if browser or other TLS client vendors are concerned about adoption with this feature, allowing it to be disabled should address that.

As I stated, the competence of the IT teams operating this middlebox infrastructure is usually extremely low. I'm doubtful they can reliably filter DNS, and based on the error reports users send us, getting them to either install CAs into Firefox's own root store or flip the Firefox pref to use the OS store is already difficult enough.

More generally, your argument applies to any change in browser or other client behavior that enforces new rules or otherwise behaves differently at all. If such concerns were absolute, DANE would be entirely off the table, as would browsers moving automatically to DoH, or even ECH itself. After all, there are many monitoring middleboxes (including on my network!) which read SNI data and can react to it, not to mention that nation-state firewalls blocked ESNI when it originally shipped (they may not this time around thanks to GREASE, but there's no guarantee!). I hope we aren't in a world where we can't have nice things :).

Moving to DoH and ECH doesn't break the invariant I mentioned above. Moving to DANE and/or this proposal does. It also delivers very marginal gains, considering the rarity of CA MiTM in practice.

As browsers did with the automated DoH rollout, checking if enforcement breaks connectivity to a sentinel host would allow for seamless fallback on networks on which it results in connection failure. It wouldn't be unreasonable to tell clients they SHOULD check for such breakage prior to enforcing a new noFallback option.

An attacker capable of CA MiTM would break connectivity to the sentinel host as a prelude to starting their attack...

TheBlueMatt commented 1 year ago

Unfortunately some of those installed CAs are also the ones we're most concerned by. E.g. the current situation with the Russian Domestic CA. Definitely something this fallback bit should prevent, but it appears just like a user installed CA and so with your proposal would not be protected against...

An attacker capable of CA MiTM would break connectivity to the sentinel host as a prelude to starting their attack...

We can't have it both ways - as I pointed out in my earlier response any improvement in the security model of TLS will result in breakage in any case where someone installs a malicious/MiTMing root CA. If you hold the view that we must not break such a thing, then you can avoid breaking them. If you hold the view that we should break such things, then you can break them. I don't think that has anything to do with this proposal, but is rather a fundamental tradeoff in any security improvement to TLS.

I sincerely hope we're not in a world where we can't have nice things :)

Can you more clearly define the fallback design goal? While clients & servers may fall back from TLS1.3 to TLS1.2, they certainly aren't required to. Increasingly few sites support TLS prior to 1.2, and 1.3-required isn't all that wild to see on the open internet.

Whoever owns a valid TLS certificate for the domain (validity as determined by the client) controls the fallback behaviour, whether that's to non-ECH or non-TLS1.3 or whatever. This is the property that TLS and its extensions ensure.

Heh, I meant describing the why of the goal in specifics: what can, should, or must not be impacted by any change.

As I stated, the competence of the IT teams operating this middlebox infrastructure is usually extremely low. I'm doubtful they can reliably filter DNS, and based on the error reports users send us, getting them to either install CAs into Firefox's own root store or flip the Firefox pref to use the OS store is already difficult enough.

Then they've already been broken by TLS 1.2, and soon 1.3 (there's discussion on deprecating TLS 1.2 now!) because they didn't support those when they installed the machine, and will be equally broken by any change to TLS which is eventually required, or any improvement to the TLS security model. I'd really like to better understand the concrete goal here, because "don't break deployed things" isn't sufficiently detailed.

chris-wood commented 9 months ago

Thanks for the discussion here, folks! Given that this is being proposed as an option, I think the best course of action is to write up this capability as an ECH extension in a separate draft. We can then discuss the semantics of the option separately. I'm going to close this issue with that as the recommended next step.