A first proposal to fix the no-sni section.

taddhar commented 10 months ago

In this part of the text there would be a lot to say.

First for non-English native speakers who are not familiar with the topic and in particular in operations or even to some degree in system architecture or network security domains, the text will be very difficult to read.

In fact think about someone in a Small and Medium Enterprises organization that has to ensure the security of its organization it will be close to impossible to connect the dots. How many dozen of millions of SMEs on the planet?

Anyway, it is VERY hard to be more explicit in the context of this draft though I would be tempted to add examples and be more explicit on sub-contexts and on use cases e.g. put in hard BYOD will fail. but anyway this links to OUR Internet Draft on Deployment Considerations that suddenly makes a lot of sense as an expansion of this section from 1 page to 20+.

One could argue that there are many types of Middleboxes and they all will be affected by that (DPI, Firewalls, Proxies, etc.)

However there is ONE section which is simply 1) wrong and 2) completely irrelevant is the last section which I removed. The only alternative would be to make a significant fix in that section and even if I fix it, it will add nothing to the fact that when a number of devices won't be able to access the SNI, they will break, period.

So I am happy to rework a section but the section will say that whilst in general the SNI is unreliable, in practice there are essential sub-context where is is not only VERY reliable but as well is a very valuable information for in-path actors who have many ways to check and validate and handle the unreliable part.

ekr commented 10 months ago

So I am happy to rework a section but the section will say that whilst in general the SNI is unreliable, in practice there are essential sub-context where is is not only VERY reliable but as well is a very valuable information for in-path actors who have many ways to check and validate and handle the unreliable part.

You're going to have to explain more about why you think this is the case. Validating the certificate is insufficient. You need to ensure that the site has the corresponding private key, which you can't do passively with TLS 1.2 in non-PFS mode.

ekr commented 10 months ago

I appreciate that people use the SNI the way you indicate. But the point of this text is that this practice is not secure if the endpoint is compromised.

roelfdutoit commented 10 months ago

Network security middleboxes utilize various techniques to improve the reliability of the control in an enterprise environment, inter alia:

DNS hostname lookup of SNI and optional destination redirect or failure on mismatch.
TLS "probe" sessions to validate not only the certificate and SAN entries, but also the ability of the destination to complete the TLS handshake.

ekr commented 10 months ago

I don't understand what (1) means. Re (2) this is not reliable because the attacker can simply forward the TLS connection to the true endpoint when being probed and it's straightforward to arrange that the probe connection doesn't look like a connection from their own compromised clients.

roelfdutoit commented 10 months ago

I don't understand what (1) means

dstip_client = destination as observed on client side of middlebox dstip_sni = gethostbyname(SNI) Option 1: if (dstip_sni != dstip_client) then reset connection. Option 2: dstip = dstip_sni, i.e. redirect the connection to dstip_sni, where "redirect" is implementation specific.

roelfdutoit commented 10 months ago

(2) this is not reliable because the attacker can simply forward the TLS connection to the true endpoint

Sure, I agree, but do they really? Implementing (2) forces the attacker to implement this relay, and when combined with (1) gives the middlebox a mechanism to detect an anomaly where the true endpoint dstip seems to have "jumped".

ekr commented 10 months ago

I don't understand what (1) means

dstip_client = destination as observed on client side of middlebox dstip_sni = gethostbyname(SNI) Option 1: if (dstip_sni != dstip_client) then reset connection. Option 2: dstip = dstip_sni, i.e. redirect the connection to dstip_sni, where "redirect" is implementation specific.

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

roelfdutoit commented 10 months ago

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

The context of my comments is "enterprise environment". Calling (1) a non-starter ignores the level of control an enterprise has over DNS (including DoH requests) and gethostbyname in my pseudo-code is not referring to a simple lookup.

As for (2).. the attacker tries to bypass the middlebox by using www.example.org as SNI on a TLS session to c2.attacker.xxx. The middlebox will resolve www.example.org and connect upstream to www.example.org, not to c2.attacker.xxx.

sftcd commented 10 months ago

On 15/11/2023 19:48, Roelof DuToit wrote:

The context of my comments is "enterprise environment".

IMO any disagreement about the goals of this specification needs to be raised on the TLS WG list and (if it to have any effect) not as an issue on github. And in particular, one ought note the TLS WG and IETF more broadly have already reached consensus on RFC8744 so attempts to circumvent that consensus via github issues/PRs should and will eventually be rejected on at least that basis.

S.

ekr commented 10 months ago

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

The context of my comments is "enterprise environment". Calling (1) a non-starter ignores the level of control an enterprise has over DNS (including DoH requests) and gethostbyname in my pseudo-code is not referring to a simple lookup.

As for (2).. the attacker tries to bypass the middlebox by using www.example.org as SNI on a TLS session to c2.attacker.xxx. The middlebox will resolve www.example.org and connect upstream to www.example.org, not to c2.attacker.xxx.

The relevant case here is one where the endpoint is doing its own DNS resolution invisibly to the enterprise, because otherwise the enterprise can just strip the ECHConfigs.

roelfdutoit commented 10 months ago

The relevant case here is one where the endpoint is doing its own DNS resolution invisibly to the enterprise, because otherwise the enterprise can just strip the ECHConfigs.

Understood, but I thought we were discussing the reliabillity of SNI as control irrespective of ECH as stated in the text this PR is trying to modify: "use as a control even in the absence of ECH".

In an enterprise environment where the client device is compromised the network security device will use the techniques I have described above to improve the reliability of SNI as a control. I understand that the attacker can also improve its chances of not being detected, but it has to move up the Pyramid of Pain - just spoofing the SNI is an unreliable way to bypass advanced network security devices.

taddhar commented 10 months ago

There is no disagreement on the goal of this draft, the disagreement is that the clause we are debating is wrong but more importantly it is irrelevant to this draft, so we are not even trying to fix it, this will confuse anyone and adds nothing to the benefit of ECH.

sftcd commented 10 months ago

On 16/11/2023 07:14, Arnaud Taddei wrote:

There is no disagreement on the goal of this draft

Great! I was more reacting to the discussion on the PR than the PR text itself btw.

S.

ekr commented 7 months ago

OK, given that this is not in security considerations and (1) the text here makes a claim which is disputed and (2) the effect of removing the text is to make the impact of ECH seem bigger than it did in the previous text which seems to be erring on the safe side, I'm going to merge this PR unless someone objects by 2/24.

tlswg / draft-ietf-tls-esni

A first proposal to fix the no-sni section. #594