tlswg / draft-ietf-tls-esni

TLS Encrypted Client Hello
https://tlswg.github.io/draft-ietf-tls-esni/#go.draft-ietf-tls-esni.html
Other
231 stars 57 forks source link

Prohibit IP addresses in ECHConfig.public_name. #436

Closed chris-wood closed 3 years ago

chris-wood commented 3 years ago

This is yet another option to deal with public_name encoding, reference identity, and ClientHelloOuter construction issues.

Closes #396, #405, #424

cc @martinthomson, @davidben, @cjpatton, @sftcd

sftcd commented 3 years ago

Hiya,

On 26/05/2021 20:48, Christopher Wood wrote:

Sadly, no :( The format admits IP addresses, but the following sentence says they should be ignored in those cases. I forget why we wanted to even consider all this TBH:-)

I'm not sure if I'd want to write code to enforce those rules on the client or not TBH.

One nit: FQDNs can (and implicitly do) end in a dot. I bet the current text would annoy some DNS personages and might even break something e.g. if the string were copied from (some library that reads in) a zone file where that trailing dot is significant when present, but often absent. (I dunno if a library that'd make that bug likely exists but I can easily imagine the code that'd cause a dot at the end of a public_name.)

Cheers, S.

chris-wood commented 3 years ago

I forget why we wanted to even consider all this TBH:-)

Do you mean consider dropping IP address support? The rationale was that encoding rules for addresses here is overly complicated, and most folks should be able to get a certificate with a name.

I'm not sure if I'd want to write code to enforce those rules on the client or not TBH.

By rules, do you mean "ignore if IP address"?

Regarding the FQDN trailing dot, yeah, I figure we'd just assume that's implicit. We could also change the text to say the trailing dot is allowed. I don't feel strongly about that. I'm just trying to find a path forward.

sftcd commented 3 years ago

On 26/05/2021 21:28, Christopher Wood wrote:

I forget why we wanted to even consider all this TBH:-)

Do you mean consider dropping IP address support?

I just meant I literally forget why we need any IP address support at all within ECHConfigs.

The rationale was that encoding rules for addresses here is overly complicated, and most folks should be able to get a certificate with a name.

Yep.

I'm not sure if I'd want to write code to enforce those rules on the client or not TBH.

By rules, do you mean "ignore if IP address"?

So it might be ok to delve into the public_name and do such checks whilst encoding or decoding an ECHConfig but I'd not wanna include that code into the OpenSSL extension handling.

Regarding the FQDN trailing dot, yeah, I figure we'd just assume that's implicit. We could also change the text to say the trailing dot is allowed. I don't feel strongly about that. I'm just trying to find a path forward.

Sure, it was a nit, but one that might be worth handling. (I didn't check the whatwg thing though, so maybe that handles it too.)

Cheers, S.

chris-wood commented 3 years ago

So it might be ok to delve into the public_name and do such checks whilst encoding or decoding an ECHConfig but I'd not wanna include that code into the OpenSSL extension handling.

Would this be part of the extension handling? I imagine it would be underneath some API where you pass ECHConfig from elsewhere (DNS) to the TLS stack.

In any case, isn't some sort of processing required for ECHConfig? If it's not a DNS name, what would you do? I would expect OpenSSL, like any major stack, to implement the ECH rejection path using public_name, absent an application-specific knob to disable that. Am I misunderstanding you?

sftcd commented 3 years ago

On 27/05/2021 02:23, Christopher Wood wrote:

So it might be ok to delve into the public_name and do such checks whilst encoding or decoding an ECHConfig but I'd not wanna include that code into the OpenSSL extension handling.

Would this be part of the extension handling? I imagine it would be underneath some API where you pass ECHConfig from elsewhere (DNS) to the TLS stack.

Right, that's what I was trying to say:-)

In any case, isn't some sort of processing required for ECHConfig? If it's not a DNS name, what would you do? I would expect OpenSSL, like any major stack, to implement the ECH rejection path using public_name, absent an application-specific knob to disable that. Am I misunderstanding you?

I'd need to look at the code but from memory the library defaults to just providing a function to let the application say to enforce the check (e.g. to dive into the SANs and see if a string-form name matches some SAN). I've not stumbled over any detailed/strict DNS name syntax checks that I recall e.g. checking for <=63 octet label names or whatever else is needed, though there may be such code already in order to support DANE. Would have to look and see TBH - but adding DNS name syntax checks that are only for ECH public_name wouldn't be so attractive if it didn't do the same as was already done elsewhere in the library or by a bunch of applications using the library (that latter being v. hard to know of course).

Cheers, S.

davidben commented 3 years ago

(For the record, 0x1.0x2.0x3.0x4 is a domain name always; the x makes it so. IP addresses are always digits. The real trick is in realizing that 127.1 and 2130706433 and 017700000001 are all localhost.)

Is it? The WHATWG parser seems to happily accept hex digits. Empirically, Chrome, Firefox, and Safari interpret http://0x7f000001:8080/ as http://127.0.0.1:8080. I'm not seeing anything in WHATWG that says IPv4 literals are always digits. RFC3986 does, but RFC3986 is much more restrictive than what the web uses. https://url.spec.whatwg.org/#ipv4-number-parser

martinthomson commented 3 years ago

Yeah, I misread that. I just went ahead and implemented this in rust, just to see what the algorithm might need to look like. It's a gigantic mess. Another lesson in how software engineering practices of the 1980s should be left there.

But you can all see how it worked out: https://github.com/martinthomson/url-ipv4

davidben commented 3 years ago

Yeah, that gigantic mess, combined with ambiguity as to IP syntaxes (we have at least two so far), is why I was inclined to go the Somebody Else's Problem route. :-) But I can also live with a well-defined IP literal syntax to reject. Though, in order to be useful for browsers, that syntax must be the messy WHATWG one or a superset.

It's a lot more complexity for ECH (lots of people now need to reimplement the WHATWG parser), but I agree it does avoid reasoning about DNSName -> DNSOrIP type transitions downstream... provided your DNSOrIP type is compatible with the WHATWG parser. DNSOrIP<WHATWG> and DNSOrIP<RFC3986> are fine. If your random protocol's DNSOrIP<BrandNewVariant> is even more permissive than WHATWG's IP parser, you're out of luck and need the SEP provisions.

I don't think the SEP provisions are quite that dire. Anyone transiting a DNSOrIP through their X.509 code has to be aware of the syntax somewhere because X.509 uses different name types for DNS vs. IP. Whatever you use to resolve that is the right parser to use when going from ECH's DNSName to your DNSOrIP type. And, hopefully, at that transition, the IP parser is readily available so we don't have duplicate code. Another thing to keep in mind is the public name doesn't need to interact with the entire system, just the TLS to certificate verification path. ClientHelloOuter handshakes are discarded.

That said, realistically, everyone just passes things around as strings, and you're right that this is something people need to remember to do. I dunno, it's a mess. I guess we pick whichever evil folks generally think is lesser.

martinthomson commented 3 years ago

Yeah, the thing that concerns me is that one common system architecture, one that we use at least, has logic that makes three independent decisions. That is, the DNSOrIP call is made for name resolution, TLS ECH, and certificate validation. That's nasty duplication, so it's gross, but they could - at least in theory - share an implementation. As you say, if one of those is more permissible than another in the wrong direction, we get bad outcomes. To be clear, a lot else has to fail before this becomes a genuinely exploitable problem, but I don't want us to be making security holes.

I put up a patch for NSS that does this validation in case anyone wants to cheat. I probably should have been doing other work, but it's done now. The LDH validation was easier, but still non-trivial.

martinthomson commented 3 years ago

I just did some investigation and it looks like Firefox has some interesting quirks that diverge slightly from this text:

  1. Certificate validation accepts underscores ('_') in names. This includes the character at the start of the name, which is worrying enough for me to open a bug, but probably fine. The question here is whether or not we need to include underscores also.

  2. The code detects the type of reference identity in a strict order: DNS, IPv6, IPv4. That means that the DNS name recognition code is responsible for rejecting IPv4 addresses. It does that by assuming that the input has been somehow normalized and rejecting any name for which the last label is only digits. This means that it won't recognize 1.1.1.0x1 as invalid. As we have normalization at other points in the system this is not exploitable, but I will be opening a bug on that.

  3. It also rejects 1.1.1.256 as a name. As this is not a valid IP address according to the parser we are referencing, this PR would treat that as a domain name. I think that is safe in the sense that @davidben notes. If the certificate validation is more strict than the spec, the only risk is to interoperability. (And in this case, ICANN would need to open up a .256 gTLD, which is against their rules in addition to not working.)

  4. (Edited to add) It also allows a dot at the end of a name, but I think we're on firm footing here if we reject that (as the PR does).

@chris-wood convinced me that we might not need to be perfect in all places DNSOrIP is implemented, as the certificate validation is the only security-critical one. That's true when it comes to the main things we care about (server authentication). It's also true provided that the secondary security properties (whether a name is exposed via SNI) are not interlinked. I don't think we have any inter-dependency like that here. @chris-wood did manage to identify a hypothetical scenario where disagreement does lead to failures, but it's not ECH-related.

chris-wood commented 3 years ago

Given that the primary risk in disagreement between DNS or IP here seems to be interopability, and not security, perhaps we can downgrade the "MUST ignore" to a "SHOULD ignore" and clarify the risk? Failing early and consistently is certainly better, but the (possibly new) parsing requirement this imposes on the stack may not be worth the complexity. @davidben, @martinthomson: what do you think?

richsalz commented 3 years ago

I know that the parsing in OpenSSL is naive, doesn't easily handle fallbacks/alternatives. Further, it just calls the local platform code, which means there could be inconsistencies.

davidben commented 3 years ago

Re SHOULD, if we do that, I think we should have some (probably less verbose) version of https://github.com/tlswg/draft-ietf-tls-esni/pull/436#discussion_r640744959 as guidance.

Whatever we do, we should say "ECH believes this is a DNS name, so please verify accordingly" because we need to define semantics for things we expose out of TLS. The IP address thing is how to square "DNS name" with all systems built around "DNS or IP" serializations. We can either:

I'm not sure there's a way out of picking one. Anything short of a MUST that covers the "DNS or IP" syntaxes we care about puts us in the second bucket.

chris-wood commented 3 years ago

Re SHOULD, if we do that, I think we should have some (probably less verbose) version of #436 (comment) as guidance.

Indeed — this is what I mean by clarifying the risk. I’d also keep the pointer to the WHATWG parser in case someone wants to enforce that and otherwise doesn’t have a parser to rely upon.

Whatever we do, we should say "ECH believes this is a DNS name, so please verify accordingly" because we need to define semantics for things we expose out of TLS. The IP address thing is how to square "DNS name" with all systems built around "DNS or IP" serializations. We can either:

  • Say "ECH promises not to give you something that looks like an IP" and define how to do that. But "DNS or IP" systems do not use consistent or simple syntaxes, so this is tricky. (But probably doable since the WHATWG syntax is likely a superset of every other one.)
  • Or we can say "you might get something that looks like an IP, deal with it" with rough guidance on how to deal with it. But this is punting up a layer which may or may not forget and misbehave. (How much do we care?)

The sheer number of inconsistencies makes me lean towards the latter, with an example of how to deal with the inconsistency. (That is, guidance for how to deal with it in ECH, and guidance for how to deal with it outside of ECH.)

I'm not sure there's a way out of picking one. Anything short of a MUST that covers the "DNS or IP" syntaxes we care about puts us in the second bucket.

Yeah, I’m not suggesting we do nothing. I’m just proposing we relax the MUST for stacks that might otherwise fall back on a check elsewhere. This seems to balance things nicely.

davidben commented 3 years ago

Ah, sure. Works for me, though I was already okay with the Somebody Else's Problem version. :-) (Would we even need a SHOULD at that point, or should we just leave IP parsing out of ECH's hands entirely? We'd be always punting it, but consistently so.)

@martinthomson, WDYT? I wasn't sure whether to read your most recent comment as changing your mind here or still preferring ECH apply an IP parser.

chris-wood commented 3 years ago

Would we even need a SHOULD at that point, or should we just leave IP parsing out of ECH's hands entirely? We'd be always punting it, but consistently so.

I learn towards keeping it for now and removing it later if, after we all implement the thing, consider it unnecessary. There seem to be enough strange edge cases with how applications might use the DNSOrIp value in verifying client-facing servers, and removing sharp edges seems useful. That said, if all of the major stacks hate this, then certainly that says a lot too. :-)

cbartle891 commented 3 years ago

I agree that we should just change MUST to SHOULD and add some verbiage about the risks. It strikes me that we're trying to close the barn doors after the horses are out. The fact is that most systems already have some sort of DNS and IP parser, some of which are probably not consistent with each other, and to now say "you have to be consistent and here are the rules you must abide by" just isn't realistic.

We had a case recently where Safari wasn't allowing SNIs with an underscore--because we were following the spec--but when someone complained (https://twitter.com/ifette/status/1375559666340618242?s=20), we discovered that Chrome and Firefox were allowing underscores and changed our policy to reflect this, even though it went against the spec. If we can't all follow the official DNS parsing rules now, without ECH, what makes you think we would all follow the official DNS and IP parsing rules later?

If we just say "use the parsers you already have" (and point to the WHATWG parser for those who don't already have one as @chris-wood suggested), at least we'll be gaining consistency within a system.

martinthomson commented 3 years ago

Given that I've already written the code, there is a bit of sunk-cost that influences my thinking now...

That said, I am thinking that the worst that can happen here if someone gets this wrong. We get either:

For the former, if this happens, then we might not use ECH when we could. That's not great, but we might get bug reports and can fix the parser (or people can use better names).

For the latter, we will put an IP in SNI and maybe the IP will make it as far as certificate validation. If the server has a valid IP certificate for that address, then maybe it just works. Maybe the server chokes on the SNI. Maybe the client code chokes on the bizarre IP address and fails (this is what moz::pkix would do, it is picky about the format of addresses as it assumes that the caller normalizes them). If the server doesn't have a valid certificate, then the connection fails.

In this latter case, it seems like failure to validate the ECH config correctly more often leads to connection failures. That suggests that the SHOULD is worthwhile, because we don't want some clients connecting and others failing to connect.

Carrick is right to point to our collective differences, but I see that as unhealthy rather than inevitable. We should fix the underscore thing by updating specifications to match reality (Anne van Kesteren is right about this...again), but I know the IETF well enough to be wary of proposing that, even if it would be the right thing to do. I do think that the suggestion to reuse existing parsers is sensible.

So I'm down for making this change with a "SHOULD" in place of the "MUST".

chris-wood commented 3 years ago

OK, I downgraded to a SHOULD and made an attempt at clarifying risk without being overly verbose. @davidben, @martinthomson, please have a look! @sftcd, would this current framing work for your implementation? (It seems like it would, especially if the amount of parsing is to be minimized.)

richsalz commented 3 years ago

I worry we'll end up with a random mix of some TLS libraries doing a lot of work to reimplement the WHATWG algorithm (hopefully correctly!) and other libraries ignoring the text, depending on what they read between the lines of the SHOULD.

I share this concern. See my comment above about OpenSSL.

chris-wood commented 3 years ago

we'll end up with a random mix of some TLS libraries doing a lot of work to reimplement the WHATWG algorithm (hopefully correctly!) and other libraries ignoring the text, depending on what they read between the lines of the SHOULD.

@davidben, @richsalz: it's true, but this seems inevitable. (We could omit the reference entirely, and I could imagine stacks trying to implement a parser and verifier to abide by the SHOULD.) What's the downside of different libraries choosing to handle this SHOULD differently?

chris-wood commented 3 years ago

Paging @sftcd, @bemasc, and @cbartle891 for reviews!

davidben commented 3 years ago

we'll end up with a random mix of some TLS libraries doing a lot of work to reimplement the WHATWG algorithm (hopefully correctly!) and other libraries ignoring the text, depending on what they read between the lines of the SHOULD.

@davidben, @richsalz: it's true, but this seems inevitable. (We could omit the reference entirely, and I could imagine stacks trying to implement a parser and verifier to abide by the SHOULD.) What's the downside of different libraries choosing to handle this SHOULD differently?

I don't think this is inevitable. While https://github.com/tlswg/draft-ietf-tls-esni/pull/436#discussion_r640744959 (i.e. no requirement or recommendation here at all) punts the problem to the application, it punts the problem consistently. The application can then use the parser they already have. Especially since we seem to agree it won't cause security problems, it seems we should just stick with TLS not having an opinion, with non-normative text to point out this means DNS name.

The downside of libraries choosing to handle the SHOULD differently is that the ecosystem behaves differently. We'd have to consider both kinds of stacks when reasoning about the protocol. It's also less clear for any given implementor what the right behavior would be. We seem to essentially be saying "you can reimplement the WHATWG protocol, which is just code, or you do less work and skip it, so do you want more code or less code?".

(I'll go put together some text for the "consistently punt it" option, so there's something concrete here.)

chris-wood commented 3 years ago

(I'll go put together some text for the "consistently punt it" option, so there's something concrete here.)

I understand that the text can consistently punt the problem elsewhere, unlike what this PR does, but that doesn't mean implementations will consistently punt it.

In any case, if you could target your change against this PR to minimize deltas, that would be great.

richsalz commented 3 years ago

it seems we should just stick with TLS not having an opinion, with non-normative text to point out this means DNS name.

Works for me.

davidben commented 3 years ago

(I'll go put together some text for the "consistently punt it" option, so there's something concrete here.)

https://github.com/tlswg/draft-ietf-tls-esni/pull/447

In any case, if you could target your change against this PR to minimize deltas, that would be great.

Done. I've also left comments on this PR for the incidental stuff I noticed while putting that together.

sftcd commented 3 years ago

On 07/06/2021 14:49, Christopher Wood wrote:

Paging @sftcd, @bemasc, and @cbartle891 for reviews!

Assuming I figured out the actual latest text, that seems fine to me, except that DNS names can end with an explicit dot, (which is always implicitly there) as I think I pointed out before. I don't know if TLS stacks handle that well or badly but we should probably either allow the trailing dot, or, if disallowing it is really better, just say that we know we're not quite perfectly correct, in DNS terms.

Cheers, S.

chris-wood commented 3 years ago

Closing in favor of #456.