httpwg / http-extensions

HTTP Extensions in progress
https://httpwg.org/http-extensions/
436 stars 145 forks source link

TLS [Untrusted, Expired] Peer Certificate #1227

Closed mnot-fastly closed 4 years ago

mnot-fastly commented 4 years ago

Looking at what to map x509 errors (as returned by openssl into, the most obvious one is tls_untrusted_peer_certificate.

However, there's also tls_expired_peer_certificate, which is one of these errors. It seems odd to have these two, but not any of the many other verification errors.

At first glance, I suspect it may be better to have a single tls_peer_certificate_verification error, and let the detail convey the reason - whether it's expiration or something else.

PiotrSikora commented 4 years ago

Ultimately, there is a fine balance between providing a generic "catch-all" error, and enumerating all possible errors.

The linked OpenSSL man page lists 69 possible reasons for success or failure, and including all of them clearly doesn't make sense, especially when only a few of them will happen in practice.

However, having only a generic "catch-all" error with the real reason "hidden" in a free-form text in details, that's going to vary from implementation to implementation, is going to make this less useful for any kind of automated processing and/or analysis.

Ideally, I think, we should have specific enums that cover 95%+ of reasons happening in production, and only use the generic "catch-all" error for things not covered by them... So, playing the devil's advocate, perhaps we should add a few more?

(Note that there is already tls_unexpected_peer_certificate raised on mismatched identity, SHA256, SPKI or other missed expectations.)

Also, if you can get your hands on errors happening in production, that could help us make more informed decision here.

mnot-fastly commented 4 years ago

I agree we shouldn't do a 1:1 mapping (especially with a library), but the hierarchy needs to be clear, and mapped to terminology used in the standards. I.e., there should be a generic catch-all 'cert error' code, and then more specific ones for the major classes of problems.

davidben commented 4 years ago

TLS itself also contains a pile of certificate-verification-related alerts. Although I think the set is mostly just carried over from SSL. https://tools.ietf.org/html/rfc8446#section-6.2

   bad_certificate:  A certificate was corrupt, contained signatures
      that did not verify correctly, etc.

   certificate_revoked:  A certificate was revoked by its signer.

   certificate_expired:  A certificate has expired or is not currently
      valid.

   certificate_unknown:  Some other (unspecified) issue arose in
      processing the certificate, rendering it unacceptable.

   unknown_ca:  A valid certificate chain or partial chain was received,
      but the certificate was not accepted because the CA certificate
      could not be located or could not be matched with a known trust
      anchor.

   access_denied:  A valid certificate or PSK was received, but when
      access control was applied, the sender decided not to proceed with
      negotiation.

   certificate_required:  Sent by servers when a client certificate is
      desired but none was provided by the client.

@sleevi may also have some thoughts on certificate verification error-reporting.

sleevi commented 4 years ago

Despite what libraries may provide, for their developers, RFC 5280 is fairly clear: the only defined result is a boolean value

I think coupling any protocol-level expectation about diagnostics for errors is fundamentally misaligned. It should not matter, for example, whether a particular library checks the validity period of the certificate before the signature. We know, empirically, that different libraries do very different things there: macOS vs Mozilla NSS vs Microsoft CryptoAPI each verify something like validity at different points within their verification algorithms.

We know it's deeply flawed to view there being "the" certificate chain for a server, because by definition, the chain is defined by the client's policy and set of trust anchors. As a consequence, we know that clients SHOULD implement certificate path building, to explore possible certificate chains, and we know that TLS 1.3 was explicitly updated to reflect those decades of running code and the emergent rough consensus that the Certificate message, beyond the first Certificate, is an attempt to aid/influence that.

RFC 4158 is a huge collection of different ways verifiers can process and examine errors, during forward path building and reverse verification, and notes that the trade-offs are heavily dependent upon client and use cases.

While I can understand the appeal for providing 'informational' statuses, any suggestion beyond a level of MAY is probably too normative, and unsupported by the underlying specifications and best practices.

mnot commented 4 years ago

@sleevi the context here is a CDN or other reverse proxy reporting back errors in HTTP response headers, for debugging. There are no requirements around sending them (and senders are cautioned against leaking information that could aid an attacker).

So, at one end a Boolean doesn't really help, and at the other end, the full list of errors by any one library (or all of them) is impractical.

The list of alerts from @davidben seems promising, and I note that we already have tls_error, whose value can be any TLS Alert string. However, That's intended to convey alerts received (and perhaps we should communicate that more clearly).

The desired semantics here are that the problem indicated by the alert is generated in the client itself, not received by it.

That leads me to believe that there are two viable paths forward:

  1. Rearrange the tls_* errors so that they mirror the alerts above (leaving tls_error's semantic as stated above); e.g., we'd have tls_bad_certificate, tls_certificate_revoked, tls_certificate_unknown and so forth

  2. Reduce everything into two errors, tls_error_received and tls_error_generated or similar. Both would carry the TLS Alert value as an additional payload item.

I think the difference is mostly stylistic, so @PiotrSikora and I can figure that out. The bigger question is whether this is the right set of errors to focus upon.

sleevi commented 4 years ago

So, at one end a Boolean doesn't really help, and at the other end, the full list of errors by any one library (or all of them) is impractical.

But I think this is missing an important detail: from the perspective of a client, the only stable output is the boolean. For example, OpenSSL's reporting of errors is symptomatic of deeper flaws within its certificate verification; fixing those errors necessarily means some existing errors can't effectively be reported. This is covered at some length in RFC 4158, Section 3.2.

Ultimately, the set of errors that are relevant are, to an extent, subject to local policy: it's rare for certificate validators to think of errors in the ontology laid out by the TLS layer. Typically, a client just does a "best guess" (or just maps straight to bad_certificate) when trying to map to the TLS layer, in part so it doesn't disclose extra information. This is certainly true when you think about local policy, such as limits on validity or the requirement of associated metadata, potentially delivered over TLS, such as Certificate Transparency or OCSP Must-Staple.

I think the core question is whether you expect the client and the proxy to be in the same management domain. If they aren't, and the ED seems to be oriented around this scenario (where server and proxy are likely same administrative domain, but proxy and client aren't), then it seems better served as an opaque diagnostic string. As RFC 4158 calls out, beyond the 'boolean' is really just an administrative/diagnostic capability, not something meant to be semantically rich, meaningful, or stable, and certainly not to allow next-hop policy decision making.

mnot commented 4 years ago

Just to be clear - the user agent isn't involved in this at all, beyond receiving the header so as to make it available to someone diagnosing a problem (Likely someone from the origin, from the CDN/reverse proxy operator, and/or a help desk talking to an end user).

What I think I hear you saying is that we should define a generic error and allow implementations to convey textual detail in it. Does that make sense?

I also still see some value in distinguishing a TLS alert received from the origin by the proxy (in its role as TLS client) from this case (a problem it encounters as a TLS client).

sleevi commented 4 years ago

What I think I hear you saying is that we should define a generic error and allow implementations to convey textual detail in it. Does that make sense?

Yes. This matches with Criterion 2 of https://tools.ietf.org/html/rfc4158#section-2.2 and the dual mode trade-off in https://tools.ietf.org/html/rfc4158#section-3.2 , which perhaps more clearly highlights the local policy aspect ("Ultimately, the developer determines how to handle the trade-off between efficiency and provision of information").

I also still see some value in distinguishing a TLS alert received from the origin by the proxy (in its role as TLS client) from this case (a problem it encounters as a TLS client).

100% agreed here. I was only commenting to the extent at which the proxy validates the peer certificate independent of the TLS layer framing. Perhaps poorly stated, but my comment in https://github.com/httpwg/http-extensions/issues/1227#issuecomment-664108708 was trying to capture that Option 2 from https://github.com/httpwg/http-extensions/issues/1227#issuecomment-664105951 (tls_error_generated) would likely just reduce to a boolean anyways, in a well-behaved TLS client.