Closed letitz closed 2 years ago
cc @estark37 @sleevi
There is definitely something worth doing here.
I'll start by saying that the claim here about security is not as simple as all that. The only thing you might reasonably say here is that the security properties you get from identifying a specific certificate are different.
In some ways, this is "stronger" than regular HTTPS (or WebPKI authentication, which is what I think you mean), due to being able to precisely identify the peer without the various confounding factors that might apply to WebPKI-based authentication (you have less exposure to CAs, for example). That is, unless you believe that there are security benefits from engaging with WebPKI. Logging in CT might be something you care about, so you might regard that as a difference in favour of WebPKI.
It might be weaker in some ways, such as not having any true revocation scheme. That particular need can be addressed more effectively by revoking the code or resources that include the hashes. But "can" isn't "will", so it's worth looking at that some more. You have to consider where the hashes come from, which likely means looking at things like HTTP caching of various resources. HTTP caching has its quirks, but generally it can be managed on shorter timescales than something like OCSP stapling, so it could be a gain here; at worst, it could be made more sticky than OCSP might allow, leading to a real deficiency if there is a key compromise.
It's also worth looking at this from a holistic view. When a site initiates communication, they choose the entity with which they communicate. A site that wanted to replicate the fingerprints functionality could do something very similar by minting a new name, getting a certificate (or certificates) for that name, and then connecting to that name. If that name has only that set of certificates (and will only ever have that set of certificates, perhaps because the name is similarly short-lived), then identifying the peer using the name is nearly functionally equivalent to identifying the peer by the set of certificates.
The "nearly the same" conclusion includes a few things, but most of those are what motivate the inclusion of the feature. That is, it costs time and resources to create a name with valid WebPKI authentication that will work and that is operationally challenging.
Text on all that wouldn't be amiss, particularly the liveness/revocation piece.
The idea that UAs might have policies regarding rejection is something worth documenting. We might assume that the same rules regarding algorithm choice and key size apply here as they would elsewhere, but assumptions are no good: that should be written down. I might go so far as to use more modern constraints on algorithms and key sizes. We can (and should) insist on modern crypto as we don't have the same compatibility constraints as HTTP. We can still be prosaic, but we have no need to allow for old stuff (the RFC 7540 rules spring to mind as a tried and tested baseline).
That is, unless you believe that there are security benefits from engaging with WebPKI.
The way this is worded suggests you may disagree? I think there are quite compelling arguments regarding benefits, both those directly security related (e.g. the inability to use a weak compromised key, disclosure via CT that supports post facto detection of such keys even if they’re not initially known to be weak) and those indirectly related (e.g. the efficiencies of scale for such weak-key checking, the ability to ensure certificates are well-formed at issuance time rather than verification time, thus reducing parser bugs and risks, and the ability to gradually evolve the crypto ecosystem with minimal disruption through changes of issuance policy).
I don’t think it’s reasonable to simply suggest that they are different, as if there were no qualitative and quantitative differences between them. There are differences in posture, both as written and in what is reasonably possible, and they show that WebPKI has significant advantages, despite its warts, as it relates to aligning with the end user’s security needs and goals. I think we agree it’s weaker in some areas, but I think the reply may be overlooking the significance, and impact, of those weaknesses, particularly when viewed holistically.
I think you’re entirely right to approach this from the lens of the server operator, and can appreciate that it benefits them to eliminate the third party (the CA) from the equation. But that seems to overlook the CA’s role in helping protect the user’s interests (as an extension of the user agent’s policies, where both work for the user’s benefit).
In that line, I don’t think this claim can be reasonably justified, except if ignoring the user’s legitimate security interests:
then identifying the peer using the name is nearly functionally equivalent to identifying the peer by the set of certificates
It certainly is possible to imagine a system in which CAs are still used, but rather than binding assertions to DNS names, the existence of such a certificate is instead a binding to a Subscriber and a compliance with such a browser policy. That is, in effect, the CA still fills those roles (of transparency, issuance-time checks, communication, evolution, etc), and only those certificates can be used. There are obviously complications with such a system, and it’s by no means easy, but the practical reality is that largely such a system isn’t being explored because WebRTC shipped similar functionality without necessarily the same attention to security implications, or admittedly the contemporaneous awareness, and so there’s a limit to the viability of “doing it in a more modern, robust way”.
I mention this because it’s clear there’s a complex set of tradeoffs. The priority of constituencies would suggest that there are a lot of ways for authors to abuse this to the detriment of users’ needs (e.g. shipping a fixed key to every home device). WebPKI actively prevents this and has mechanisms to respond on the user’s behalf; this doesn’t. But this also provides authors the opportunity to do the right thing, and enables new user use cases. Finding the right line between simple solutions that still allow, and may unfortunately facilitate, abuse, and complex solutions (e.g. another type of certificate, potentially one supporting cross-signing from multiple policy issuers) is tricky.
It sounds like we’re in agreement on the need for text, but it does seem to mischaracterize things to suggest there isn’t a tangible security negative difference to the user’s goals and needs, even if for the benign author, it may be functionally equivalent.
I didn't mean to imply "worse" or "better". I was simply suggesting that there is nuance to the question (poor choice of wording of that statement is my fault; I only meant to imply that some people will place different weighting on aspects of the question).
A really minor nit:
the ability to ensure certificates are well-formed at issuance time rather than verification time
Using a certificate hash effectively reduces the certificate to an expensive container for SPKI (it is effectively a raw public key, without the need to implement the certificate type extension in TLS). I don't think we need to worry much about this particular benefit. (I agree with the others.)
The key realization here is that WebPKI is providing some concrete benefits and some subjective ones. We should take care to ensure that as many of those benefits are retained by the alternative design as possible. (This was part of the reason why I originally wanted to defer this particular feature.)
I agree with @martinthomson here.
The Web PKI's primary function is to bind the name in the URL to a public key. No matter how trusted that is, or how good UA policies are, anything relating to the name is only as trusted as the input URL. In that regard, fingerprints are an improvement. Fundamentally, a subresource's URL comes from the document. Nothing in the Web PKI meaningfully constrains it. If the document specifies a fingerprint, we've achieved that binding directly.
The Web PKI does provide some secondary benefits along the way. It is worth identifying those and thinking about them, but I think it's fine if they take different forms with cert fingerprints.
It's true that parameters in Web PKI certificates are CA-mediated and CT-logged. This provides a different enforcement point, e.g. CAs not signing RSA-1024, and different levels of visibility. For everything else in the platform, such as TLS or JS features, sites pick parameters directly, and we measure and enforce at the client. We manage those just fine. I don't think the mediation is, in itself, important, just the enforcement. (The mediation isn't even uniformly a positive. Sites can upgrade to TLS 1.3 unilaterally, while upgrading parts of the cert requires CA coordination.)
In fact, in modern TLS modes (ECDHE ciphers and TLS 1.3), the key the Web PKI binds itself to only exists to authenticate TLS parameters and key shares. If those parameters are weak, this whole thing was for naught, yet we are content with client enforcement of them.
I might go so far as to use more modern constraints on algorithms and key sizes. We can (and should) insist on modern crypto as we don't have the same compatibility constraints as HTTP.
+1. We don't need the Web PKI to enforce modern crypto, and there is no need to limit ourselves to HTTP's constraints. E.g. probably just skip RSA entirely.
Using a certificate hash effectively reduces the certificate to an expensive container for SPKI (it is effectively a raw public key, without the need to implement the certificate type extension in TLS). I don't think we need to worry much about this particular benefit.
I mostly agree with this, with a footnote that this SPKI container is so unreasonably complex, and the ecosystem such a mess, that we risk interop issues if different clients have differently lax parsers. The same interop risk exists in the Web PKI, but it's papered over by mediation. This is not so much a reason for mediation as a problem with X.509. I think we can solve this either by using the TLS raw public keys extension, or by making sure X.509 usage here is sufficiently strict or profiled down.
+1. We don't need the Web PKI to enforce modern crypto, and there is no need to limit ourselves to HTTP's constraints. E.g. probably just skip RSA entirely.
I believe the profile we currently enforce in the Chromium version of WebTransport is "RSA 2048+, P-256, P-384 or Ed25519". I've considered writing something down, but the problem with writing mandatory algorithm requirements is the risk of them becoming obsolete (see TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA being mandatory in TLS 1.0).
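For illustration, client-side enforcement of a profile like that could look roughly as follows. This is a hedged sketch only: the `keyInfo` shape and field names are hypothetical, not the Chromium implementation or any browser API.

```javascript
// Sketch of client-side enforcement of a key profile along the lines of
// "RSA 2048+, P-256, P-384 or Ed25519". The `keyInfo` object here is a
// hypothetical parsed representation of the certificate's public key.
function isAcceptableKey(keyInfo) {
  switch (keyInfo.algorithm) {
    case "RSA":
      // Permit RSA only at modern modulus sizes.
      return keyInfo.modulusBits >= 2048;
    case "ECDSA":
      // Only the two NIST curves from the profile.
      return keyInfo.curve === "P-256" || keyInfo.curve === "P-384";
    case "Ed25519":
      return true;
    default:
      // Anything else (DSA, legacy curves, etc.) is rejected outright.
      return false;
  }
}
```

Baking such a table into a spec carries exactly the obsolescence risk described above; an implementation-defined profile with a mandatory modern floor is one way to split the difference.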
using the TLS raw public keys extension
As far as I'm aware, raw public keys don't have expiry, which means we can't enforce the expiry requirements we currently have (https://w3c.github.io/webtransport/#custom-certificate-requirements)
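To make the expiry point concrete, here is a sketch of the kind of check those requirements imply, assuming the rule that a certificate's total validity period must not exceed two weeks and must cover the current time. The function name and shape are illustrative, not taken from any implementation.

```javascript
// Sketch: validity check in the spirit of the custom certificate
// requirements linked above (total validity <= 2 weeks, current time
// inside the validity window). Illustrative only.
const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

function isValidityAcceptable(notBefore, notAfter, now = Date.now()) {
  const validFor = notAfter.getTime() - notBefore.getTime();
  return (
    validFor > 0 &&
    validFor <= TWO_WEEKS_MS &&
    now >= notBefore.getTime() &&
    now <= notAfter.getTime()
  );
}
```

A raw public key has no `notBefore`/`notAfter` fields at all, which is exactly why this check cannot be expressed for it.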
Do we need to support RSA certs?
+1. Dropping RSA seems a nice "please generate new keys here" nudge. Also RSA keygen is terrible so, in so far as we want to partially replicate CAs blocking weak keys (Ryan clarified that "weak keys" meant issues like ROCA or Debian's PRNG failure), ECDSA keygen is a bit more uniform so failures tend to be more generic PRNG failures than weird keygen algorithm problems.
As far as I'm aware, raw public keys don't have expiry, which means we can't enforce the expiry requirements we currently have
Ah, good point. And while not strictly necessary from a web security model perspective, it is a valuable nudge in the right direction. By that I mean...
I also realized after some discussion that my "primary" vs "secondary" distinction was probably more confusing and ill-defined than was helpful. I think what I actually meant to capture is precise security models vs. making sure it is possible and easy to do the right thing and difficult to do the wrong thing. By this I mean:
In the browser, every hard guarantee we can possibly get out of HTTPS (TLS, Web PKI, etc.) and promise to the user is relative to the URLs we're using. We can tell the user this page was served from https://example.com, but https://example.com gets to decide what that means. If https://example.com wants to post all TLS traffic secrets online, we can't do much about that. If https://example.com sources script from (and thus delegates all script authority to) https://popular-script-cdn.example, or proxies content, that's part of https://example.com's documents. So, in that vein, if https://example.com refers to "public key abcdef1234", that's within the web's security model. (And, in some ways, stronger than delegating to a DNS name.)
But that shouldn't be the end of the story. Mixed content is bad, but is perfectly within this security model. Delegating scripting authority to http://insecure.example is a terrible idea, but so is delegating to https://publishes-tls-traffic-secrets.example or https://proxies-cleartext-http.example. We block http://insecure.example embeds because it is an easy mistake to make accidentally, and pretty much always wrong. We want to make the right setup possible and easy, and the wrong setup difficult to do on accident. I'd argue some of the Web PKI and CT checks around weak keys, etc., fall in this category.
So, in that vein, expiry is about encouraging regular key rotation and revocation. I guess, in the current formulation, the aim is that sites will automatically rotate keys and update the fingerprints they serve before the old certificates expire. (Or do something ephemeral such that this doesn't matter.)
My understanding is that fingerprints are mostly useful in ephemeral cases. A site might spin up a temporary VM to handle a session or a small number of sessions. That VM generates a new key, and fingerprints for it are shared with the users assigned to it. Once the call/gaming session/whatever is over, the VM is reclaimed and the world moves on.
Stable services could use fingerprints in exactly the way you describe, but it might be easier for them to just get a name and run certbot.
Ah yeah, good point. Not as familiar with all the use cases, but I'd buy that non-ephemeral services probably want names and thus we should focus primarily on ephemeral ones.
Meeting:
More meeting notes:
An additional point here came up during the Blink review process: we should make sure that WebTransport connections to origin O that use serverCertificateHashes do not share state with origin O. Indeed, there is no guarantee that the targets of such connections are really O.
Say I write the following:
```javascript
const transport = new WebTransport("https://foo.example", {
  serverCertificateHashes: [...],
});
```
At first glance, this looks like the connection could share state with https://foo.example. However, the origin here is only used to resolve the target endpoint; it is not authenticated with the Web PKI. If an attacker controls the network and runs this code, it could connect to its own server and attempt to read or write state belonging to https://foo.example.
While there does not seem to currently be much of a surface for such problems to arise, that may change as things evolve. The specification should be careful to note this pitfall.
Note that things used to be a bit clearer when non-https schemes were used (IIRC, quic-transport: was used for a while?). In that case, there was less opportunity for confusion with non-WebTransport-related origin-scoped state. That said, there still was the problem of pooling connections between different values of serverCertificateHashes - hence, I imagine, the current ban on using allowPooling with serverCertificateHashes.
I think we avoid this in general by the fact that we don't tie any state to WebTransport connections. The only two notable exceptions are connection pools and TLS session tickets. Connection pools are explicitly disallowed by the spec, and we probably should ban session tickets too.
Ah, thanks for pointing that section out, it's exactly what I was looking for.
As for TLS tickets: I am not familiar with them off the top of my head, but is there any risk if they are "stolen" by an attacker server? In my example above, an attacker could steal the TLS ticket for foo.example. Could they use it in nefarious ways?
Conversely, an attacker could set the TLS ticket for foo.example then let the user connect to the real foo.example. Would there be any risk in that?
Overall, I tend to agree that the safest approach would be to specify and implement that WebTransport connections, if created with serverCertificateHashes, do not re-use TLS tickets.
Do not attempt to specify until you understand.
Session tickets are generally protected with a key that is bound to a server instance, or a cluster. In terms of capabilities, they are nearly as dangerous as the private key associated with the certificate. And they are treated as such. Usually it is easier to manage a session ticket encryption key because they have a narrower scope.
That said, prohibitions on resumption don't really get you any real security, just extra awkward test cases and more work. My experience with server deployments is that resumption is often enabled, as it has direct performance advantages. It might be that it doesn't work or has to be disabled for other reasons, so maybe it won't get used that much, but we shouldn't forbid it.
There are two directions from which to think about sessions. On the server side, you have to worry about the security requirements on your ticket key material. And, yes, those keys should be treated as comparable to the long-term private key.
On the client, you need to worry about which connections/requests observe or write to a session cache entry. This has security implications (connections with different authentication expectations should not resume across each other), and privacy implications (resumption allows connections to be correlated).
For the second, similar to the pooling question, this feature does need to discuss when two connections can share such state. I'd probably suggest it use the same levers as pooling itself, as it's a very similar set of considerations.
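To make that partitioning concrete, a client session cache along those lines might be keyed roughly as follows. This is a sketch under my own assumption of what "the same levers as pooling" would mean; the key shape is not taken from any implementation.

```javascript
// Sketch: a resumption/session-cache key that includes the expected
// certificate hashes, so connections with different authentication
// expectations never resume across each other.
function sessionCacheKey(origin, certificateHashes) {
  // Sort so the same set of hashes yields the same key regardless of
  // the order in which they were supplied.
  const hashes = [...certificateHashes].sort().join(",");
  return `${origin}|${hashes}`;
}
```

With this shape, a connection authenticated by the Web PKI (empty hash list) and one authenticated by serverCertificateHashes to the same origin land in different cache partitions, which also limits the cross-connection correlation noted above.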
Thanks, that makes sense. I was thinking about it from the client side indeed, with the objective of disallowing pooling.
As discussed offline, it would be good to address the following points in the spec:
serverCertificateFingerprints effectively downgrades the security properties of the resulting transport