hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

New PKI multi-issuer support seems to have made wrong choice about CRL and AIA URLs #16060

Closed maxb closed 1 year ago

maxb commented 2 years ago

I've just been reading https://github.com/hashicorp/vault/blob/main/website/content/api-docs/secret/pki.mdx#notice-about-new-multi-issuer-functionality and have come across the following:

Certificate Revocation List (CRL) configuration is common to all issuers. All authority access URLs are common to all issuers.

I am concerned that this is a design error, as a successful forward-rolling CA deployment needs CRL and AIA URLs to be unique for each issuer.

maxb commented 2 years ago

Now confirmed by reading the code ... these settings need to be per-issuer to properly enable the forward-rolling workflow this was supposed to enable.

cipherboy commented 2 years ago

Hi @maxb, I think our intent here was to have the default issuer's paths (the existing paths at /ca, /crl &c) be used for AIA; when the CA rotation is ready, we'd switch the new CA to become default, taking over those paths. These wouldn't be canonical (well, unique) for each issuer (like say, /issuer/:issuer_uuid/der would be), but instead be common to the mount. We still encourage as much as possible to have separate, unrelated CAs be separate mount points (and that all issuers in a single mount point really function as a single Authority).

I definitely agree generally that per-issuer AIA selection is necessary for true multi-issuer support, though. We presently consider one mount to be one Authority (common list of serials issued, policies, &c) which hinders this slightly in other regards.

I'll bring it up with the team and see what they say.

maxb commented 2 years ago

Let me type up a more detailed use case to illustrate why the CRL and AIA should never be set to the default/implicit issuer paths, and should always reference a specific issuer:

Suppose we are in a renewal situation, transitioning from "ExampleCorp Intermediate CA G1" to "ExampleCorp Intermediate CA G2" (intermediate CAs with differing Common Name and different key material).

There is an application with an end-entity certificate issued from G1 deployed. At startup it uses the AIA URL in its own certificate to fetch the intermediate CA that issued its certificate, and serves it to clients in the TLS handshake. (Now, you might say this is a mad design, and just configure the proper chain statically. I would agree. But the mad design of the .NET certificate APIs and .NET Linux runtime steers people to this approach, so I know there are real applications doing this.) For this to work, the AIA URL in the certificate must point stably to the specific generation of intermediate used to issue the particular certificate. If the AIA URL points to one which will be dynamically changed to point to a different CA, it will risk causing production breakage of applications relying on it.

Now let's consider clients of the above application that are doing CRL checking. They will access the CRL URL embedded in the certificate. It is critical that the CRL URL point to the CRL for the correct CA. If it later dynamically changes to point to a CRL for a different CA, again, it risks breaking production applications which will then fail their revocation checking process.
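To make the two failure modes above concrete, here is a minimal Python sketch. It is not Vault code: the `Mount` class, its `resolve` method, and the issuer IDs `g1`/`g2` are hypothetical, and it only models a mount where `/ca` follows the default issuer while `/issuer/<id>/der` is pinned to one issuer.

```python
from dataclasses import dataclass, field


@dataclass
class Mount:
    """Toy model of a PKI mount: named issuers plus a movable default pointer."""
    issuers: dict = field(default_factory=dict)  # issuer id -> CA name
    default: str = ""

    def resolve(self, path: str) -> str:
        """Return the CA served at `path` (hypothetical routing, not Vault's)."""
        if path == "/ca":
            # Alias path: follows whichever issuer is currently the default.
            return self.issuers[self.default]
        prefix, suffix = "/issuer/", "/der"
        if path.startswith(prefix) and path.endswith(suffix):
            # Pinned path: always the same issuer, regardless of the default.
            return self.issuers[path[len(prefix):-len(suffix)]]
        raise KeyError(path)


mount = Mount(issuers={"g1": "ExampleCorp Intermediate CA G1"}, default="g1")
alias_before = mount.resolve("/ca")
pinned_before = mount.resolve("/issuer/g1/der")

# Rotation: G2 is added and becomes the default issuer.
mount.issuers["g2"] = "ExampleCorp Intermediate CA G2"
mount.default = "g2"
alias_after = mount.resolve("/ca")
pinned_after = mount.resolve("/issuer/g1/der")

print(alias_before, "->", alias_after)    # alias now serves a different CA
print(pinned_before, "->", pinned_after)  # pinned path is unchanged
```

A certificate issued by G1 whose AIA or CRL URL used the alias path would, after the rotation, be pointing at G2's data; the pinned path keeps referring to G1.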

OK then, you might say, this is suboptimal, but you could manage this by doing a coordinated update of the PKI secrets engine configuration, from issuer-specific URLs to G1 endpoints, to issuer-specific URLs to G2 endpoints, at the same time as you change the default issuer.

Well, kind of, but in this case you remove the ability to do realistic pre-production testing of the G2 setup.

Additionally, not all cases for multiple issuers are a simple cutover from old to new. The PKI secret engine documentation itself proposes a use-case for maintaining a lower-security Vault-backed intermediate and a higher-security HSM-backed intermediate that are concurrently used for issuance, with the choice selected by PKI secret engine role configuration. This case will not be able to function correctly (unless you can do without CRL and AIA entirely) unless the URLs can be configured per issuer.

cipherboy commented 2 years ago

OK then, you might say, this is suboptimal, but you could manage this by doing a coordinated update of the PKI secrets engine configuration, from issuer-specific URLs to G1 endpoints, to issuer-specific URLs to G2 endpoints, at the same time as you change the default issuer.

Yeah, I mean, except for root->intermediate chaining which could be longer (see the other thread), the intermediate->leaf chain requires that exp(leaf) <= exp(intermediate), so realistically you'd have to rotate your leaf along with your intermediate at a similar window. (Other than testing, as you pointed out).
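The exp(leaf) <= exp(intermediate) constraint can be sketched as a simple date comparison. This is only an illustration in Python; the function name and all dates are made up, and real validation would read notAfter from the parsed certificates.

```python
from datetime import datetime, timezone


def leaf_outlives_issuer(leaf_not_after: datetime, issuer_not_after: datetime) -> bool:
    """True if the leaf would still claim validity after its issuer has expired."""
    return leaf_not_after > issuer_not_after


g1_expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)   # hypothetical G1 notAfter
ok_leaf = datetime(2024, 12, 1, tzinfo=timezone.utc)    # expires before G1
bad_leaf = datetime(2025, 3, 1, tzinfo=timezone.utc)    # expires after G1

print(leaf_outlives_issuer(ok_leaf, g1_expiry))   # False: chain is valid until the leaf expires
print(leaf_outlives_issuer(bad_leaf, g1_expiry))  # True: chain breaks once G1 expires
```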

Additionally, not all cases for multiple issuers are a simple cutover from old to new. The PKI secret engine documentation itself proposes a use-case for maintaining a lower-security Vault-backed intermediate and a higher-security HSM-backed intermediate that are concurrently used for issuance, with the choice selected by PKI secret engine role configuration. This case will not be able to function correctly (unless you can do without CRL and AIA entirely) unless the URLs can be configured per issuer.

But yeah, this one definitely warrants adding this functionality.


One side note:

Now let's consider clients of the above application that are doing CRL checking. They will access the CRL URL embedded in the certificate. It is critical that the CRL URL point to the CRL for the correct CA. If it later dynamically changes to point to a CRL for a different CA, again, it risks breaking production applications which will then fail their revocation checking process.

Does .NET actually validate that the CRL issuer belongs to the CA? Also, how does it handle multiple different CRLs with multiple URIs? (Each cluster has its own CRL).

I bring this up because in earlier Vaults (<= 1.10.z), you could do something like:

vault write pki/root/generate/internal ...
vault write pki/issue/testing ...
vault write pki/revoke serial_number=$LAST_SERIAL_NUMBER
vault delete pki/root
vault write pki/root/generate/internal ...
vault read pki/crl/rotate # Trigger rebuild of the CRL

And then fetch the CRL. You'd see all the old revocations from the past CA on the present CA's CRL, until leaf expiry.

In 1.11, we preserved this behavior, where if you lack a matching issuer, they'll appear on the default issuer's CRL. So if you were to rotate issuers and remove the old one from CRL building / from the mount point completely, you'd still get revocation info for that particular issuer.

maxb commented 2 years ago

Does .NET actually validate that the CRL issuer belongs to the CA?

I have not explicitly tested this, but I should really hope so, as it's a core security property of how CRLs are supposed to work.

Also, how does it handle multiple different CRLs with multiple URIs? (Each cluster has its own CRL).

That's actually another area which needs fixing - Vault needs to change to serve a single CRL common to all clusters, to satisfy the CRL checking protocol defined by RFC 5280. I have actually raised this via email to HashiCorp's Shaun Edwards and Ricardo Oliveira, back in April, but looking back at the conversation, it looks like I missed a key reply whilst on holiday, and let the conversation drop. I'll try to get around to reviving that at some point, though work is very busy right now.

I recently noticed this sentence in the 1.11 docs:

These separate CRLs should either be aggregated into a single CRL (externally; as Vault does not support this functionality) or multiple crl_distribution_points should be specified here, pointing to each cluster and issuer.

The second part "or multiple crl_distribution_points should be specified here, pointing to each cluster and issuer" needs to be removed, as it won't work - clients only keep trying multiple URLs until ONE of them succeeds.

You'd see all the old revocations from the past CA on the present CA's CRL, until leaf expiry.

This behaviour is technically incorrect, though fairly harmless provided serial numbers don't collide.
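A small Python sketch of why a serial collision matters when revocations from an old CA sit on the new CA's CRL. The issuer labels and serial numbers are made up, and the two lookup functions are hypothetical stand-ins for "flat list" versus "issuer-aware" checking.

```python
# Revocation entries as (issuer, serial) pairs; names and serials are made up.
revoked = {("G1", 0x1001), ("G1", 0x1002)}

# Flat view: what a client sees if it ignores which CA each entry belongs to.
flat_serials = {serial for _, serial in revoked}


def is_revoked_flat(serial: int) -> bool:
    """Check membership in the serial list, ignoring the issuer."""
    return serial in flat_serials


def is_revoked_by_issuer(issuer: str, serial: int) -> bool:
    """Check membership using both the issuer and the serial."""
    return (issuer, serial) in revoked


# A G2 certificate that happens to reuse serial 0x1001 was never revoked.
print(is_revoked_flat(0x1001))             # True  (false positive)
print(is_revoked_by_issuer("G2", 0x1001))  # False (correct)
```

As long as serials never collide across issuers, both lookups give the same answer, which is why the merged list is mostly harmless in practice.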

cipherboy commented 2 years ago

Combining two threads:

Does .NET actually validate that the CRL issuer belongs to the CA?

I have not explicitly tested this, but I should really hope so, as it's a core security property of how CRLs are supposed to work.

:D

You'd see all the old revocations from the past CA on the present CA's CRL, until leaf expiry.

This behaviour is technically incorrect, though fairly harmless provided serial numbers don't collide.

As I understand it, there are two ways to do CRL signing: signing with the main CA cert and with a delegated CRL signing cert. IIUC, the other cert could be completely independent from the main CA, but needs to be referenced from the CRL access field. When revoking from multiple parent CAs, the Certificate Issuer field can then be used to identify which issuer a specific serial number corresponds to (which would also need to be present when using a delegated cert for CRL verification).

Vault presently doesn't support either, so I would agree that it's generally a little non-conformant, but I'd be surprised if .NET implemented most of this behavior; my understanding is NSS and OpenSSL generally treat the CRL as a trusted entity (leaving verification of CRL signature up to the user) and then as a flat list of serial numbers regardless of origin. I could be wrong; it's been a bit since I've looked at NSS's libpkix, though. The main validation code doesn't have CRL verification as advanced as libpkix's. This is, e.g., the behavior of using a CRL with nginx/Apache -- it is an unchecked, trusted list and is strictly used as a list of revoked serials regardless of origin.


Back to the other thread:

That's actually another area which needs fixing - Vault needs to change to serve a single CRL common to all clusters.

Yeah, we're looking to add support for that in the near future. :-) But that does involve several other considerations (cross-cluster traffic increases, delegated CRL signer, &c). Needless to say, I believe it's slightly non-trivial and might come with other limitations. OCSP would be more ideal, but again, we need to solve the cross-cluster traffic question.


Lastly:

I recently noticed this sentence in the 1.11 docs:

These separate CRLs should either be aggregated into a single CRL (externally; as Vault does not support this functionality) or multiple crl_distribution_points should be specified here, pointing to each cluster and issuer.

The second part "or multiple crl_distribution_points should be specified here, pointing to each cluster and issuer" needs to be removed, as it won't work - clients only keep trying multiple URLs until ONE of them succeeds.

My read of the relevant RFC section doesn't seem to say that all CRL distribution points must refer to the same CRL and indeed allows multiple disparate distribution points. The broader section on CRLs does also seem to agree that there could be multiple valid CRLs at one point in time.

So I think this behavior would be non-conformant but somewhat expected. This then says, to me, the best approach would be the former (manual aggregation of multiple CRLs into a single entity and distributed accordingly).

maxb commented 2 years ago

My read of the relevant RFC section doesn't seem to say that all CRL distribution points must refer to the same CRL and indeed allows multiple disparate distribution points. The broader section on CRLs does also seem to agree that there could be multiple valid CRLs at one point in time.

So I think this behavior would be non-conformant but somewhat expected. This then says, to me, the best approach would be the former (manual aggregation of multiple CRLs into a single entity and distributed accordingly).

Section 6.3 makes it clear - through prescriptive documentation of how CRLs are to be processed - that multiple distribution points are multiple different ways to access the same data, unless the CRLs are sharded by revocation reason (which is discouraged, and Vault doesn't do anyway).
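A sketch of the client behaviour being assumed here, in Python: the client walks the distribution points in order and stops at the first one that answers. The function, the URLs, and the per-cluster CRL contents are all hypothetical; real clients differ in detail.

```python
from typing import Callable, Iterable, Optional, Set


def fetch_first_crl(urls: Iterable[str],
                    fetch: Callable[[str], Optional[Set[int]]]) -> Set[int]:
    """Return the first CRL that can be fetched; later URLs are never consulted."""
    for url in urls:
        crl = fetch(url)
        if crl is not None:
            return crl
    raise RuntimeError("no distribution point reachable")


# Hypothetical per-cluster CRLs, each listing only that cluster's revocations.
crls = {
    "http://cluster-a.example/crl": {0x01},
    "http://cluster-b.example/crl": {0x02},
}

seen = fetch_first_crl(crls, crls.get)
print(0x01 in seen)  # True
print(0x02 in seen)  # False: cluster B's revocation is never seen
```

Under that assumption, listing one URL per cluster only works if every URL serves the same aggregated CRL.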

maxb commented 2 years ago

I wonder, should I close this issue and open two new ones with relevant summaries? We now have two non-trivial matters in intertwined discussion:

cipherboy commented 2 years ago

@maxb Up to you; both issues make sense and the CRL issue was already on our radar to start addressing in 1.12. AIA URLs is thus the only new one, so I'm fine leaving this one open as a tracker for AIAs.

cipherboy commented 2 years ago

@maxb There's a PR here that you could take a look at for the per-issuer AIA stuff: https://github.com/hashicorp/vault/pull/16563

Let me know if you find anything in it objectionable ;-)

The CRL handling stuff is still WIP and pending some more internal discussions.

cipherboy commented 1 year ago

@maxb One part of AIA handling has been done for 1.13 in #18199 -- let me know what you think. This should allow mirroring the behavior of some public CAs, where the CRL is sharded (in our case, by PR cluster), but that AIA on the cert points to the correct CRL distribution point.

We're still working on unified revocation + CRLs, hope to have an update for you on that one soon.

maxb commented 1 year ago

Hi @cipherboy,

I have just had a look through the change in #18199.

Although it definitely makes things incrementally better than before, I still foresee it causing problems for users. I would like to ask why the choice was made to restrict the user to substituting the issuer_id into a fixed template global to the entire secret engine?

Why not provide the more flexible and arguably simpler option, of allowing the user to specify the full URL for each issuer?

That way, if they choose to make them all the same other than the issuer_id, they can, but they have flexibility to point them at other endpoints if needed.

Why might this be desired? Well, I think most serious Vault deployments will use HTTPS, as the Vault tokens ought to be encrypted on the network between server and client. But, CRL and AIA URLs need to not be HTTPS, because the RFCs say so and client software enforces that - it's part of avoiding a chicken-and-egg scenario, of not being able to validate a certificate because you fail to access CRL/AIA URLs because you can't validate the certificate they're using.

Because of this, it will be very common for users to either copy their CRL and AIA information to a separate webserver, or to place a caching proxy in front of the Vault API.

In my organization's deployment we do this. And we don't have the issuer_id in our URLs, because issuers weren't a thing when we set this up. Therefore we can't gracefully migrate to this new configuration because we need full freedom to define a completely different URL between the old and new issuer, in order to be able to cleanly roll forward.

maxb commented 1 year ago

Ah, actually, I wrote the above assuming #18199 was the whole of the solution. I think I overlooked new options that were added in the final form of #16563, which may address my use-case.

cipherboy commented 1 year ago

Yeah, both per-issuer AIA URLs (in #16563 and 1.12 -- /issuer/:issuer_ref) and global AIA URLs (/config/urls) can be templated under #18199 (with the new /config/cluster API) -- both require the path to the cluster (presumably to Vault... see below), but in the former, you could probably avoid substituting in the issuer ID if you want and use say, issuer name or some other nicer identifier.

The latter allows a very nice global URL config of:

$ vault write pki/config/urls \
    enable_templating=true \
    crl_distribution_points="{{cluster_path}}/issuer/{{issuer_id}}/crl/der" \
    issuing_certificates="{{cluster_path}}/issuer/{{issuer_id}}/der" \
    ocsp_servers="{{cluster_path}}/ocsp"

and so the stable issuer reference is inserted for you. Obviously if you're socializing CRLs to a server outside of Vault for internal distribution, you probably don't want to use issuer_id and prefer some other name, so per-issuer AIA URLs are probably preferable. But you could perhaps replace the path member of /config/cluster to your internal CDN's address. The caveat being, we intend to reuse this for an upcoming feature too, which'd break if it isn't directly pointed at Vault, and you couldn't say, have OCSP use one set of per-cluster addresses and CRLs another (there's only one path variable).

Do note though that both /issuer/:issuer_ref and /config/urls are still global, cross-cluster endpoints -- it's only by templating with the value from /config/cluster that you can get cluster-local values.
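To illustrate what the templated URLs expand to, here is a rough Python sketch. It is not Vault's implementation: `render_url` is a hypothetical helper that does plain string substitution, and the cluster address and issuer ID are invented values.

```python
def render_url(template: str, cluster_path: str, issuer_id: str) -> str:
    """Substitute the two placeholders used in the example above."""
    return (template
            .replace("{{cluster_path}}", cluster_path)
            .replace("{{issuer_id}}", issuer_id))


templates = {
    "crl_distribution_points": "{{cluster_path}}/issuer/{{issuer_id}}/crl/der",
    "issuing_certificates": "{{cluster_path}}/issuer/{{issuer_id}}/der",
    "ocsp_servers": "{{cluster_path}}/ocsp",
}

# Hypothetical values: one cluster address and one issuer ID.
cluster_path = "http://pki-a.example.com/v1/pki"
issuer_id = "11111111-2222-3333-4444-555555555555"

rendered = {k: render_url(t, cluster_path, issuer_id) for k, t in templates.items()}
for name, url in rendered.items():
    print(f"{name}={url}")
```

The same global template yields a different, stable URL for each issuer, and a different host for each cluster, without any per-issuer configuration.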


I do see your point about protocol though. You could definitely put a LB in front of Vault and limit HTTP traffic to certain endpoints, but I think that'd be a bit of work to manage config-wise. Easier with TF though... Maybe it's better to have two paths -- a cluster_path and a cluster_aia_path -- the latter allows for identifying an off-Vault address?

While arguing RFCs, though, the quote is:

   CAs SHOULD NOT include URIs that specify https, ldaps, or similar
   schemes in extensions.  CAs that include an https URI in one of these
   extensions MUST ensure that the server's certificate can be validated
   without using the information that is pointed to by the URI.  Relying
   parties that choose to validate the server's certificate when
   obtaining information pointed to by an https URI in the
   cRLDistributionPoints, authorityInfoAccess, or subjectInfoAccess
   extensions MUST be prepared for the possibility that this will result
   in unbounded recursion.

This is a "SHOULD NOT", not a "MUST NOT", so there is some leeway there. The second statement with the MUST is more interesting though: if you self-host Vault on a Vault-backed CA and wish to use AIA for its certs, this would violate both conditions, so it definitely is worth thinking about before 1.13 releases...
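For anyone wanting to audit their configured URLs against the quoted passage, a minimal Python sketch follows. The helper name and the example URLs are hypothetical, and only the two schemes named explicitly in the quote are checked ("or similar schemes" is not modelled).

```python
from urllib.parse import urlsplit

# Schemes named in the quoted text; "or similar schemes" is not modelled here.
DISCOURAGED_SCHEMES = {"https", "ldaps"}


def discouraged_urls(urls):
    """Return the URLs whose scheme the quoted passage says CAs SHOULD NOT use."""
    return [u for u in urls if urlsplit(u).scheme.lower() in DISCOURAGED_SCHEMES]


aia_urls = [
    "http://pki.example.com/issuer/g1/der",                # hypothetical
    "https://vault.example.com/v1/pki/issuer/g1/crl/der",  # hypothetical
]
flagged = discouraged_urls(aia_urls)
print(flagged)
```

A flagged URL is not a violation by itself; it just marks where the MUST about validating the server's certificate independently starts to apply.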

I believe that any client software which enforces the HTTP-ness would be in the wrong based on the above. Do you have examples?

maxb commented 1 year ago

For myself, I'm satisfied that the main problem I raised this issue about, the mount-global configuration of AIA/CRL URLs, has now been fixed, with the creation of the ability to set AIA/CRL URLs on the pki/issuer/:issuer_ref APIs, so I'll go ahead and close this issue now.

To respond to your final question - I do not have a concrete example, but I was told that it was an issue for some software in our company. Also, given the RFCs say that CAs SHOULD NOT use https URLs, it seems more than likely some client implementations have decided not to consume what should not be produced :-)