[BUG] Exception when hot-reloading Let's Encrypt certs issued by alternate intermediate CAs

perios101 commented 5 months ago

What is the bug? As part of the process using the REST API to hot reload transport and http certificates, we receive a 500 error.

The old certificate (and all previous ones to date) has been issued by Let's Encrypt's R3 CA (which seems to have been alive and well as DR CA) and the new one by R11 CA. Same scenario is happening on an environment where intermediate CA changed from R11 to R10. Both are covered by parent ISRG Root X1 CA (along with the old R3).

Restarted individual pods automatically pick up new certificate as valid without issue however hot-reloading existing instances throws the same error over and over despite both CAs are Let's Encrypt issuers covered by root CA.

How can one reproduce the bug? Steps to reproduce the behavior:

Configure and use OpenSearch using Let's Encrypt certificate
Renew certificate (using different intermediate CA to the original one
Hot-reload certificates using REST API
See error

What is the expected behavior? Certificate validation should succeed as both certificates were issued via valid CA - pinning to same CA as the old one is not considered good practice, especially in DR scenarios (yes, Issuers have those too).

What is your host/environment?

OS: Amazon Linux (containerised for Kubernetes)
Version: OpenSearch 2.14.1
Plugins:

Do you have any screenshots? Exception: java.io.IOException: OpenSearchSecurityException[Error while initializing transport SSL layer from PEM: java.lang.Exception: New Certs do not have valid Issuer DN, Subject DN or SAN.]; nested: Exception[New Certs do not have valid Issuer DN, Subject DN or SAN.];

Do you have any additional context?

From digging into the Security plugin code, it appears logic HERE is the culprit.
We have the same issue across three different environments (affects also OpenSearch Dashboards - using same security plugin).
Workaround (restarting pods) exists but it's not scalable / fit as a long-term solution.

shikharj05 commented 5 months ago

Thanks for filing the issue, I see a previous issue https://github.com/opensearch-project/security/issues/2360 where this was reported and being worked on. @shubhmkr-amazon are you still working on this one?

shubhmkr-amazon commented 5 months ago

Hi @shikharj05, Yes, I am still working on this.

stephen-crawford commented 4 months ago

[Triage] HI @perios101 thank you for filing this issue. Looks like a similar bug is still being worked on. I am going to assign this to @willyborankin since he has expressed interest in taking a look at this in the time being.

tan3-netapp commented 3 months ago

We encountered the same issue after Let's Encrypt's replaced its intermediate certs to R10/ R11. We still have the same workaround method: restart all nodes to pick up the new cert, which seems unscalable for a large number of nodes and could impact clusters without proper replication configuration.

tan3-netapp commented 3 months ago

Hi @shubhmkr-amazon and @willyborankin, Could I get to know your progress on this issue? I could give you a hand if necessary.

shubhmkr-amazon commented 3 months ago

Hi @tan3-netapp, Earlier, I made the changes but got stuck in the unit tests due to self-signed certificates used in testing. I started working on this PR again and will try to raise it by August 29th. In case @willyborankin has made some progress, he can take the lead on this PR.

Thanks

willyborankin commented 3 months ago

Hi @tan3-netapp, the issue has been resolved in this PR: https://github.com/opensearch-project/security/pull/4624. It has been merged into both the main and 2.x branches and will be included in the 2.17 release.

shubhmkr-amazon commented 3 months ago

Hi @willyborankin, it looks like PR: https://github.com/opensearch-project/security/pull/4624 is solving a different problem. In the mentioned issue, the error is coming because of a change in the intermediate CA. Here, during the hot-reload, the issuer DN of the in-memory certificate will fail to match the new certificate in case the ICA has changed.

tan3-netapp commented 3 months ago

Hi @shubhmkr-amazon, I would appreciate it if I could have a look at your working PR, just out of curiosity.

parislarkins commented 2 months ago

Hi @shubhmkr-amazon / @willyborankin , are either of you still working on a potential fix for this issue?

It is still causing us a fair bit of pain, so I was thinking of adding two new bool configs that could be set to false to skip the DN verification (https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/ssl/DefaultSecurityKeyStore.java#L656), for example:

plugins.security.ssl.transport.enforce_cert_reload_dn_verification
plugins.security.ssl.http.enforce_cert_reload_dn_verification

How does that sound as a potential approach?

shubhmkr-amazon commented 2 months ago

Hi @parislarkins, the approach you suggested of bypassing the issuerDN check is not ideal. In this case, a self-signed certificate can also be renewed, which is not trusted. I am working on a fix that will perform issuer DN validation by building a chain of trust with an in-memory trust store(However, there will be one limitation with this approach: what if the root certificate itself gets rotated?).

parislarkins commented 2 months ago

Hi @shubhmkr-amazon, thanks for getting back to me!

While the point you make about self-signed certificates is true, I'm unsure why this is a particular concern given that there is no such validation when OpenSearch is restarted.

My understanding is that rotating certs (either by a full restart or hot-reload) requires updating the certs on disk and then being a super admin or having the ability to restart OpenSearch. If the concern is that a malicious actor could trigger a hot-reload to change to a self-signed cert, then I think it likely that they would have the ability to restart OpenSearch as well, given they must have access to the OpenSearch server in order to update the cert on disk.

Would you be able to clarify what the concern is with being able to hot-reload to a potentially self-signed cert (on an opt-in basis), given this is allowed by restarting OpenSearch?

Thanks!

parislarkins commented 2 months ago

I've raised https://github.com/opensearch-project/security/pull/4752 as a potential fix for this issue, to allow disabling the Distinguished Names validation when performing a hot-reload. In our case we do not think the validation is providing us additional security value, so are happy to disable it. There may be other use cases where this is not appropriate though, so the fix suggested by @shubhmkr-amazon may be the best option there, even if it doesn't work for root CA rotation.

perios101 commented 1 month ago

@cwperks this bug needs reopening since the PR above was reverted in https://github.com/opensearch-project/security/pull/4840

parislarkins commented 1 month ago

@perios101 it has now been re-merged by #4841

opensearch-project / security

[BUG] Exception when hot-reloading Let's Encrypt certs issued by alternate intermediate CAs #4509