Closed perios101 closed 1 month ago
Thanks for filing the issue, I see a previous issue https://github.com/opensearch-project/security/issues/2360 where this was reported and being worked on. @shubhmkr-amazon are you still working on this one?
Hi @shikharj05, Yes, I am still working on this.
[Triage] HI @perios101 thank you for filing this issue. Looks like a similar bug is still being worked on. I am going to assign this to @willyborankin since he has expressed interest in taking a look at this in the time being.
We encountered the same issue after Let's Encrypt's replaced its intermediate certs to R10/ R11. We still have the same workaround method: restart all nodes to pick up the new cert, which seems unscalable for a large number of nodes and could impact clusters without proper replication configuration.
Hi @shubhmkr-amazon and @willyborankin, Could I get to know your progress on this issue? I could give you a hand if necessary.
Hi @tan3-netapp, Earlier, I made the changes but got stuck in the unit tests due to self-signed certificates used in testing. I started working on this PR again and will try to raise it by August 29th. In case @willyborankin has made some progress, he can take the lead on this PR.
Thanks
Hi @tan3-netapp, the issue has been resolved in this PR: https://github.com/opensearch-project/security/pull/4624. It has been merged into both the main and 2.x branches and will be included in the 2.17 release.
Hi @willyborankin, it looks like PR: https://github.com/opensearch-project/security/pull/4624 is solving a different problem. In the mentioned issue, the error is coming because of a change in the intermediate CA. Here, during the hot-reload, the issuer DN of the in-memory certificate will fail to match the new certificate in case the ICA has changed.
Hi @shubhmkr-amazon, I would appreciate it if I could have a look at your working PR, just out of curiosity.
Hi @shubhmkr-amazon / @willyborankin , are either of you still working on a potential fix for this issue?
It is still causing us a fair bit of pain, so I was thinking of adding two new bool configs that could be set to false
to skip the DN verification (https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/ssl/DefaultSecurityKeyStore.java#L656), for example:
plugins.security.ssl.transport.enforce_cert_reload_dn_verification
plugins.security.ssl.http.enforce_cert_reload_dn_verification
How does that sound as a potential approach?
Hi @parislarkins, the approach you suggested of bypassing the issuerDN check is not ideal. In this case, a self-signed certificate can also be renewed, which is not trusted. I am working on a fix that will perform issuer DN validation by building a chain of trust with an in-memory trust store(However, there will be one limitation with this approach: what if the root certificate itself gets rotated?).
Hi @shubhmkr-amazon, thanks for getting back to me!
While the point you make about self-signed certificates is true, I'm unsure why this is a particular concern given that there is no such validation when OpenSearch is restarted.
My understanding is that rotating certs (either by a full restart or hot-reload) requires updating the certs on disk and then being a super admin or having the ability to restart OpenSearch. If the concern is that a malicious actor could trigger a hot-reload to change to a self-signed cert, then I think it likely that they would have the ability to restart OpenSearch as well, given they must have access to the OpenSearch server in order to update the cert on disk.
Would you be able to clarify what the concern is with being able to hot-reload to a potentially self-signed cert (on an opt-in basis), given this is allowed by restarting OpenSearch?
Thanks!
I've raised https://github.com/opensearch-project/security/pull/4752 as a potential fix for this issue, to allow disabling the Distinguished Names validation when performing a hot-reload. In our case we do not think the validation is providing us additional security value, so are happy to disable it. There may be other use cases where this is not appropriate though, so the fix suggested by @shubhmkr-amazon may be the best option there, even if it doesn't work for root CA rotation.
@cwperks this bug needs reopening since the PR above was reverted in https://github.com/opensearch-project/security/pull/4840
@perios101 it has now been re-merged by #4841
What is the bug? As part of the process using the REST API to hot reload transport and http certificates, we receive a 500 error.
The old certificate (and all previous ones to date) has been issued by Let's Encrypt's R3 CA (which seems to have been alive and well as DR CA) and the new one by R11 CA. Same scenario is happening on an environment where intermediate CA changed from R11 to R10. Both are covered by parent ISRG Root X1 CA (along with the old R3).
Restarted individual pods automatically pick up new certificate as valid without issue however hot-reloading existing instances throws the same error over and over despite both CAs are Let's Encrypt issuers covered by root CA.
How can one reproduce the bug? Steps to reproduce the behavior:
What is the expected behavior? Certificate validation should succeed as both certificates were issued via valid CA - pinning to same CA as the old one is not considered good practice, especially in DR scenarios (yes, Issuers have those too).
What is your host/environment?
Do you have any screenshots? Exception:
java.io.IOException: OpenSearchSecurityException[Error while initializing transport SSL layer from PEM: java.lang.Exception: New Certs do not have valid Issuer DN, Subject DN or SAN.]; nested: Exception[New Certs do not have valid Issuer DN, Subject DN or SAN.];
Do you have any additional context?