opensearch-project / security

🔐 Secure your cluster with TLS, numerous authentication backends, data masking, audit logging as well as role-based access control on indices, documents, and fields
https://opensearch.org/docs/latest/security-plugin/index/
Apache License 2.0
192 stars 273 forks source link

[Feature] SSL hot reload - force cluster to reconnect #1763

Open frotsch opened 2 years ago

frotsch commented 2 years ago

What is the bug? I use the ssl hot reload feature to hot reload new transport certificates which are signed by a different CA. After the hot reload is triggered (and returns successful) there are still the same amount of nodes in the cluster which can not be the case because the node, for which the certs and ca are updated must now not be allowed to join the cluster because of the new ca.

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Check cluster health and look at number_of_nodes
  2. Create new transport certs signed by a new/different ca
  3. Install those on a node (in a cluster with at least 2 nodes)
  4. Trigger transport cert hot reload for this node
  5. Check cluster health and check number_of_nodes (which should now be decreased by one compated to step 1., but its the same)

What is the expected behavior? number_of_nodes decreases by one after the hot reload because the node for which the certs and ca was updated must not be longer allowed to be part of cluster (because the other nodes must not trust the renewed ca)

What is your host/environment?

Do you have any additional context? Looking into the code I did not see anything about disconnecting existing nodes from the cluster when or after hot reload. For any new connection (for example openssl s_client connect to the updated node shown the certificates) its works but it looks to me that the existing tcp connection between nodes are to "reconnected" to that the new certs/ca become effective.

Is this by design or is this a bug? If its not by design I would consider this as a security issue.

peternied commented 2 years ago

[Triage] This is by design, connections to the cluster are not terminated/reconnected during a hot reload.

frotsch commented 2 years ago

Hi @peternied, when its by design I think its a design flaw (and a security flaw, see below). As far as I understood this feature it should be possible to hot reload ssl certs without restarting a node.

Think of the following two situations:

1) You use the hot reload feature because on of the certs are about to expire. Than means that the non-terminated connections will begin to error when the cert expiry becomes effecive. That makes no sense to me because its exacly the behaviour I want to avoid by leveraging a hot reload.

2) You use the hot reload feature because your CA/Keys were compromised. Than means that the non-terminated connections are subsceptible for sniffing/MITM attacks.

And in addition: By not terminating the connections you can not be sure that after a node is really restarted it is able to join the cluster again because you made a mistake with the new certs. If the time between the hot reload and node restart is about month or even years this would be hard to track down.

So I do not see any argument here why such a behaviour would be choosen by design although it is technically not very complicated to trigger a reconnect (happy to provide a PR). So what are the reasons for this design decision? Maybe I overlooked something?

peternied commented 2 years ago

@frotsch Thanks for the thought you've put into this. I think you are right that there should be a way to force all the nodes to reconnect. A pull request would be great place for additional discussion as well.

frotsch commented 2 years ago

@peternied can you reopen the issue?

cliu123 commented 2 years ago

@frotsch Certificate hot reloading itself doesn't break the existing connection between nodes. Given the channel between nodes has been established, the new certificate won't be used until next time establishing channel. Additionally, there are some requirements on the new certificates. The new certificate must have the same issuer DN, the same subject DN and the same subject alternative names (if any).

peternied commented 2 years ago

@frotsch I was doing some digging - forcing the cluster to reload its certs is possible. By doing a hot reload and then making a call to /_opendistro/_security/api/ssl/transport/reloadcerts this scenario is captured. A parameter to ensure after a hot reload the certs are reloaded is useful too, if you'd like to make a pull request.

https://github.com/opensearch-project/security/blob/565f47e804ec03aeeba02ca8def563b91307fcc7/src/main/java/org/opensearch/security/ssl/rest/SecuritySSLReloadCertsAction.java#L41-L53

I'll create a bug that we should have public documentation for this API. Does this resolve your issue?

peternied commented 2 years ago

Documentation bug for follow up https://github.com/opensearch-project/documentation-website/issues/530

cliu123 commented 2 years ago

Security plugin validates certificate to make sure all connections are established with valid certificates, but it doesn't disconnect a connection estabilished with a certificate that is still valid(not expired) no matter whether a new cert gets loaded. If user wants to enforce the connection to pick up the new cert, restarting the node automatically does that.

frotsch commented 2 years ago

@cliu123 @peternied not sure why this was closed again? It makes me feel embarrassed and not taken seriously.

As outlined above restarting a node to pickup new certs is exactly was "hot" reloading is not about. And https://github.com/opensearch-project/security/issues/1763#issuecomment-1110073337 just explains the status quo but does not comment on the issues mentioned in https://github.com/opensearch-project/security/issues/1763#issuecomment-1107443768

I prepared https://github.com/frotsch/security/commit/f3636fa4f512783291e1a4d1321d7fb601207d49 as a starting point and I am ready to open a PR if this is still considered valueable.

It adds a disconnectAfterReload parameter (which is false by default). You can call "_opendistro/_security/api/ssl/transport/reloadcerts?disconnectAfterReload=true to perform a node disconnect after cert reload (like @peternied suggested in https://github.com/opensearch-project/security/issues/1763#issuecomment-1111164213).

cliu123 commented 2 years ago

Hi @frotsch, sorry for the confusion! The TLS cert hot-reloading feature was designed based on the way TLS works. Certificate is only used at the moment of establishing connection. So even a new certificate gets reloaded, it won't be used until establishing a new connection as I mentioned in another comment. And hot-relaoding itself doesn't break existing connection. I misunderstood your request. I thought that you requested to change the current behavior of hot-reloading and make it break existing connect by default.

Your proposal will be a great contribution since it'll proivde a new option without changing the existing default behavior. It's absolutely valuable to enhance the hot-reloading feature. Looking forward to your PR(s)!

cliu123 commented 2 years ago

@frotsch Thanks for bringing up this great idea to enhance the certs hot reloading feature! We understand that you've been working on the changes and ready to create a PR. We're happy to review your PR once it's open. But we're also here if you need any help make the PR.

stephen-crawford commented 1 year ago

[Triage] Hi @frotsch, I am just following up to see if you were still planning on creating a PR for this issue? I see that you had previously mentioned interest in creating a commit and would be happy to review your PR. Thank you.

Hakky54 commented 7 months ago

It is possible to invalidate the ssl session which is cached within the sslcontext. It can be done after the reload, wouldn't that force to reconnect to the nodes with the new certificates?