Open swoehrl-mw opened 1 year ago
i'm not sure if OpenSearch Security already has this feature (the documentation for OpenSearch is still incomplete), but at least Search Guard supports TLS certificate hot reloading. if OpenSearch Security supports this (or support is added for it) then the operator could use the hot reload API to trigger the re-load.
do you have plans on how to handle the rollover of the CA (which will also expire at some point)?
IMHO this correlates a bit with #141 as cert-manager could take care of some things (though the triggering of the hot reload would still have to be done as cert-manager isn't aware of OpenSearch)
@rursprung
then the operator could use the hot reload API to trigger the re-load.
Sounds like a good idea. Although we might still need to implement a restart variant for older opensearch versions.
do you have plans on how to handle the rollover of the CA (which will also expire at some point)?
No idea yet. Maybe something where a new CA is generated ahead of time and the certificate is signed by both CAs for a while, to give services time to switch out their CA. Depends a bit on if clients actually use the CA cert to verify connections. Suggestions are always welcome.
Depends a bit on if clients actually use the CA cert to verify connections. Suggestions are always welcome.
AFAIK the nodes do for node-to-node communication. not sure about clients (i guess "it depends" is the proper answer, though i'd expect that they do by default nowadays)
Proposed Solution: We will create a secret based on the existing flag that controls whether we need a per-node certificate or a single certificate for all nodes. In the case of a single certificate for all nodes, we will create a certificate object and then map the secret created from that object into the OpenSearch custom resource. In the case of per-node certificates, we will generate multiple certificates and merge them into a single secret, let's say node-cert-merged (using custom code that also includes a watcher). Whenever any certificate changes, we will sync node-cert-merged.
Solutions:
and merge them into a single secret
i'd suggest to ask some security experts for their opinion on this. i doubt that they'll be happy with the private key for one node being visible to another node. i don't think that it's super critical in this case, but it definitely goes against the best practices for private/public key usage (where you never, ever give anyone else access to your private key).
Hi, Thanks for thinking about this issue. This is a very important topic for us since an expired certificate will break the whole OS cluster. Our current workaround is to create certificates with long expiry dates manually, but since transport encryption is only within the cluster, I would like to do no manual steps at all. When using the certificates generated by the operator, is there a way to trigger a renew manually? Like for example removing config from OpenSearchCluster manifest (so that demo certificates are used) and then adding it again?
PS: Hi @swoehrl-mw we worked together a long time ago at MW, nice to see you again :)
Hi @Alwinius
is there a way to trigger a renew manually?
Without having tested it: If you delete the <cluster-name>-transport-cert and <cluster-name>-http-cert secrets the operator should generate new ones during the next reconcile run (so after 30 seconds). Afterwards you would need to get the operator to do a rolling restart (for example by adding a dummy change to the config). Theoretically this should work without downtime.
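As a rough illustration of the untested workaround above, the following Python sketch only builds and prints the `kubectl delete secret` commands for the two generated secrets; it does not run them. The function names are hypothetical, and `mycluster` / `opensearch` are example values. The admin certificate secret is left out because its name is not given in this thread.

```python
import shlex


def generated_cert_secrets(cluster_name):
    """Names of the operator-generated secrets mentioned in this thread."""
    return [f"{cluster_name}-transport-cert", f"{cluster_name}-http-cert"]


def delete_secret_commands(cluster_name, namespace):
    """Build one `kubectl delete secret` argv list per generated secret."""
    return [
        ["kubectl", "delete", "secret", name, "--namespace", namespace]
        for name in generated_cert_secrets(cluster_name)
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in delete_secret_commands("mycluster", "opensearch"):
        print(shlex.join(argv))
```

After the operator has regenerated the secrets on its next reconcile run, a rolling restart still has to be triggered separately, for example with the dummy config change suggested above.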
we worked together a long time ago at MW, nice to see you again :)
The world feels small ;-)
Hi @swoehrl-mw
Afterwards you would need to get the operator to do a rolling restart (for example by adding a dummy change to the config). Theoretically this should work without downtime.
Kubernetes tracks the change in secret and updates the volume automatically
From k8s documentation: When a volume contains data from a Secret, and that Secret is updated, Kubernetes tracks this and updates the data in the volume, using an eventually-consistent approach
The operator should check during reconcile runs if any certificates are about to expire and renew them if needed.
Doing this alone should be fine I feel
@Gokul-Radhakrishnan
If we enable hot-reload of certs (which AFAIK is disabled by default) then yes, just updating the secrets should be enough.
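For reference, a minimal Python sketch of what triggering such a hot reload could look like, assuming the security plugin's certificate reload endpoints (`PUT /_plugins/_security/api/ssl/{transport|http}/reloadcerts`) are available in the deployed version, that reloading has been enabled in `opensearch.yml`, and that the call is authenticated with the admin certificate. The function names, file paths and base URL are hypothetical; this is not how the operator does it.

```python
import ssl
import urllib.request

# Assumed endpoint layout of the security plugin's reload API.
RELOAD_PATH = "/_plugins/_security/api/ssl/{kind}/reloadcerts"


def build_reload_request(base_url, kind):
    """Build the PUT request for reloading 'transport' or 'http' certificates."""
    if kind not in ("transport", "http"):
        raise ValueError(f"unknown certificate kind: {kind!r}")
    url = base_url.rstrip("/") + RELOAD_PATH.format(kind=kind)
    return urllib.request.Request(url, method="PUT")


def reload_certs(base_url, kind, admin_cert, admin_key, ca_file):
    """Send the reload request, authenticating with the admin client certificate."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=admin_cert, keyfile=admin_key)
    request = build_reload_request(base_url, kind)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return response.status
```

A caller would invoke `reload_certs` once per certificate kind on every node after the mounted secret has been updated, since the reload applies to the node that receives the request.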
Any movement on this?
Set up a new cluster using our PKI, following this.
I set it up a couple of days ago and went to make a change this morning, but the node I restarted would not come up, failing with errors like:
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
[ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [mycluster-masters-2] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
This happens even though the node has mounted the new cert. To get everything up and running again, all nodes need to be restarted.
Using these certs
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: opensearch-certs-pki
  namespace: opensearch
spec:
  secretName: opensearch-certs-pki
  privateKey:
    size: 2048
    algorithm: RSA
    encoding: PKCS8
  dnsNames:
    - mycluster
    - mycluster-masters-0
    - mycluster-masters-1
    - mycluster-masters-2
    - mycluster-bootstrap-0
    - mycluster-discovery
    - mycluster.opensearch
    - mycluster.opensearch.svc
    - mycluster.opensearch.svc.cluster.local
  usages:
    - key encipherment
    - server auth
    - client auth
  commonName: Opensearch_Node
  issuerRef:
    group: certmanager.step.sm
    kind: StepClusterIssuer
    name: step-issuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: opensearch-admin-certs-pki
  namespace: opensearch
spec:
  secretName: opensearch-admin-certs-pki
  privateKey:
    size: 2048
    algorithm: RSA
    encoding: PKCS8
  commonName: OpenSearch_Admin
  usages:
    - key encipherment
    - server auth
    - client auth
  issuerRef:
    group: certmanager.step.sm
    kind: StepClusterIssuer
    name: step-issuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: opensearch-dashboards-certs-pki
  namespace: opensearch
spec:
  secretName: opensearch-dashboards-certs-pki
  privateKey:
    size: 2048
    algorithm: RSA
    encoding: PKCS8
  dnsNames:
    - mycluster-dashboards
  usages:
    - key encipherment
    - server auth
    - client auth
  issuerRef:
    group: certmanager.step.sm
    kind: StepClusterIssuer
    name: step-issuer
And this config
---
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: mycluster
  namespace: opensearch
spec:
  security:
    tls: # Everything related to TLS configuration
      transport:
        generate: false
        perNode: false
        secret:
          name: opensearch-certs-pki
        nodesDn: ["CN=Opensearch_Node", ]
        adminDn: ["CN=OpenSearch_Admin", ]
      http:
        generate: false
        secret:
          name: opensearch-certs-pki
    config:
      adminSecret:
        name: opensearch-admin-certs-pki
      securityConfigSecret:
        name: securityconfig-secret
      adminCredentialsSecret:
        name: mycluster-admin-password
  general:
    serviceName: mycluster
    version: 2.10.0
    setVMMaxMapCount: true
  dashboards:
    enable: true
    opensearchCredentialsSecret:
      name: mycluster-admin-password
    tls:
      enable: true
      generate: false
      secret:
        name: opensearch-dashboards-certs-pki
    version: 2.10.0
    replicas: 2
For now, I am going to test with this: https://github.com/stakater/Reloader
my certificates expired today :(
Without having tested it: If you delete the <cluster-name>-transport-cert and <cluster-name>-http-cert secrets the operator should generate new ones during the next reconcile run (so after 30 seconds). Afterwards you would need to get the operator to do a rolling restart (for example by adding a dummy change to the config). Theoretically this should work without downtime.
@KannappanSomu could you perhaps confirm this approach as a feasible workaround until automatic cert-renewal is implemented?
@KannappanSomu could you perhaps confirm this approach as a feasible workaround until automatic cert-renewal is implemented?
This worked in my cluster. I still had to scale down the operator to update (recreate) the statefulset with spec.podManagementPolicy: Parallel though. That's bug #685.
@asturm-fe Works for my cluster too. thanks !
This really seems like something that should have a clear warning in the docs: Like, your cluster will stop working after exactly one year.
Or at least put a warning that this project is far from mature if something critical like this can go unsolved for more than a year after reporting..
@jonathon2nd any updates ?
@swoehrl-mw
If you delete the <cluster-name>-transport-cert and <cluster-name>-http-cert secrets the operator should generate new ones during the next reconcile run (so after 30 seconds).
The admin certs expire as well right? Don't we need to regenerate them as well? With them expired, Security APIs that require admin cert auth might not work, right?
The admin certs expire as well right? Don't we need to regenerate them as well? With them expired, Security APIs that require admin cert auth might not work, right?
@AniketKariya You are correct, the admin cert needs to be recreated as well, otherwise the securityconfig-update job would not work.
I have created this repo as a temporary (but stable) help while we wait for an official implementation in the operator: https://github.com/flavienbwk/opensearch-k8s-certmanager
It explains how to set up cert-manager + Reloader with ready-to-deploy examples. Might also help for #141.
The operator can generate its own self-signed certificates to use for the opensearch pods. However the operator does not have functionality to renew the certificates once they expire after a year.
The operator should check during reconcile runs if any certificates are about to expire and renew them if needed. After renewal the operator needs to do a rolling restart of the opensearch pods so they pick up the new certificates.
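The expiry check requested above boils down to comparing each certificate's expiry time against a renewal window. The sketch below shows only that decision in Python, assuming the expiry timestamps have already been read from the certificates. The 30-day window, the function names and the example secret names are illustrative assumptions, not operator behaviour.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical renewal window: renew once a certificate has 30 days or less left.
DEFAULT_RENEW_BEFORE = timedelta(days=30)


def needs_renewal(not_after, now, renew_before=DEFAULT_RENEW_BEFORE):
    """Return True if the certificate has expired or enters the renewal window."""
    return now >= not_after - renew_before


def secrets_to_renew(expiries, now, renew_before=DEFAULT_RENEW_BEFORE):
    """Given {secret name: expiry time}, list the secrets that are due for renewal."""
    return sorted(
        name
        for name, not_after in expiries.items()
        if needs_renewal(not_after, now, renew_before)
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = {
        "mycluster-transport-cert": now + timedelta(days=10),
        "mycluster-http-cert": now + timedelta(days=200),
    }
    print(secrets_to_renew(example, now))
```

A reconcile loop would run this check on every pass and, for each secret returned, regenerate the certificate and then trigger the rolling restart (or a hot reload, where available).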