k8ssandra / management-api-for-apache-cassandra

RESTful / Secure Management Sidecar for Apache Cassandra
Apache License 2.0

Add hot reloading of SslContext #376

Closed burmanm closed 11 months ago

burmanm commented 1 year ago

Create a file watcher that detects changes on disk to the TLS files (key, CA, cert). When any of these files changes, reload the SslContext.
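To illustrate the idea, here is a minimal sketch of such a watcher built on the JDK's `java.nio.file.WatchService`. It is not the code from this PR: the class name `TlsFileWatcher`, the `awaitChange` method and the file names used in the usage note (`tls.crt`, `tls.key`, `ca.crt`) are assumptions made for the example, and the reload itself is reduced to a `Runnable` callback.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: watches one directory and runs a callback when a named file changes. */
final class TlsFileWatcher implements AutoCloseable {
    private final WatchService watchService;
    private final Set<String> watchedNames; // file names are assumptions, not taken from the PR
    private final Runnable onChange;        // e.g. "rebuild the SslContext"

    TlsFileWatcher(Path dir, Set<String> watchedNames, Runnable onChange) throws IOException {
        this.watchService = FileSystems.getDefault().newWatchService();
        this.watchedNames = new HashSet<>(watchedNames);
        this.onChange = onChange;
        // WatchService registers directories, not individual files.
        dir.register(watchService,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    /** Waits up to timeoutMillis for a change to a watched file; returns true if the callback ran. */
    boolean awaitChange(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            WatchKey key = watchService.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) {
                return false; // timed out without any event
            }
            boolean relevant = false;
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                    relevant = true; // events were lost, so reload to be safe
                    continue;
                }
                Object context = event.context(); // path relative to the watched directory
                if (context instanceof Path && watchedNames.contains(context.toString())) {
                    relevant = true;
                }
            }
            key.reset(); // re-arm the key so later events are delivered
            if (relevant) {
                onChange.run();
                return true;
            }
            // Only unrelated files changed: keep waiting until the deadline.
        }
    }

    @Override
    public void close() throws IOException {
        watchService.close();
    }
}
```

In a real service, `awaitChange` would typically be called in a loop on a background thread, with the callback rebuilding the SslContext from the files on disk.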

Also, replace the TLS v1.2 restriction with Netty's defaults, which include support for TLS v1.3.

Fixes #375

burmanm commented 12 months ago

No need to test this tomorrow; I need to verify one thing that looked suspicious in my testing today.

burmanm commented 11 months ago

Alright, that was nothing: the operator does indeed create a new connection each time, so the new SSL context gets loaded. It takes around 5 minutes for Kubernetes to detect the change. That is, Netty reruns:

channelPipeline.addFirst(sslContext.newHandler(ch.alloc()));

This happens on each new connection, and our reference to the sslContext is replaced whenever the cert gets refreshed, so new connections use the new certs.
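The pattern described here, where the per-connection code reads a shared reference that the reload replaces, can be sketched with plain JDK types. This is an illustration under assumptions, not the PR's implementation: `ReloadableContext` is a hypothetical name, and the generic type `T` stands in for Netty's `SslContext` so that the example runs without Netty on the classpath.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder: T stands in for Netty's SslContext. */
final class ReloadableContext<T> {
    private final Supplier<T> loader;         // builds a fresh context, e.g. from the files on disk
    private final AtomicReference<T> current; // what new connections will pick up

    ReloadableContext(Supplier<T> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get());
    }

    /** Called once per new connection, like sslContext.newHandler(...) in the channel initializer. */
    T get() {
        return current.get();
    }

    /** Called when the TLS files change; connections that already hold the old context keep it. */
    void reload() {
        current.set(loader.get());
    }
}
```

Because the context is looked up on every new connection, no existing connection has to be touched: connections opened before `reload()` keep the context they were created with, and connections opened afterwards get the new one.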

As there's no logging in this current PR, I know it's a bit tricky to see when it actually reloads something. I added logging to my personal fork, and I'm happy to add it here as well to report when the TLS files have changed. With that logging in place, this is the process I used to test this manually:

cass-operator has this test, which loads mTLS certs into management-api and verifies that it works: https://github.com/k8ssandra/cass-operator/blob/master/tests/test_mtls_mgmt_api/test_mtls_mgmt_api_suite_test.go

I reused parts of that test. First, I built my own image:

docker buildx build -t michaelburman290/cass-management-api:4.1.2 -f Dockerfile-4_1.ubi8 . --load
kind load docker-image michaelburman290/cass-management-api:4.1.2

And then I modified tests/testdata/oss-one-node-dc-with-mtls.yaml with the following patch:

diff --git a/tests/testdata/oss-one-node-dc-with-mtls.yaml b/tests/testdata/oss-one-node-dc-with-mtls.yaml
index 144c760..92a8051 100644
--- a/tests/testdata/oss-one-node-dc-with-mtls.yaml
+++ b/tests/testdata/oss-one-node-dc-with-mtls.yaml
@@ -5,7 +5,8 @@ metadata:
 spec:
   clusterName: cluster1
   serverType: cassandra
-  serverVersion: "3.11.7"
+  serverVersion: "4.1.2"
+  serverImage: michaelburman290/cass-management-api:4.1.2
   managementApiAuth:
     manual:
       clientSecretName: mgmt-api-client-credentials
@@ -22,6 +23,6 @@ spec:
   racks:
     - name: r1
   config:
-    jvm-options:
+    jvm-server-options:
       initial_heap_size: "512m"
       max_heap_size: "512m"

Then deploy everything:

kubectl apply -f tests/testdata/mtls-certs-server.yaml
kubectl apply -f tests/testdata/mtls-certs-client.yaml
kubectl apply -f tests/testdata/oss-one-node-dc-with-mtls.yaml

Verify that they work. I then modified the contents of mtls-certs-server.yaml (using kubectl edit) to hold a different cert pair, and checked that the behavior changed after the reloads had happened. I verified the reloads and the new data by going into the pod, checking the filesystem, and comparing the logs. After that, let cass-operator try to connect to it: "whoops", it fails, since it still uses the old client certs from mtls-certs-client.yaml, and those no longer match the new server certs.