Open andrey-dubnik opened 2 years ago
cass-operator does not allow the superuser secret reference to be updated to point to a different secret. We should allow the secret reference to change in order to better support rotating and changing credentials. I will create an issue in cass-operator for this.
k8ssandra-operator does allow the reference to be changed which means we unfortunately have some inconsistent behavior.
I have some questions for you to help with the investigation.
Did you deploy the operators with their validating webhooks? If so you should see something like this:
kubectl get validatingwebhookconfigurations
NAME WEBHOOKS AGE
cass-operator-validating-webhook-configuration 1 7d4h
cert-manager-webhook 1 7d4h
k8ssandra-operator-validating-webhook-configuration 1 7d4h
I ask because the validating webhook for cass-operator prevents the superuser secret reference from being updated. The error you hit happens during reconciliation in cass-operator, which runs after the webhook.
Can you describe the topology of your K8ssandraCluster? Is it deployed across multiple Kubernetes clusters or multiple namespaces?
Is the new secret in the same namespace as the K8ssandraCluster? It needs to be in the same namespace as the K8ssandraCluster.
Can you check and see if the secret has these labels:
app.kubernetes.io/managed-by: k8ssandra-operator
k8ssandra.io/cluster-name: <your-cluster-name>
k8ssandra.io/cluster-namespace: <your-cluster-namespace>
The secret needs to have those labels in order to get replicated to the namespaces/k8s clusters where the CassandraDatacenters are deployed. The error you reported is triggered when cass-operator cannot find the secret, which makes me wonder whether the secret was replicated. Note that k8ssandra-operator should add the labels to your secret.
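For reference, a labeled superuser secret would look roughly like this (a sketch; the secret name, namespace, and credential values are assumptions based on this thread, not taken from your cluster):

```yaml
# Hypothetical example of a superuser secret carrying the labels that
# k8ssandra-operator uses to replicate it to the CassandraDatacenter
# namespaces/clusters.
apiVersion: v1
kind: Secret
metadata:
  name: cassandra-superuser
  namespace: temporal-state
  labels:
    app.kubernetes.io/managed-by: k8ssandra-operator
    k8ssandra.io/cluster-name: my-cluster          # your K8ssandraCluster name
    k8ssandra.io/cluster-namespace: temporal-state # your K8ssandraCluster namespace
type: Opaque
stringData:
  username: cassandra-admin # example value
  password: change-me       # example value
```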
Here are the webhooks I have, which do include the k8ssandra-operator webhook:
kubectl get validatingwebhookconfigurations
NAME WEBHOOKS AGE
actions-runner-controller-validating-webhook-configuration 3 132d
aks-node-validating-webhook 1 72d
cert-manager-webhook 1 132d
elastic-operator.temporal-visibility.k8s.elastic.co 10 26d
gatekeeper-validating-webhook-configuration 2 198d
k8ssandra-k8ssandra-operator-validating-webhook-configuration 1 23d
kube-prometheus-stack-admission 1 149d
When we initially deployed k8ssandra v1, we used pre-loaded secrets matching the cluster name to supply the password. With v2 we figured we could now use a provided secret, but that change happened after the cluster was already deployed.
When we deployed the provided secret, we also deleted the original secret matching the cluster name because we thought it was no longer necessary. That is what triggered the missing-secret error.
Surprisingly, the secret reference was updated in a few places but failed to update in one. For example, below is the object showing where the new secret reference was applied:
initContainers:
- args:
  - /bin/sh
  - -c
  - echo "$SUPERUSER_JMX_USERNAME $SUPERUSER_JMX_PASSWORD" >> /config/jmxremote.password
    && echo "$REAPER_JMX_USERNAME $REAPER_JMX_PASSWORD" >> /config/jmxremote.password
  env:
  - name: SUPERUSER_JMX_USERNAME
    valueFrom:
      secretKeyRef:
        key: username
        name: cassandra-superuser
  - name: SUPERUSER_JMX_PASSWORD
    valueFrom:
      secretKeyRef:
        key: password
        name: cassandra-superuser
...
superuserSecretName: dev-westeurope-01-superuser
status:
As we were using the original secret from v1 and loaded it into v2, our cluster-name-matching secret didn't have those labels... we can add them if needed.
And yes, the secret we have is in the same namespace as both the operator and the Cassandra cluster. We only have one Cassandra cluster per Kubernetes cluster at the moment.
Based on what you reported, the cass-operator webhook, cass-operator-validating-webhook-configuration, is not deployed. That makes sense given the original error you hit. How did you install k8ssandra-operator? I am curious how you wound up with the k8ssandra-operator webhook deployed but not the cass-operator one.
Go ahead and add the labels to the secret.
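Adding them would look roughly like this (a sketch; it assumes the secret is named cassandra-superuser and the K8ssandraCluster is named dev-westeurope-01 in namespace temporal-state, so substitute your actual names):

```shell
# Hypothetical example: attach the replication labels to an existing secret.
kubectl label secret cassandra-superuser -n temporal-state \
  app.kubernetes.io/managed-by=k8ssandra-operator \
  k8ssandra.io/cluster-name=dev-westeurope-01 \
  k8ssandra.io/cluster-namespace=temporal-state
```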
Can you share your K8ssandraCluster spec?
We use Flux to deploy things.
This is a normal Helm release deployment, which seems to use mostly default values:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: k8ssandra
  namespace: temporal-state
spec:
  releaseName: k8ssandra
  interval: 5m
  chart:
    spec:
      chart: k8ssandra-operator
      version: "=0.37.3"
      sourceRef:
        kind: HelmRepository
        name: k8ssandra
        namespace: temporal-state
  values:
    cass-operator:
      resources:
        requests:
          cpu: 50m
          memory: 50Mi
        limits:
          cpu: 50m
          memory: 50Mi
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 256Mi
The cluster spec is the following:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: ${CASSANDRA_CLUSTER_NAME}
  namespace: temporal-state
spec:
  reaper:
    cassandraUserSecretRef:
      name: cassandra-reaper-cql
    jmxUserSecretRef:
      name: cassandra-reaper-jmx
    uiUserSecretRef:
      name: cassandra-reaper-ui
  medusa:
    cassandraUserSecretRef:
      name: cassandra-medusa
    storageProperties:
      storageProvider: azure_blobs
      storageSecretRef:
        name: medusa-azure-credentials
      bucketName: cassandra-backups
  cassandra:
    serverVersion: "4.0.3"
    superuserSecretRef:
      name: cassandra-superuser
    datacenters:
    - metadata:
        name: ${CASSANDRA_DATACENTER}
        labels:
          env: ${ENVIRONMENT_NAME}
          app: temporal
          product_id: service-composition
          provider: azure
          region: westeurope
          k8s_cluster: ${CLUSTER_NAME}
        annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port: '9103'
      telemetry:
        prometheus:
          enabled: true
      size: 3
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: cassandra-csi
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Ti
      resources:
        requests:
          cpu: 2000m
          memory: 10Gi
        limits:
          cpu: 3500m
          memory: 10Gi
      config:
        jvmOptions:
          heapSize: 8G
          gc: G1GC
          gc_g1_rset_updating_pause_time_percent: 5
          gc_g1_max_gc_pause_ms: 300
      racks:
      - name: az-1
        nodeAffinityLabels:
          cassandra-rack: az1
      - name: az-2
        nodeAffinityLabels:
          cassandra-rack: az2
      - name: az-3
        nodeAffinityLabels:
          cassandra-rack: az3
As all our secrets are in the same namespace, there is no namespace issue with the secrets. The only problem occurred when I dropped the secret matching the DC name, because I thought it was no longer needed after I updated the DC CRD with the new reference. Which secret do we need to add the labels to: the one matching the DC name or the new one?
I thought it was no longer needed after I updated the DC CRD with the new reference
I think I better understand. You need to make the change through the K8ssandraCluster. You should not directly modify the CassandraDatacenter object. The CassandraDatacenter is created/updated by k8ssandra-operator based on the K8ssandraCluster spec.
When you directly update the CassandraDatacenter, that will trigger a reconciliation in k8ssandra-operator. k8ssandra-operator will see the desired state for the CassandraDatacenter (as determined by the K8ssandraCluster spec) does not match the actual state. It will then update the CassandraDatacenter with the desired state which means you will lose your changes.
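In other words, the change belongs on the K8ssandraCluster, e.g. something like this (a sketch; the cluster name dev-westeurope-01 and secret name are assumptions based on this thread):

```shell
# Hypothetical example: update the superuser secret reference at the
# K8ssandraCluster level and let k8ssandra-operator propagate it to the
# CassandraDatacenter.
kubectl patch k8ssandracluster dev-westeurope-01 -n temporal-state \
  --type merge \
  -p '{"spec":{"cassandra":{"superuserSecretRef":{"name":"cassandra-superuser"}}}}'
```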
Lastly, I misspoke earlier when I said k8ssandra-operator allows you to change the superuser secret. It performs a check during reconciliation and will end the reconciliation with an error if you change the superuser secret. I will create a separate ticket for this.
Some clarification from my end to avoid confusion. I did not actually update the CassandraDatacenter directly (it was tempting), since that would likely have consequences if done bypassing k8ssandra-operator.
What I actually did was update the K8ssandraCluster object and drop the DC-named secret, which resulted in the error message. Once I got the error, I put the old secret back (just the secret, without any CRD update) and reconciliation completed without any issues. The K8ssandraCluster object still has the new superuser secret in the CRD.
After that sequence completed, the cass-operator CassandraDatacenter object had both the old and the new superuser secret references, in two different places.
I'm still having a bit of trouble following :( Can you list out steps to reproduce? Then I will test.
Here is the sequence, let me know if anything else needs clarifying
everything is done in the single namespace
I am interested in how to do this secret reassignment properly. At this stage I don't mind having to entirely redo the k8ssandra-operator specification and deployment, but if there is a way to change the Cassandra secret(s) after the fact, that would be useful.
I have an app stack that wants to take over the Cassandra cluster and regard it as its own, with its own secret for administering Cassandra services. Being able to change the name and password of that Cassandra admin would be convenient.
Hi @lonniev,
we have yet to implement proper secret rotation. Right now you have to rotate manually, as the operator won't update the credentials in Cassandra. This is definitely on our roadmap though.
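Until that lands, a manual rotation would look roughly like this (a sketch; pod names, user names, and passwords are placeholders, and you should verify against your own deployment):

```shell
# Hypothetical example of manual superuser rotation.
# 1. Change the password inside Cassandra itself.
kubectl exec -it <cassandra-pod> -n temporal-state -- \
  cqlsh -u <current-superuser> -p <current-password> \
  -e "ALTER ROLE '<current-superuser>' WITH PASSWORD = '<new-password>';"

# 2. Update the Kubernetes secret so the operator and clients stay in sync.
kubectl create secret generic cassandra-superuser -n temporal-state \
  --from-literal=username=<current-superuser> \
  --from-literal=password=<new-password> \
  --dry-run=client -o yaml | kubectl apply -f -
```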
What happened?
We would like to change the superuser secret reference from the default dev-westeurope-01-superuser to cassandra-superuser (the password will remain the same; we just wanted a manually managed secret). We added a block below, which works for new builds.
What we found is that this block does not let us change an existing secret reference.
We checked the CassandraDatacenter object reference, and indeed it is not updated.
Did you expect to see something different?
I would expect the superuser secret reference to be updated.
How to reproduce it (as minimally and precisely as possible):
1. Create a cluster with the default superuser secret.
2. Update the cluster to use a different superuser secret.
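Step 2 amounts to editing the K8ssandraCluster spec, e.g. (a sketch using the secret names from this issue):

```yaml
# Hypothetical fragment of the K8ssandraCluster spec after the change.
spec:
  cassandra:
    superuserSecretRef:
      name: cassandra-superuser   # was the default dev-westeurope-01-superuser
```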
Environment
┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: K8OP-170