k8ssandra / cass-operator

The DataStax Kubernetes Operator for Apache Cassandra
https://docs.datastax.com/en/cass-operator/doc/cass-operator/cassOperatorGettingStarted.html
Apache License 2.0

Cassandra cannot become ready after configuration change #695

Open kos-team opened 2 months ago

kos-team commented 2 months ago

What happened?

We tried to change the configuration of an existing Cassandra cluster by changing cassandra-yaml.num_tokens from 16 to 8. The operator proceeded to update the StatefulSets of the racks, changing the arguments of the Cassandra Pods. However, the restarted Pods are stuck in an unready state: the readiness probe keeps returning 500 errors.

What did you expect to happen?

The operator should be able to apply the Cassandra configuration change correctly. Restarting the Pod is not the right procedure for changing Cassandra's num_tokens. To change num_tokens properly, the operator needs to decommission the node and let it rejoin the cluster with the updated num_tokens configuration.

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy the cass-operator
  2. Deploy a Cassandra cluster with the following CR YAML:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  clusterName: development
  serverType: cassandra
  serverVersion: "4.1.2"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  racks:
    - name: rack1
  config:
    jvm-server-options:
      initial_heap_size: "1G"
      max_heap_size: "1G"
    cassandra-yaml:
      num_tokens: 16
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      role_manager: CassandraRoleManager
  3. Change the cassandra-yaml.num_tokens from 16 to 8 in config:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster
spec:
  clusterName: development
  serverType: cassandra
  serverVersion: "4.1.2"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  racks:
    - name: rack1
  config:
    jvm-server-options:
      initial_heap_size: "1G"
      max_heap_size: "1G"
    cassandra-yaml:
      num_tokens: 8
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      role_manager: CassandraRoleManager
  users:
  - secretName: demo-secret
    superuser: true
  4. Observe that the restarted Pods keep returning 500 errors to the readiness probe

cass-operator version

1.22.0

Kubernetes version

1.29.1

Method of installation

Helm

Anything else we need to know?

No response

Issue is synchronized with this Jira Story by Unito. Issue Number: CASS-2

burmanm commented 2 months ago

Modifying num_tokens is not allowed by Cassandra itself, so this is not something cass-operator controls.

The proper way to modify num_tokens would be to create a separate datacenter, as having a different number of tokens on different nodes is not really recommended.
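As a rough illustration of the separate-datacenter approach, a second CassandraDatacenter could reuse the manifest from the report with the same clusterName and the new num_tokens value. This is only a sketch: the name test-cluster-dc2 is hypothetical, and it assumes the new datacenter is created in the same namespace as the existing one so that it joins the same cluster.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: test-cluster-dc2        # hypothetical name for the new datacenter
spec:
  clusterName: development      # same cluster as the existing datacenter
  serverType: cassandra
  serverVersion: "4.1.2"
  managementApiAuth:
    insecure: {}
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  racks:
    - name: rack1
  config:
    jvm-server-options:
      initial_heap_size: "1G"
      max_heap_size: "1G"
    cassandra-yaml:
      num_tokens: 8             # new value, set before the nodes first join
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      role_manager: CassandraRoleManager
```

Under this approach, keyspace replication would presumably need to be extended to the new datacenter and the data rebuilt there before clients are switched over and the old datacenter is removed. None of those steps are shown here.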

Automated decommission on config change is not currently on our radar as a feature, as it could cause unintended data loss and availability issues.