Open rhuffy opened 1 year ago
I assume this is the same as https://github.com/k8ssandra/cass-operator/issues/130 ?
@burmanm, it seems like a different (although somewhat related) issue. Here the nodes won't refuse to start, which is apparently what's described in #130. I'm not sure how the operator could detect that 🤔 The other nodes are the ones ignoring the node that inherited an old IP, so that node cannot tell (or can it?) that it's getting ignored. Unless we can detect some schema update failures in the mgmt-api and bounce the node so that it gets a new IP?
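One way to picture the "detect some schema update failures" idea is to look for nodes whose schema version stays different from the rest of the cluster. The sketch below is only an illustration and not operator code. It assumes the caller already has a mapping from schema version ID to the endpoints reporting it, similar in shape to the "Schema versions" section printed by `nodetool describecluster`. How the operator would obtain that mapping through the mgmt-api is left open here, and `find_schema_outliers` is a hypothetical helper name.

```python
from typing import Dict, List


def find_schema_outliers(versions_to_endpoints: Dict[str, List[str]]) -> List[str]:
    """Return endpoints that are not on the most widely held schema version.

    Hypothetical helper: the input shape is an assumption, modeled on the
    "Schema versions" listing of `nodetool describecluster`.
    """
    if not versions_to_endpoints:
        return []

    # Treat the version held by the most endpoints as the cluster's version.
    # On a tie, max() keeps the first version in iteration order.
    majority = max(versions_to_endpoints, key=lambda v: len(versions_to_endpoints[v]))

    outliers: List[str] = []
    for version, endpoints in versions_to_endpoints.items():
        if version != majority:
            outliers.extend(endpoints)
    return sorted(outliers)


if __name__ == "__main__":
    snapshot = {
        "version-a": ["10.0.0.2", "10.0.0.3"],
        "version-b": ["10.0.0.1"],  # e.g. the node whose schema change was ignored
    }
    print(find_schema_outliers(snapshot))
```

Schema disagreement is normal for a short time after any schema change, so a check like this would presumably only be meaningful if the same outlier persists across several polls before the node is bounced.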
What happened?
While reading through open Cassandra issues, I came across CASSANDRA-17883. The issue is that when a Cassandra node is removed, its IP address is added to a list of ignoredEndpoints in MigrationCoordinator. In the Cassandra source, there is a TODO comment that describes the issue:
When a pod bounces and comes up with a different IP, the old IP is removed from gossip, and I believe it's also added to ignoredEndpoints. If another pod bounces and gets that original IP, my concern is that any schema changes on that node will be ignored by the rest of the cluster.
Does the operator do anything to handle this situation?
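To make the concern concrete, here is a toy model of the behavior described above. It is not Cassandra code: `ToyNode`, its methods, the integer schema versions, and the IP addresses are all hypothetical, and the only behavior modeled is the assumption from this issue that a removed endpoint's IP lands on an ignore list that is never cleared.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node's schema handling; not real Cassandra code."""
    name: str
    ip: str
    schema_version: int = 0
    ignored_endpoints: set = field(default_factory=set)

    def on_peer_removed(self, peer_ip: str) -> None:
        # Assumption from the issue text: the removed endpoint's IP is
        # remembered and never taken off the list again.
        self.ignored_endpoints.add(peer_ip)

    def on_schema_from(self, sender_ip: str, version: int) -> bool:
        # Drop schema announcements from any IP on the ignore list.
        if sender_ip in self.ignored_endpoints:
            return False
        self.schema_version = max(self.schema_version, version)
        return True


def simulate_ip_reuse() -> dict:
    observer = ToyNode("observer", "10.0.0.3")

    # Pod A bounces and leaves 10.0.0.1 behind; the observer ignores that IP.
    observer.on_peer_removed("10.0.0.1")

    # Pod B bounces, inherits 10.0.0.1, and announces schema version 1.
    from_reused_ip = observer.on_schema_from("10.0.0.1", 1)
    version_after_reused = observer.schema_version

    # The same announcement from an IP that was never removed is accepted.
    from_fresh_ip = observer.on_schema_from("10.0.0.4", 1)
    version_after_fresh = observer.schema_version

    return {
        "from_reused_ip": from_reused_ip,
        "version_after_reused": version_after_reused,
        "from_fresh_ip": from_fresh_ip,
        "version_after_fresh": version_after_fresh,
    }


if __name__ == "__main__":
    print(simulate_ip_reuse())
```

In this model the observer stays on its old schema version when the announcement comes from the reused IP, and only catches up once the same change arrives from an address that is not on its ignore list.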
What did you expect to happen?
No response
How can we reproduce it (as minimally and precisely as possible)?
I don't have a repro on a test k8s cluster since I'm not sure how to force pods to come up with particular IPs.
You can, however, reproduce in Cassandra dtests with these steps
Note that if node1 is restarted with some new IP, it will receive the schema change from node2, and pass it along to node3.
cass-operator version
1.15.0
Kubernetes version
1.24
Method of installation
No response
Anything else we need to know?
No response