Open arianvp opened 3 years ago
Do you mean Flux tries to fight against the cass-operator's modifications? Thus both want to maintain their state in the spec part?
Thanks for opening this @arianvp :) Unfortunately there are other properties in spec that are modified:
Might be better to have a separate ticket for each of these and then a "parent" issue. What do you think?
For replaceNodes
I was wondering: is it a safe and sound technique to just always start up Cassandra with `-Dcassandra.replace_address=<previous-pod's-address>`? Just not make this configurable and always see pod restarts as "replacements"?
If the pod happens to be using something like reattachable storage (EBS), the data is already there and the replacement will be instant. If there is no reattachable storage, then it will replicate the data from other nodes.
I don't know enough about cassandra intricacies to tell if that is a good idea. But it would solve the issue at least for replaceNodes
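To make the proposal concrete, here is a minimal Go sketch of building that startup flag. The helper name `replaceAddressArg` and the sample IP are hypothetical, and how the operator would learn the previous pod's address is not addressed here; this only illustrates the idea being asked about, not anything cass-operator does.

```go
package main

import "fmt"

// replaceAddressArg builds the JVM system property from the proposal.
// previousPodIP is a hypothetical input; discovering it is out of scope.
func replaceAddressArg(previousPodIP string) string {
	return fmt.Sprintf("-Dcassandra.replace_address=%s", previousPodIP)
}

func main() {
	// Example address, chosen only for illustration.
	fmt.Println(replaceAddressArg("10.0.0.12"))
}
```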
It isn't necessary to use the `replace_address` flag if the `data/` directory is intact. Cassandra is smart enough to know that it's technically the "same" Cassandra node when the PV/storage is attached to a new pod.
For example, let's say Cassandra is running on bare metal servers, not in a Kubernetes cluster. If the server experiences a hardware failure, you can simply pull the data disk out of the chassis and plug it into a new bare metal server. Provided that the OS and Cassandra are installed and configured, Cassandra will start up as expected even if the server has a new IP -- Cassandra will handle the change internally.
The same applies to EC2 instances (or any compute instances for other cloud providers). In the case where the EC2 instance dies for whatever reason, you can simply mount the EBS volume to a new EC2 instance and Cassandra will continue to operate as normal. :)
I think both of these (rollingRestart & replaceNode) could be done as CassandraTasks, which would avoid any Spec changes. As for the mentioned "Stopped", it's not cass-operator that modifies it, so that could remain.
We would then deprecate those older Spec properties, but not remove them. If they're not used, then the Spec won't be changed and all should be good.
ForceUpgradeRacks modifies the Spec also
After https://github.com/k8ssandra/cass-operator/issues/583 is done, this ticket can be closed as then all the spec modifiers are deprecated.
What is missing?
When setting `spec.replaceNodes`, cass-operator will set it to `spec.replaceNodes = []` after it has started replacing nodes. I would like `cass-operator` to not do that.
https://github.com/k8ssandra/cass-operator/blob/9c4c3692a90b0199ab002f23fcb08791bf7d7276/operator/pkg/reconciliation/reconcile_racks.go#L1078-L1080
Why do we need it?
This makes `cass-operator` hard to use with GitOps tools like Flux, as Flux will continuously re-apply the `spec`, causing a replacement procedure to be triggered over and over.
I guess `cass-operator` should look at `status.NodeReplacements == [] && len(spec.ReplaceNodes) > 0` to decide whether to start a new replacement, instead of at `len(spec.ReplaceNodes) > 0`.
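The suggested check can be sketched in Go as follows. The types `DatacenterSpec` and `DatacenterStatus` and the function `shouldStartReplacement` are simplified, hypothetical stand-ins written for this example; they are not the operator's actual reconciliation code.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the CassandraDatacenter spec and
// status. The real cass-operator types carry many more fields.
type DatacenterSpec struct {
	ReplaceNodes []string
}

type DatacenterStatus struct {
	NodeReplacements []string
}

// shouldStartReplacement mirrors the condition suggested in the issue:
// start only when the status records no replacement and the spec asks for one.
func shouldStartReplacement(spec DatacenterSpec, status DatacenterStatus) bool {
	return len(status.NodeReplacements) == 0 && len(spec.ReplaceNodes) > 0
}

func main() {
	spec := DatacenterSpec{ReplaceNodes: []string{"pod-0"}}

	// Nothing recorded in status yet: a replacement would start.
	fmt.Println(shouldStartReplacement(spec, DatacenterStatus{}))

	// Status already records a replacement: a re-applied spec is ignored.
	inProgress := DatacenterStatus{NodeReplacements: []string{"pod-0"}}
	fmt.Println(shouldStartReplacement(spec, inProgress))

	// Empty spec: nothing to do.
	fmt.Println(shouldStartReplacement(DatacenterSpec{}, DatacenterStatus{}))
}
```

As written, this condition only prevents re-triggering while the status still lists a replacement. If the status list is cleared on completion and the GitOps tool keeps applying the same `spec`, the check would pass again.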
Environment
Cass Operator version: Insert image tag or Git SHA here
**Anything else we need to know?**: