k8ssandra / cass-operator

The DataStax Kubernetes Operator for Apache Cassandra
https://docs.datastax.com/en/cass-operator/doc/cass-operator/cassOperatorGettingStarted.html
Apache License 2.0

cass-operator should not modify spec #102

Open arianvp opened 3 years ago

arianvp commented 3 years ago

What is missing?

When setting spec.replaceNodes, cass-operator resets it to spec.replaceNodes = [] after it has started replacing nodes. I would like cass-operator not to do that.

https://github.com/k8ssandra/cass-operator/blob/9c4c3692a90b0199ab002f23fcb08791bf7d7276/operator/pkg/reconciliation/reconcile_racks.go#L1078-L1080

Why do we need it?

This makes cass-operator hard to use with GitOps tools like Flux, as Flux will continuously re-apply the spec, causing the replacement procedure to be triggered over and over.

I guess cass-operator should check len(status.NodeReplacements) == 0 && len(spec.ReplaceNodes) > 0 to decide whether to start a new replacement, instead of only len(spec.ReplaceNodes) > 0.

Environment

Issue is synchronized with this Jira Story by Unito. Issue Number: CASS-62

burmanm commented 3 years ago

Do you mean Flux tries to fight against the cass-operator's modifications? Thus both want to maintain their state in the spec part?

jsanda commented 3 years ago

Thanks for opening this @arianvp :) Unfortunately there are other properties in spec that are modified:

It might be better to have a separate ticket for each of these and then a "parent" issue. What do you think?

arianvp commented 3 years ago

For replaceNodes, I was wondering: is it a safe and sound technique to always start Cassandra with -Dcassandra.replace_address=<previous-pod's-address>? That is, not make this configurable and treat every pod restart as a "replacement"?

If the pod happens to use reattachable storage (e.g. EBS), the data is already there and the replacement will be instant. If there is no reattachable storage, then the node will replicate the data from other nodes.

I don't know enough about Cassandra's intricacies to tell whether that is a good idea, but it would solve the issue at least for replaceNodes.

ErickRamirezAU commented 3 years ago

It isn't necessary to use the replace_address flag if the data/ directory is intact. Cassandra is smart enough to know that it's technically the "same" Cassandra node when the PV/storage is attached to a new pod.

For example, let's say Cassandra is running on bare metal servers not in a Kubernetes cluster. If the server experiences a hardware failure, you can simply pull the data disk out of the chassis and plug it into a new bare metal server. Provided that the OS + Cassandra is installed and configured, Cassandra will start up as expected even if the server has a new IP -- Cassandra will handle the change internally.

The same applies to EC2 instances (or any compute instances for other cloud providers). In the case where the EC2 instance dies for whatever reason, you can simply mount the EBS volume to a new EC2 instance and Cassandra will continue to operate as normal. :)

jsanda commented 2 years ago

Please add your planning poker estimate with ZenHub @burmanm

burmanm commented 2 years ago

I think both of these (rollingRestart & replaceNodes) could be done as CassandraTasks, which would avoid any spec changes. As for the mentioned "Stopped" state, it's not cass-operator that modifies it, so that could remain.

We would then deprecate those older Spec properties, but not remove them. If they're not used, then the Spec won't be changed and all should be good.
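With the CassandraTask approach, a replacement would be expressed as its own declarative object rather than a mutation of the CassandraDatacenter spec, which sidesteps the GitOps conflict entirely. A hedged sketch of what such a manifest might look like (the apiVersion, command name, and argument key here are assumptions based on the CassandraTask CRD, not taken from this thread):

```yaml
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-example
spec:
  datacenter:
    name: dc1
    namespace: cass-operator
  jobs:
    - name: replace-pod-1
      command: replacenode
      args:
        # Hypothetical pod name; substitute the pod to be replaced.
        pod_name: cluster1-dc1-default-sts-1
```

Because the task is a separate resource with its own lifecycle and status, the operator never needs to rewrite any field a GitOps tool owns.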

burmanm commented 2 years ago

ForceUpgradeRacks modifies the Spec also

burmanm commented 7 months ago

After https://github.com/k8ssandra/cass-operator/issues/583 is done, this ticket can be closed as then all the spec modifiers are deprecated.