Open andrey-dubnik opened 2 years ago
For context, cass-operator has a forceUpgradeRacks parameter that allows fixing this type of issue. Perhaps we should look into something similar (but not that exact approach, since the operator then modifies the spec).
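For anyone unfamiliar with that parameter, here is a minimal sketch of how it might be set on a CassandraDatacenter. It assumes the field is spelled forceUpgradeRacks in the CRD and takes a list of rack names; the datacenter name dc1 and rack name rack1 are made up for illustration, and the other required spec fields are left out.

```yaml
# Illustrative fragment only; not a complete CassandraDatacenter manifest.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1            # hypothetical datacenter name
spec:
  # ... clusterName, serverType, serverVersion, size, storageConfig omitted ...
  # Racks listed here are asked to take the latest StatefulSet configuration
  # even when their Cassandra pods are not ready.
  forceUpgradeRacks:
    - rack1            # hypothetical rack name
```

As noted in the comment above, the drawback is that the operator modifies the spec as part of handling this field.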
@andrey-dubnik Since you mentioned having to update the STS, I assume you are referring to Cassandra pods. In this scenario, k8ssandra-operator should apply the change to the underlying CassandraDatacenter. cass-operator, however, will not apply the changes to the STS until all the Cassandra pods are in the ready state. That's been the behavior in cass-operator for as long as I have been involved with the project. With that said, I am not a fan of it and think we should consider changing it. @burmanm wdyt?
What is missing?
Currently, when a change to the K8ssandraCluster configuration breaks the deployment (e.g. an excessive CPU request prevents a pod from being scheduled and leaves it in Pending), the operator does not apply a subsequently updated K8ssandraCluster definition that contains the fix. To recover, we have to edit the STS definition directly to reduce the requested CPU so the pod can start; only after that does the operator reconcile the new configuration.
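A minimal sketch of the scenario, assuming the CPU request is set through spec.cassandra.resources on the K8ssandraCluster. The cluster name, datacenter name, size, and CPU value are all hypothetical, and other fields a real manifest needs are omitted.

```yaml
# Illustrative fragment only; values are made up to show the failure mode.
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo           # hypothetical cluster name
spec:
  cassandra:
    resources:
      requests:
        # Breaking change: no node can satisfy this request, so the
        # Cassandra pod stays in Pending.
        # Intended fix: lower this value here and let the operator roll
        # it out. Today that update is not applied while the pod is not
        # ready, so the STS has to be edited by hand first.
        cpu: "64"
    datacenters:
      - metadata:
          name: dc1    # hypothetical datacenter name
        size: 3
```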
Why do we need it?
There are situations where a pod won't start, and we would like to be able to fix that through the K8ssandraCluster resource alone, without editing the STS by hand.
Environment
┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: K8OP-218