Open c4milo opened 1 month ago
So this is a bit nastier than I thought. I was under the impression that we could just set force
in the upgrade spec, but the operator itself won't update the helm release if it sees that it's unhealthy, which makes this even harder to get out of.
For reference, this is how to set force but it doesn't really do anything given the operator's behavior.
chartRef:
  upgrade:
    force: true
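For context, here is a rough sketch of where that setting would sit in a full Redpanda resource. Only the `chartRef.upgrade.force` path comes from this thread. The `apiVersion`, the resource name, and the empty `clusterSpec` are assumptions for illustration and should be checked against the installed CRD.

```yaml
# Sketch only: everything except chartRef.upgrade.force is an assumption.
apiVersion: cluster.redpanda.com/v1alpha2   # assumed API version; verify against your CRD
kind: Redpanda
metadata:
  name: redpanda                            # hypothetical resource name
spec:
  chartRef:
    upgrade:
      force: true     # accepted by the CR, but has no effect while the operator
                      # skips updates for an unhealthy helm release
  clusterSpec: {}     # chart values would go here
```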
I'd vote to change the behavior to just always update the helm release regardless of its existing status, as the current behavior prevents users from fixing forward. @RafalKorepta WDYT?
Agree with you @chrisseto
What happened?
Whenever a configuration change results in a Redpanda pod falling into an unschedulable or crashloop state, it is impossible to correct the situation by only fixing the CR values. The new values are accepted, but the operator does not reconcile them, and the StatefulSet keeps using the wrong configuration.
See screen recording in: https://redpandadata.slack.com/archives/C01H6JRQX1S/p1723752154395579?thread_ts=1723751900.722069&cid=C01H6JRQX1S
What did you expect to happen?
If we make a mistake configuring container resources and/or limits in the Redpanda Custom Resource (CR), or in any other configuration that results in a broker crashlooping, we want to be able to correct it through the Redpanda CR and see the change applied immediately by the operator, without delays.
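As an illustration of the kind of fix-forward edit described here (not the reporter's actual values file), a corrected CR fragment might look roughly like the following. The field layout under `clusterSpec.resources` is assumed to mirror the Redpanda Helm chart values, and the numbers are hypothetical.

```yaml
# Hypothetical fragment: field names assumed from the Helm chart values,
# numbers invented for illustration.
spec:
  clusterSpec:
    resources:
      cpu:
        cores: 1          # corrected value; an earlier, too-large request
                          # would have left the pod unschedulable
      memory:
        container:
          max: 2.5Gi      # corrected container memory limit
```

The expectation is that applying an edit like this to the CR alone is enough for the operator to roll the StatefulSet to the corrected configuration.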
How can we reproduce it (as minimally and precisely as possible)? Please include values file.
Anything else we need to know?
No response
Which are the affected charts?
Operator
Chart Version(s)
Cloud provider
JIRA Link: K8S-323
JIRA Link: K8S-324