opensearch-project / opensearch-k8s-operator

OpenSearch Kubernetes Operator
Apache License 2.0

updateStrategy OnDelete on the nodes statefulset causes revision mismatch #310

Open marevers opened 1 year ago

marevers commented 1 year ago

We are using Prometheus and kube-state-metrics to monitor our cluster. One of our alert rules compares the number of ready replicas of each StatefulSet with its number of current replicas.

PromQL is as follows: kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current != 1

Because spec.updateStrategy.type is set to OnDelete rather than RollingUpdate, we see the following in the status object:

status:
  observedGeneration: 9
  replicas: 2
  readyReplicas: 2
  updatedReplicas: 2
  currentRevision: opensearch-masters-5fdb995b4
  updateRevision: opensearch-masters-7fd87c8d8b
  collisionCount: 0
  availableReplicas: 2

currentRevision and updateRevision are not equal, and as such kube_statefulset_status_replicas_current reports 0 while it should be 2.
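
To make the mismatch concrete, here is a minimal Python sketch of how the two counters and the alert expression interact. It is an illustration only, not kube-state-metrics or Kubernetes code: the helper names (`count_replicas`, `promql_div`, `alert_fires`) are hypothetical, pods are modeled as plain label dictionaries, and the sketch assumes both pods already carry the update revision in their `controller-revision-hash` label.

```python
import math

# Label the StatefulSet controller puts on each pod to record its revision.
REVISION_LABEL = "controller-revision-hash"


def count_replicas(pods, current_revision, update_revision):
    """Count pods per revision, roughly as the StatefulSet status reports them."""
    current = sum(1 for p in pods if p[REVISION_LABEL] == current_revision)
    updated = sum(1 for p in pods if p[REVISION_LABEL] == update_revision)
    return current, updated


def promql_div(a, b):
    """Float division with IEEE 754 semantics: no exception on a zero divisor."""
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


def alert_fires(ready, current):
    """Mirror of the rule: ready / current != 1."""
    return promql_div(ready, current) != 1


# Assumed state: both pods already run the update revision.
pods = [
    {REVISION_LABEL: "opensearch-masters-7fd87c8d8b"},
    {REVISION_LABEL: "opensearch-masters-7fd87c8d8b"},
]
current, updated = count_replicas(
    pods,
    current_revision="opensearch-masters-5fdb995b4",
    update_revision="opensearch-masters-7fd87c8d8b",
)
print(current, updated)         # 0 current replicas, 2 updated replicas
print(alert_fires(2, current))  # the rule fires although both pods are ready
```

With zero current replicas the ratio is no longer 1, so the rule fires even though every pod is ready and up to date.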

I have tried deleting the pods manually, but this does not seem to change anything. As a result, the alert rule produces a false positive. More information here: https://github.com/kubernetes/kube-state-metrics/issues/1324

Is there a specific reason spec.updateStrategy.type is set to OnDelete? As I understand it, setting it to RollingUpdate should fix the issue.

swoehrl-mw commented 1 year ago

Hi @gk-mevers. updateStrategy is set to OnDelete to allow the operator to execute rolling restarts and upgrades in a controlled fashion. This lets the operator drain a node before restarting or upgrading it, and wait for cluster health in between. It basically moves control over when to replace a pod from the Kubernetes control plane to the operator. Using RollingUpdate would not give us that level of control.

I was able to reproduce your observations. Going by the link you posted, I believe that Kubernetes does not automatically update currentRevision to updateRevision, even if all pods are up to date. To address this, I think we need to extend the operator to update the revision for a StatefulSet after it has completed its work.
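
As a rough illustration of the controlled flow described above (drain, delete, wait for health, one pod at a time), here is a minimal Python sketch. It is not the operator's actual implementation: `Pod`, `rolling_restart`, `all_updated`, and the three callbacks are hypothetical names, the pod names are illustrative, and the cluster is simulated in memory.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    revision: str  # value of the controller-revision-hash label
    ready: bool = True


def rolling_restart(pods, update_revision, drain, delete, wait_healthy):
    """Replace stale pods one at a time; return the restarted pod names in order."""
    restarted = []
    for pod in pods:
        if pod.revision == update_revision:
            continue  # already up to date, leave it alone
        drain(pod)       # move data and traffic away from the node first
        delete(pod)      # with OnDelete, the pod is only recreated after deletion
        wait_healthy()   # block until the cluster has recovered
        restarted.append(pod.name)
    return restarted


def all_updated(pods, update_revision):
    """True once every pod is ready and runs the update revision."""
    return all(p.ready and p.revision == update_revision for p in pods)


# In-memory simulation with hypothetical pod names.
OLD = "opensearch-masters-5fdb995b4"
UPDATE = "opensearch-masters-7fd87c8d8b"
events = []
pods = [Pod("opensearch-masters-0", OLD), Pod("opensearch-masters-1", UPDATE)]


def fake_delete(pod):
    events.append(("delete", pod.name))
    pod.revision = UPDATE  # stand-in for the controller recreating the pod


restarted = rolling_restart(
    pods,
    UPDATE,
    drain=lambda pod: events.append(("drain", pod.name)),
    delete=fake_delete,
    wait_healthy=lambda: events.append(("wait_healthy", None)),
)
print(restarted)
print(all_updated(pods, UPDATE))
```

In this sketch, the point at which `all_updated` becomes true is where an operator could reconcile the StatefulSet's revision status; how that would be done against the Kubernetes API is left open here.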

I'll mark this ticket as an enhancement. Should you have the time and inclination to have a go at it, PRs are always welcome.

ibotty commented 9 months ago

This is fixed by #614, right?