opensearch-project / opensearch-k8s-operator

OpenSearch Kubernetes Operator
Apache License 2.0

Operator 2.5.1 causes frequent restarts and instability #744

Closed mrvdcsg closed 4 months ago

mrvdcsg commented 4 months ago

What is the bug?

The operator restarts frequently, causing instability and an occasional non-green cluster status, which in turn triggers auto-healing.

How can one reproduce the bug?

We had many clusters running stably on version 2.4 of the OpenSearch operator (with OpenSearch 2.6). When 2.5.1 was released, our system (FLUX) rolled it out. Since that upgrade we are seeing dozens of operator restarts a day on every cluster, and several times a day the OpenSearch cluster is deleted and rebuilt from scratch.

What is the expected behavior?

The operator should be stable, without unexpected restarts, and the OpenSearch cluster should not be impacted.

What is your host/environment?

We are running this on Azure AKS.

Do you have any additional context?

The only other change that came with this upgrade was the addition of an ISM_Policy. We didn't see instability in 2.4, and I've run some 2.5.1 clusters for testing without issues.

NAME                                     READY   STATUS    RESTARTS          AGE
opensearch-operator-controller-manager   2/2     Running   249 (3h17m ago)   19d
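For a sense of scale, the RESTARTS and AGE values in that listing can be turned into an average rate. This is a minimal Python sketch of the arithmetic only. It assumes AGE is reported as a whole number of days, and the helper name `restarts_per_day` is hypothetical, not part of any tool:

```python
import re


def restarts_per_day(restarts: int, age: str) -> float:
    """Average restarts per day, given a RESTARTS count and an AGE like '19d'.

    Assumption: AGE is a whole number of days (e.g. '19d'); other kubectl
    age formats such as '3h17m' are deliberately not handled here.
    """
    match = re.fullmatch(r"(\d+)d", age)
    if match is None:
        raise ValueError(f"unsupported age format: {age!r}")
    days = int(match.group(1))
    if days == 0:
        raise ValueError("age must be at least one day")
    return restarts / days


# Values taken from the pod listing: 249 restarts over 19 days.
rate = restarts_per_day(249, "19d")
print(f"{rate:.1f} restarts/day")  # prints "13.1 restarts/day"
```

This is a lifetime average for the pod, so it smooths over any quieter period before the upgrade and does not contradict higher counts on individual days.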

mrvdcsg commented 4 months ago

I'm going to close this, as I believe the problem is with FLUX attempting to reconcile:

7m33s Warning error helmrelease/opensearch reconciliation failed: install retries exhausted