spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.

Ignore any excess pods in a non-running state when checking if pods are running #640

Closed · andrewchinnadorai closed this 10 months ago

andrewchinnadorai commented 11 months ago

Fixes https://github.com/spotahome/redis-operator/issues/639

As described in #639, when the Sentinel pods are scheduled on preemptible/spot instances and a node is terminated due to preemption, any sentinel pod that was running on that node is replaced by a new pod on another node. The old pod shuts down but remains in the cluster in a Completed state.

The checks performed by IsSentinelRunning list all pods for the deployment indiscriminately and then verify that every returned pod is in a Running state. In our case that means leftover pods in a Completed state fail the check, even though the desired number of sentinel pods are running and healthy. Because of this the cluster and Sentinel are deemed unhealthy. A sketch of that check follows.
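For illustration, here is a minimal sketch of the pre-PR shape of the check. The function name AreAllRunning comes from the description below; the package name and exact body are paraphrased assumptions, not the operator's verbatim source:

```go
package service

import corev1 "k8s.io/api/core/v1"

// AreAllRunning (pre-PR sketch): a single pod in any phase other than
// Running, e.g. a Completed pod left behind after a preemption, makes
// the whole check fail.
func AreAllRunning(pods *corev1.PodList) bool {
	for _, pod := range pods.Items {
		if pod.Status.Phase != corev1.PodRunning {
			return false
		}
	}
	return true
}
```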

There are a couple of ways this could be fixed, but in this PR I've gone with what I think is the simplest one: AreAllRunning now takes an expectedRunningPods integer as a parameter, and the loop skips to the next pod whenever the Running-state conditional is not met, incrementing a runningPods counter only for pods that are Running. The function then returns whether runningPods equals expectedRunningPods. With this change we no longer care about any other cluster/sentinel pods in a non-running state; as long as the expected number of pods are running, the check passes. The modified function is sketched below.
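Under the same assumptions as the sketch above (placeholder package name, paraphrased body rather than the merged diff), the change looks like this:

```go
package service

import corev1 "k8s.io/api/core/v1"

// AreAllRunning (post-PR sketch): count the Running pods and compare
// the count with what the caller expects, ignoring pods in any other
// phase (Completed, Pending, etc.).
func AreAllRunning(pods *corev1.PodList, expectedRunningPods int) bool {
	runningPods := 0
	for _, pod := range pods.Items {
		if pod.Status.Phase != corev1.PodRunning {
			// Skip non-running pods, such as Completed pods left over
			// after a spot-instance preemption, instead of failing.
			continue
		}
		runningPods++
	}
	return runningPods == expectedRunningPods
}
```

A caller such as IsSentinelRunning would then pass the deployment's desired replica count as expectedRunningPods, so stray Completed pods no longer affect the health verdict.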

ese commented 10 months ago

Thanks