spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Refactor logging to give more visibility to check and heal services #533

Closed ese closed 1 year ago

ese commented 1 year ago

Description

To diagnose issues with the operator, it is more relevant to know what is happening in check and heal. Until now we logged mostly from the Kubernetes service as it performed updates to Kubernetes objects, which is usually not relevant once the cluster is bootstrapped.

samof76 commented 1 year ago

@ese there seems to be an inherent issue with this:

Before applying check and heal, wait for all expected pods to be up and running instead of waiting only for them to exist, to let Kubernetes controllers do their job

Consider this scenario....

  1. The master and sentinel pods are running
  2. The master pod and sentinels get killed.
  3. Those pods are unable to get scheduled.

In this case check-and-heal would not do what it is intended to do.

Consider another scenario...

  1. The master and sentinel pods are running
  2. All of the sentinels get killed, along with a slave.
  3. Now the sentinels get scheduled, but one slave is still not scheduled.

In this case the check-and-heal would not configure the sentinels because of the fix here.
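
To make the concern concrete, here is a minimal Go sketch of the two gating policies being compared. It is not the operator's code: `PodState`, `Gate`, and `canHeal` are hypothetical names invented for illustration, and the real check lives elsewhere in the operator.

```go
package main

// PodState is a hypothetical, simplified view of a pod.
// It is not a type from the operator.
type PodState struct {
	Name    string
	Role    string // "redis" or "sentinel"
	Running bool   // false while the pod is still unscheduled or pending
}

// Gate selects the precondition that must hold before check-and-heal runs.
type Gate int

const (
	// GateAllRunning waits until every expected pod is up and running.
	GateAllRunning Gate = iota
	// GateAllExist waits only until every expected pod object exists.
	GateAllExist
)

// canHeal reports whether check-and-heal may proceed under the given gate.
func canHeal(pods []PodState, expected int, gate Gate) bool {
	// Under both gates, all expected pod objects must at least exist.
	if len(pods) < expected {
		return false
	}
	if gate == GateAllExist {
		return true
	}
	// GateAllRunning: a single non-running pod blocks healing entirely.
	for _, p := range pods {
		if !p.Running {
			return false
		}
	}
	return true
}
```

For the second scenario (one master, one unscheduled slave, three rescheduled sentinels), `canHeal` returns false under `GateAllRunning` and true under `GateAllExist`, so with the stricter gate the new sentinels stay unconfigured for as long as the slave cannot be scheduled.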