Operator stops `CheckAndHeal` when not all Redis pods are running

drivebyer commented 1 year ago

Expected behaviour

I have a cluster with three sentinels and three Redis instances. The Redis instances use local persistent volumes (PV) to store data, which means that the pods can only be scheduled to their original nodes. The master is deployed with one sentinel on the same node, referred to as node-A.

When node-A is restarted, the sentinels can execute a failover for the master. The client connects to the sentinel and retrieves the address of the latest master.

Actual behaviour

The operator prints the following log:

DEBU[0859] Number of redis mismatch, waiting for redis statefulset reconcile  namespace=default redisfailover=redisfailover src="handler.go:79"

The application client gets the following error:

The error log comes from: https://github.com/spotahome/redis-operator/blob/632aa3da88ee2a28ccb9fc4f571ddcb556dbd713/operator/redisfailover/checker.go#L101-L104

Maybe we should continue reconciling the cluster even after (2*n + 1)/2 instances have failed.
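To make the suggestion concrete, here is a rough sketch of a quorum-style gate. It assumes the intent is "keep healing while a strict majority of Redis pods is still running", which is only one reading of the formula above. `shouldContinueHeal` is a hypothetical helper and does not exist in redis-operator; the real check lives in the checker code linked above.

```go
package main

import "fmt"

// shouldContinueHeal is a hypothetical helper (not part of redis-operator).
// It reports whether reconciliation could proceed when only `running` of the
// `expected` Redis pods are up. Assumption: a strict majority is enough.
func shouldContinueHeal(expected, running int) bool {
	// Reject inputs that cannot describe a real cluster state.
	if expected <= 0 || running < 0 || running > expected {
		return false
	}
	// Strict majority: more than half of the expected pods are running.
	return running > expected/2
}

func main() {
	// Three Redis instances, as in the cluster described in this issue.
	for _, running := range []int{3, 2, 1, 0} {
		fmt.Printf("expected=3 running=%d continue=%t\n",
			running, shouldContinueHeal(3, running))
	}
}
```

With three instances and node-A down, two pods are still running, so a gate like this would let the operator keep reconciling instead of waiting for the statefulset to return to full size.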

drivebyer commented 1 year ago

cc @ese

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 11 months ago

not stale

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 9 months ago

not stale

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 8 months ago

not stale

drivebyer commented 8 months ago

This issue may be partially addressed by https://github.com/spotahome/redis-operator/pull/558

spotahome / redis-operator

Operator stops `CheckAndHeal` when not all Redis pods are running #657
