spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Operator stops `CheckAndHeal` when not all Redis pods are running #657

Closed drivebyer closed 8 months ago

drivebyer commented 1 year ago

Expected behaviour

I have a cluster with three sentinels and three Redis instances. The Redis instances use local persistent volumes (PV) to store data, which means that the pods can only be scheduled to their original nodes. The master is deployed with one sentinel on the same node, referred to as node-A.

When node-A is restarted, the sentinels should fail over the master to another instance, and a client that connects to a sentinel should retrieve the address of the new master.

Actual behaviour

The operator prints the following log:

DEBU[0859] Number of redis mismatch, waiting for redis statefulset reconcile  namespace=default redisfailover=redisfailover src="handler.go:79"

The application client gets the following error:

(Screenshot, 2023-09-01 17:31:11)

The error log comes from: https://github.com/spotahome/redis-operator/blob/632aa3da88ee2a28ccb9fc4f571ddcb556dbd713/operator/redisfailover/checker.go#L101-L104

Maybe we should continue reconciling the cluster after (2*n + 1)/2 of the instances have failed.
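One way to read this suggestion is as a majority check: keep healing while more than half of the expected Redis pods are still running, instead of waiting for the full count. The sketch below only illustrates that idea under this assumption. `hasQuorum` is a hypothetical helper, not a function in redis-operator, and the real check in checker.go may need to behave differently.

```go
package main

import "fmt"

// hasQuorum is a hypothetical helper (not part of redis-operator).
// It reports whether a strict majority of the expected pods are running,
// which is the assumed condition for continuing to heal the cluster.
func hasQuorum(expected, running int) bool {
	if expected <= 0 || running < 0 {
		return false
	}
	// Integer division: for expected = 3 this requires running >= 2.
	return running > expected/2
}

func main() {
	// Three Redis instances expected; node-A is down, so two are running.
	fmt.Println("2 of 3 running, continue healing:", hasQuorum(3, 2))
	// Only one instance left: no majority, so wait for the statefulset.
	fmt.Println("1 of 3 running, continue healing:", hasQuorum(3, 1))
}
```

With a check like this, the scenario above (one of three Redis pods pinned to a restarting node) would not block reconciliation, while the loss of a majority would still make the operator wait.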

drivebyer commented 1 year ago

cc @ese

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 11 months ago

not stale

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 9 months ago

not stale

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 45 days with no activity.

drivebyer commented 8 months ago

not stale

drivebyer commented 8 months ago

This issue may be partially addressed by https://github.com/spotahome/redis-operator/pull/558