spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0
1.5k stars 356 forks source link

Sentinel Readiness is ready when really not #532

Closed cbuckley01 closed 1 year ago

cbuckley01 commented 1 year ago

Expected behaviour

The K8s sentinel service should not include endpoints (sentinels) that have local master, 127.0.0.1

Actual behavior

The readiness check will gladly put a sentinel's endpoint into the service even though it is not registered with the cluster yet

Steps to reproduce the behaviour

Get lucky and point an app at the k8s sentinel service when a new sentinel pod just comes up and watch the app fail trying to connect to master @ 127.0.0.1

The bug/fix should be pretty simple, the readiness check needs to ensure that the sentinel is indeed connected to a cluster of more than itself at the very least, something like this:

if [ `redis-cli -p 26379 sentinel CKQUORUM mymaster | awk '{print $2}'`  -gt 1 ] ; then exit 0; fi

I can submit a PR, but wondering how deep you want to go, make the readiness check for sentinel a configurable option, changing the CRD etc... (ugh) or just use an iteration of the above as the baked in check?

samof76 commented 1 year ago

@cbuckley01 I suppose this is the right thing to do. I guess all the probe on both redis & sentinel should somehow move to the CRDs, like the shutdownConfig, which provides with greater flexibility but have sane defaults if there is nothing user provided.

cc: @ese

ese commented 1 year ago

Thanks @cbuckley01, a PR is very welcome. IMHO better left it simple, so until there is no clear statement to have a custom readiness probe its enough to have the opinionated operator one. Edit the current one is already a great improvement https://github.com/spotahome/redis-operator/blob/master/operator/redisfailover/service/generator.go#L512

samof76 commented 1 year ago

@ese @cbuckley01 I would differ in opinion... Its always better to have custom probes in the CRD. Which include the following.

readiness - redis
startup - redis
liveness - redis
readiness - sentinel
startup - sentinel
liveness - sentinel

to override the defaults if they are specified otherwise have the defaults.

Two gains we get by doing this.

  1. Testing an configs is much more easier
  2. The changes will not restart all deployments, and affects only those which have the CRD changes.
github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.