spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Service rfs-redis is not updated in case of network partition on sentinel #663

Closed cjabrantes closed 7 months ago

cjabrantes commented 9 months ago

Hi,

Thanks for your work on this operator. I was running some tests and noticed the following behaviour:

Using a network policy, I dropped the traffic to and from one of the sentinels.

Expected behaviour

I would expect that sentinel to be removed from the list of available endpoints of the service rfs-redis.

From my understanding, if a new master is elected during this time, 2 sentinels would see the new one while 1 sentinel would still point to the old one. This can be problematic because the Redis client is configured only with rfs-redis, so there is a chance of hitting the "bad" sentinel.

Actual behaviour

If we check the service endpoints, all of them are still there:

Endpoints: 10.42.3.36:26379,10.42.4.42:26379,10.42.5.129:26379

Looking at the command used in the probe:

redis-cli -h rfs-redis-69c47c54fc-6kbv4 -p 26379 sentinel get-master-addr-by-name mymaster
1) "10.42.3.50"
2) "6379"

It still succeeds.

I also noticed that the following command reports the master status as down:

redis-cli -p 26379 info | grep -i status
master0:name=mymaster,status=sdown,address=10.42.3.50:6379,slaves=1,sentinels=3

So maybe we could also use it in the probe:

redis-cli -h $(hostname) -p 26379 sentinel get-master-addr-by-name mymaster | head -n 1 | grep -vq '127.0.0.1' && redis-cli -p 26379 info | grep -i status=ok
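To make the two conditions of the proposed probe easier to read, here is a minimal sketch that splits them into a shell function. The function name check_sentinel and its two arguments are hypothetical and not part of the operator; the logic is intended to mirror the one-liner above: fail when the reported master address is empty or 127.0.0.1, and fail unless the info output reports status=ok.

```shell
#!/bin/sh
# Hypothetical helper (not part of redis-operator): mirrors the one-line probe.
# $1: output of `redis-cli ... sentinel get-master-addr-by-name mymaster`
# $2: output of `redis-cli -p 26379 info`
check_sentinel() {
  # Only the first line (the master address) is inspected.
  first_line=$(printf '%s\n' "$1" | head -n 1)

  # No reply at all: treat the sentinel as not ready.
  if [ -z "$first_line" ]; then
    return 1
  fi

  # A master address of 127.0.0.1 is rejected, as in the one-liner.
  if printf '%s\n' "$first_line" | grep -qF '127.0.0.1'; then
    return 1
  fi

  # Require that the sentinel sees the master as ok (not sdown/odown).
  if printf '%s\n' "$2" | grep -qi 'status=ok'; then
    return 0
  fi
  return 1
}

# Possible usage inside a sentinel container (assumes redis-cli is available):
#   addr=$(redis-cli -h "$(hostname)" -p 26379 sentinel get-master-addr-by-name mymaster)
#   info=$(redis-cli -p 26379 info)
#   check_sentinel "$addr" "$info"
```

Passing the command outputs in as arguments keeps the decision logic separate from the redis-cli calls, so it can be exercised with captured output such as the status=sdown line shown above.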

Steps to reproduce the behaviour

In Kubernetes, add a NetworkPolicy or firewall rule that blocks traffic to and from one sentinel.
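As an illustration of the reproduction step, the sketch below prints a deny-all NetworkPolicy for a single pod. It is not the exact policy used in the test above. The label chaos-isolate=true, the policy name isolate-one-sentinel, and the function name deny_all_policy are assumptions made up for this example. It also assumes the cluster's CNI plugin enforces NetworkPolicy.

```shell
#!/bin/sh
# Hypothetical label used to single out one sentinel pod (not set by the operator).
ISOLATE_LABEL_KEY="chaos-isolate"
ISOLATE_LABEL_VALUE="true"

# Print a NetworkPolicy that selects pods carrying the label above.
# Listing both policy types without any ingress/egress rules denies all
# traffic to and from the selected pods.
deny_all_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-one-sentinel
spec:
  podSelector:
    matchLabels:
      ${ISOLATE_LABEL_KEY}: "${ISOLATE_LABEL_VALUE}"
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Possible usage (pod name taken from the probe command in this issue):
#   kubectl label pod rfs-redis-69c47c54fc-6kbv4 chaos-isolate=true
#   deny_all_policy | kubectl apply -f -
```

Labelling a single pod keeps the other two sentinels untouched, and deleting the policy or removing the label should end the partition.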

Environment

Setup: 3 sentinels, 2 Redis instances

Operator: 1.2.4
Kubernetes: v1.25.4

Let me know your thoughts.

Thanks, Carlos

cjabrantes commented 9 months ago

Update: I'm testing 1.3.0RC1, which now supports customReadinessProbe:

customReadinessProbe:
  exec:
    command: ["sh","-c", "redis-cli -h $(hostname) -p 26379 sentinel get-master-addr-by-name mymaster | head -n 1 | grep -vq '127.0.0.1' && redis-cli -p 26379 info | grep -i status=ok"]

With that probe, the sentinel in the minority partition is removed from the service endpoints.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.