spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Different RedisFailover's sentinels join together #550

Closed: tparsa closed this issue 1 year ago

tparsa commented 1 year ago

Environment

How are the pieces configured?

samof76 commented 1 year ago

Do you mean the sentinels from redis-failover A are joining the sentinels from redis-failover B? In that case, can you please answer the following?

  1. Are the failovers running in different namespaces?
  2. Are the Redis instances configured with different ports?
tparsa commented 1 year ago

Yes, for example the sentinels of RedisFailover A are joining the sentinels of RedisFailover B. In some cases, because one Redis has a password and the other does not, the actual Redis pods cannot join together, so the data does not combine.

  1. We witnessed it both within the same namespace and between namespaces.
  2. No, all of our Redis instances use the default port 6379.
tparsa commented 1 year ago

When this happens the operator doesn't have any way to fix it. I checked the metric exposed on this line and the sentinel is unhealthy. Furthermore, I checked the metric on this line as well and found out that the operator is sending a reset to the sentinels. So I tried to go through that process manually and sent a SENTINEL RESET * to all 3 of my sentinels (the sentinels of RedisFailover A). Still, nothing happened, since there were 9 sentinels in total (3 from RedisFailover A plus 3 each from RedisFailovers B and C, all joined together).
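
In case it helps, this is roughly how the reset can be issued from the shell (a sketch only; the rfs-click-redis pod-name prefix matches the example further down in this thread and is just illustrative):

# Send SENTINEL RESET * to every sentinel pod of one RedisFailover.
for pod in $(kubectl get pods -o name | grep '^pod/rfs-click-redis'); do
  echo "$pod"
  kubectl exec "$pod" -c sentinel -- redis-cli -p 26379 SENTINEL RESET '*'
done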

To give you more context, we have over 90 RedisFailovers in our production cluster and the cluster nodes' CPUs are under heavy pressure. Sentinels from different RedisFailovers joining together is a huge problem for us. Before our current cluster, we fixed this kind of problem by using NetworkPolicies to prevent different Redis and Sentinel instances from connecting to each other.

raghu-nandan-bs commented 1 year ago

@tparsa is there a deterministic way to reproduce the issue? Also, do they all monitor the same master when this happens?

tparsa commented 1 year ago

I can't provide a way to reproduce the issue, but here is the sentinel.conf from one of the following sentinel pods:

rfs-click-redis-64cb975df9-sjc8h 2/2     Running                0               7d8h    10.0.38.85    c28-s17   <none>           <none>
rfs-click-redis-64cb975df9-vlnwc 2/2     Running                0               30h     10.0.26.26    c28-n15   <none>           <none>
rfs-click-redis-64cb975df9-wxvmz 2/2     Running                0               4d5h    10.0.42.144   c28-n22   <none>           <none>
> kubectl exec -it rfs-click-redis-64cb975df9-sjc8h -- sh
Defaulted container "sentinel" out of: sentinel, sentinel-exporter, sentinel-config-copy (init)
/data $ cat /redis/sentinel.conf
sentinel myid 5e8a4531d95ac54825fea8bd4babac46ff6f9429
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.60.6 6379 2

# Generated by CONFIG REWRITE
port 26379
user default on nopass ~* +@all
dir "/data"

sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
sentinel config-epoch mymaster 25
sentinel leader-epoch mymaster 0

sentinel known-replica mymaster 10.0.19.221 6379
sentinel known-replica mymaster 10.0.46.29 6379
sentinel known-replica mymaster 10.0.59.123 6379
sentinel known-replica mymaster 10.0.55.45 6379
sentinel known-replica mymaster 10.0.42.137 6379
sentinel known-replica mymaster 10.0.56.109 6379
sentinel known-replica mymaster 10.0.35.189 6379
sentinel known-replica mymaster 10.0.42.195 6379
sentinel known-replica mymaster 10.0.39.186 6379
sentinel known-replica mymaster 10.0.38.253 6379
sentinel known-replica mymaster 10.0.29.215 6379
sentinel known-replica mymaster 10.0.8.220 6379
sentinel known-sentinel mymaster 10.0.33.88 26379 10037ad576a264e5ae6eb2c45efe6c0ad2704296
sentinel known-sentinel mymaster 10.0.55.146 26379 ade9feaff56c96dc7a8a6e3dda9e3a34420d41db
sentinel known-sentinel mymaster 10.0.40.77 26379 3c97521013d35fb69d5291f993e28d3b23e67da1
sentinel known-sentinel mymaster 10.0.42.144 26379 0752b9327a04e18dc72dbb65a34f3263265b5774
sentinel known-sentinel mymaster 10.0.35.4 26379 1bd1d0e46cb85a8c39db17cecee2303a65f41ae2
sentinel known-sentinel mymaster 10.0.9.51 26379 1825f95bf43d033613f990930012c177533d8dc9
sentinel known-sentinel mymaster 10.0.45.62 26379 55bf4c72c16f9829dbf1adfd1abf36f283bb7517
sentinel known-sentinel mymaster 10.0.26.26 26379 a157b90702bdaf027423b2c096602c35e1196d36
sentinel current-epoch 26

The replica count of this RedisFailover is 3 sentinels and 5 Redis instances, but as you can see there are 12 (4 × 3) known replicas and 8 other sentinels, which indicates that 3 different RedisFailovers joined together.

raghu-nandan-bs commented 1 year ago

@tparsa do all the sentinels point to the same master in their config?

tparsa commented 1 year ago

I assume they do, because what I see in the monitoring is that only one of the three RedisFailovers has a master, although occasionally there is a glitch where they each report a master. The attached screenshots indicate which Redis is the master: dark green is the master and light green is a replica.

zekena2 commented 1 year ago

I'm trying to upgrade the redis operator to a version newer than 1.0 so we can upgrade Kubernetes to 1.22, and something similar happened. I tried the latest version first; the new operator chooses Redis 6.2 by default, so all Redis replicas and sentinels were changed at the same time. After they settled down, some replicas didn't get synced into any cluster, and I ended up with two big clusters with 36 sentinels and 27 replicas. When I downgraded the operator to 1.1.1, the replicas that weren't ready became ready, but I still have the same situation: two big clusters with 36 sentinels and even more replicas. Note: the operator lives in a different namespace than the failovers, and all failover objects live in one namespace.

zekena2 commented 1 year ago

I still have this situation going on in a test env, and I have no idea how to fix it except by removing the RedisFailovers and applying them again one by one, but I'm not sure if that will help. Now I'm trying to find a better way to do the migration; if anyone can suggest something, I'd be thankful.

raghu-nandan-bs commented 1 year ago

@zekena2 In your test env, could you please confirm whether all the unruly sentinels point to the same master? Also, could you please share the operator logs for the affected failover?

kubectl logs -f <operator-pod> | grep "<affected redis failover object name>"
raghu-nandan-bs commented 1 year ago

Also, which version of the operator are you using currently?

zekena2 commented 1 year ago

Currently I'm using v1.2.4. Using the command below, I can confirm that almost all sentinels point to the same master, except for three from the same failover and some random ones that point to localhost or to an IP that doesn't belong to any pod.

kubectl get pods | awk '$1 ~ /^rfs/ { print $1 }' | xargs -I {} -n 1 sh -c "echo \"{}\" && kubectl exec -it pod/{} -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster"

Here are the logs for one of the failovers; I also see the same log lines mentioned in this issue. redisoperator.log

raghu-nandan-bs commented 1 year ago

Thanks for the details @zekena2

raghu-nandan-bs commented 1 year ago

@zekena2 are the failovers running in bootstrap mode?

raghu-nandan-bs commented 1 year ago

Also, can you please confirm whether the issue also occurs in v1.2.3?

raghu-nandan-bs commented 1 year ago

@zekena2 could you please share logs with debug mode enabled?

zekena2 commented 1 year ago

We don't use any bootstrapping. I'm trying v1.2.3 but I don't see any improvements. How can I enable debug mode?

raghu-nandan-bs commented 1 year ago

@zekena2 thanks for checking; please try --log-level=debug

tparsa commented 1 year ago

What I saw in the operator metrics indicates that the operator does realize there is a problem with the number of sentinels, but the only fix it applies is sending a SENTINEL RESET *, which doesn't fix anything.
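
To see that the reset has no lasting effect, you can ask each sentinel how many peers and replicas it still remembers; for a 3-sentinel / 5-redis failover, num-other-sentinels should be 2 and num-slaves should be 4. A rough sketch (the pod-name prefix is illustrative):

# Show how many peers/replicas each sentinel of one failover currently knows about.
for pod in $(kubectl get pods -o name | grep '^pod/rfs-click-redis'); do
  echo "$pod"
  kubectl exec "$pod" -c sentinel -- redis-cli -p 26379 SENTINEL MASTER mymaster \
    | grep -A1 -E 'num-other-sentinels|num-slaves'
done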

raghu-nandan-bs commented 1 year ago

@tparsa please do share debug logs if you have the setup running.

zekena2 commented 1 year ago

Here are the debug logs.

redisoperator.log

tparsa commented 1 year ago

@raghu-nandan-bs I'm afraid of restarting the operator :)) It might cause more sentinels to join together, which could be catastrophic for us.

zekena2 commented 1 year ago

I can confirm that deleting and recreating the failovers fixes the problem, but this isn't really a good solution or an upgrade strategy.

tparsa commented 1 year ago

@zekena2 Deleting all of the sentinel pods fixes the problem as well.
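
Roughly like this (a sketch; the pod-name prefix is illustrative). The pods get recreated and come back with a freshly generated sentinel.conf:

# Delete every sentinel pod of one RedisFailover and let them be recreated.
kubectl get pods -o name | grep '^pod/rfs-click-redis' | xargs kubectl delete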

zekena2 commented 1 year ago

@tparsa I can confirm that it helps. Thanks for the tip.

EladDolev commented 1 year ago

We suffer from the exact same issue: one Redis cluster may be working just fine, while other clusters join together to form one big cluster.

using K8s 1.23 and redis-operator v1.2.4

On some K8s clusters we have ~10 RedisFailovers which are working perfectly fine. All of our instances are always in the same namespace.

tparsa commented 1 year ago

We solved our problem by using network policies that deny all connections except the ones we need.

But there is a problem somewhere; this shouldn't be the ultimate solution.
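
For anyone who wants to try the same mitigation, a minimal sketch of such a policy (the label selectors and the failover name are assumptions; check the labels the operator actually sets with kubectl get pods --show-labels, and add ingress rules for your application clients):

# Sketch only: isolate the pods of one RedisFailover ("click-redis" here is
# hypothetical) so only pods of the same failover and the operator can reach
# its Redis and Sentinel ports.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-click-redis
spec:
  # Select all Redis and Sentinel pods of this RedisFailover (label is an assumption).
  podSelector:
    matchLabels:
      app.kubernetes.io/name: click-redis
  policyTypes:
    - Ingress
  ingress:
    # Allow traffic only from pods of the same failover and from the operator.
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: click-redis
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: redis-operator
      ports:
        - port: 6379
        - port: 26379
EOF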

raghu-nandan-bs commented 1 year ago

@tparsa @EladDolev I checked this issue again today and tried to forcefully reproduce it by setting all Redis and Sentinel instances to consider a different failover's instance as their master. That way, the operator ends up in a state where it won't fix the issue. While we can try updating the operator code to mitigate this, it would be really nice to understand how the RedisFailovers ended up in this state in the first place...

@EladDolev if you are still having this issue, could you please help by giving some info -

zekena2 commented 1 year ago

I'm still facing this issue whenever we do a node rotation for an upgrade; we've put an alert in place to mitigate it manually for now. In my case the custom config is only these two lines:

customConfig:
  - "down-after-milliseconds 2000"
  - "failover-timeout 3000"

So I guess deleting multiple different sentinels at the same time causes it?

raghu-nandan-bs commented 1 year ago

@zekena2 thanks for the inputs... A few more questions:

zekena2 commented 1 year ago

  1. I have 16 failovers, each with 3 sentinels and 3 replicas, 96 pods in total, all running in the same namespace.
  2. The address pool has 1140 free IP addresses and mostly stays that way, but the aws-vpc-cni reuses IP addresses that were recently freed by deleted pods in order to save time and API calls.
  3. It can happen that a Redis and a sentinel end up on the same node, but I didn't investigate whether that was the case when we hit the issue of sentinels joining each other.

zekena2 commented 1 year ago

Regarding the sentinels and replicas being on the same node: I checked in KSM, and one of the clusters that recently joined another did have a sentinel and a replica on the same node before the node rotation, but the other cluster didn't.

EladDolev commented 1 year ago

hi @raghu-nandan-bs

are there any custom redis/sentinel configurations used?

Sure. We're using some custom configuration for Redis:

    - appendonly no
    - hz 100
    - save ""
    - maxmemory 0
    - maxmemory-policy noeviction

and for Sentinel:

    - down-after-milliseconds 3000
    - failover-timeout 120000

We're running on GKE, and we're currently seeing this issue on a cluster that runs on spot instances. The way we spread the workloads, each cluster's pods end up on different nodes, but one worker node is likely to contain both a replica and a sentinel of the same cluster, as well as pods that belong to a different RedisFailover.

raghu-nandan-bs commented 1 year ago

Thanks for sharing the details @EladDolev. It will take some time to test and come back with code changes; I will update here...

zekena2 commented 1 year ago

I think the operator should move away from using IP addresses and use the Redis 7.0 feature of using hostnames instead. That would eliminate a lot of the possible bugs, but it will probably require a huge rewrite.

samof76 commented 1 year ago

The best thing to do in this case is to use custom ports for individual deployments and keep them unique per failover deployment. We do this consciously in our Redis failover deployments.

The other way would be, instead of using mymaster for every sentinel, to use a unique master name per deployment. This would require a code change.

I see this issue cannot really be fixed by any code change on the operator side, as it is a sentinel's inherent property to remember its Redis. Your best shot currently is to use a custom, unique port per failover. If you are using the default port, the second option applies, where a code change is required to replace mymaster with a unique name per RedisFailover deployment.

samof76 commented 1 year ago

On second thought, we should fix the mymaster thing in any case.

tparsa commented 1 year ago

This does not help our case. Our pods were not restarted, so reuse of the same IP was not the problem for us.

tparsa commented 1 year ago

Different ports will fix it, I'm not saying they won't, but we're still not finding the root cause. In our case, at least, the RedisFailover that lost its master (and joined another RedisFailover) didn't have any pod IP changes.

zekena2 commented 1 year ago

I see this issue cannot be fixed by any code change on the operator side as the sentinels inherent property is to remember it's redis.

I'm pretty sure I never saw this issue in v1.0; there's definitely a regression that happened somewhere.

tparsa commented 1 year ago

One way that fixes the issue, though it is not a good way to fix it, is to delete all sentinel pods and simultaneously set a master in the RedisFailover. Then, when the sentinels are running again, the operator will set the right pod for them to monitor.

mpihlak commented 1 year ago

We just recently had a case where two sentinels "joined together".

There was some event on Kubernetes that caused a bunch of Redis and Sentinel pods to be rescheduled (i.e. a node was scaled down). During rescheduling, one of the Redis pod IP addresses was reused by a Redis pod in another cluster.

In the sentinel logs for cluster A, there was this:

1:X 01 Mar 2023 10:24:39.931 * +fix-slave-config slave 10.178.10.102:6379 10.178.10.102 6379 @ mymaster 10.178.13.24 6379

Here the master 10.178.13.24 belongs to cluster A, but the Redis pod IP 10.178.10.102 belongs to cluster B. It had belonged to cluster A just a minute earlier, but that pod was rescheduled and got a new IP; at the same time, a pod in another cluster was rescheduled and was assigned the IP 10.178.10.102.

There was also a corresponding log entry in the Redis Operator logs that points this to SetMasterOnAll:

time="2023-03-01T10:24:39Z" level=info msg="Making pod rfr-foo-redis-2 slave of 10.178.13.24" namespace=default redisfailover=foo-redis service=redis.healer src="checker.go:198"

(Note: the IP should really be logged there as well.)

Looking at SetMasterOnAll, it looks like it is open to race conditions: if the pod IP addresses change between fetching them and calling MakeSlaveOfWithPort, it could do the wrong thing. I can imagine this happening more often when Kubernetes API calls take a long time to complete (i.e. an overloaded API server).

It seems that setting different auth credentials for different clusters would mitigate this, and possibly also having a different master ID instead of using mymaster everywhere.

zekena2 commented 1 year ago

It became easier to reproduce thanks to @mpihlak's discovery. We need the Calico CNI in order to reuse IP addresses consistently, via its IPPool feature.

Start the cluster with minikube using this command minikube start --network-plugin=cni --cni=calico --kubernetes-version=v1.24.11 --driver=docker

Install the operator however you want; I simply ran helm install redis-operator redis-operator/redis-operator after adding the repo.

Apply the failovers and the IPPools.

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: foo
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    podAnnotations:
      'cni.projectcalico.org/ipv4pools': '["new-pool1"]'
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
---
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: bar
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    podAnnotations:
      'cni.projectcalico.org/ipv4pools': '["new-pool2"]'
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: new-pool1
spec:
  blockSize: 30
  cidr: 10.0.0.0/30
  ipipMode: Never
  natOutgoing: true
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: new-pool2
spec:
  blockSize: 30
  cidr: 10.0.0.4/30
  ipipMode: Never
  natOutgoing: true

Now change the annotations in the failovers to point to the other IPPool

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: foo
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    podAnnotations:
      'cni.projectcalico.org/ipv4pools': '["new-pool2"]'
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
---
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: bar
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    podAnnotations:
      'cni.projectcalico.org/ipv4pools': '["new-pool1"]'
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: new-pool1
spec:
  blockSize: 30
  cidr: 10.0.0.0/30
  ipipMode: Never
  natOutgoing: true
---
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: new-pool2
spec:
  blockSize: 30
  cidr: 10.0.0.4/30
  ipipMode: Never
  natOutgoing: true

The operator will start restarting the Redis replicas, and the sentinels will start to join each other as the IP addresses are swapped between the two clusters. Note that it happens most of the time but not every time (for me it happened 4 out of 5 times), which further suggests a race condition.

githubixx commented 1 year ago

Hi! As I was thinking about using this operator I came across this issue, and I was immediately reminded of the problems I had getting this "fixed" on Google Cloud, running Redis Sentinel clusters on VMs. We have around 20 of these running, and from time to time you of course want to upgrade or install OS updates. Since we do immutable infrastructure, we don't update VMs in place (via Ansible, Chef, Puppet, ...) but replace them completely.

All our Redis clusters have three instances. So when we update such a cluster we shut down one Redis node (each node always runs one Redis and one Sentinel process). We build our OS images with HashiCorp Packer, so our automation picks up the latest OS image, starts a new VM, and that VM re-joins the cluster. The same then happens with the other two remaining nodes: we first replace the two replicas and finally the primary. Before shutting down the primary we issue a primary failover.

As long as you do this with only one Redis Sentinel cluster at a time it normally works fine. But since the update process runs in parallel, all 20 Redis Sentinel clusters are recreated at the same time. Even that works just fine, but early on we also discovered that some nodes suddenly tried to join other clusters during this process. All our clusters have a different redisMasterName configured, so the join failed, of course. To move those nodes back to the cluster they belong to, we had to manually clean up the configuration and rejoin each node.

We tried a lot of workarounds, but nothing worked reliably, as Redis/Sentinel always "remembered" the IP addresses of the old (now gone) nodes. And that's actually the problem: during the recreation of the Redis Sentinel clusters it was likely that a node got an internal IP address that had previously been used by a node belonging to a different Redis Sentinel cluster.

So our "solution" was to give every VM node fixed internal IP addresses. So they don't change IP addresses as the IP address is allocated once then is then always assigned to the VM that used it before. That "fixed" this issue once and forever :smiley:

But AFAIK you can't do this in Kubernetes. So from what I've read so far in this thread, using NetworkPolicy or a different port for every Redis Sentinel deployment might be possible workarounds. Since Redis 6.2 you can also use hostnames instead of IP addresses: https://redis.io/docs/management/sentinel/#ip-addresses-and-dns-names. Since Kubernetes has DNS autodiscovery, this might be a more general solution to this problem.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

tparsa commented 1 year ago

Any updates on this?

zekena2 commented 11 months ago

I wanted to check whether the new 1.3.0-rc1 might solve it, but nothing has changed; clusters still join each other. @ese can you please take a look at this bug before the stable release?

dbackeus commented 6 months ago

We just had this happen as well. Three sentinel clusters joined together, possibly connected to some flaky nodes. Luckily these were all staging deployments; otherwise it would have caused critical errors and data loss.

sapisuper commented 4 months ago

For everyone facing this problem, I suggest moving from this redis-operator to the opstree redis-operator. In opstree you can tie a sentinel to a specific Redis server (redis-replication):

    redisReplicationName: [redis-server name]
    masterGroupName: [sentinel master name, default mymaster]

This configuration saved us. Cheers 😃 Doc reference: https://ot-redis-operator.netlify.app/docs/getting-started/sentinel/