Closed tomswinkels closed 1 year ago
We experienced an issue where (we think after the cluster upgrade) one of the Redis instances did not join the rest of the sentinel cluster but started its own, running in split brain. Restarting that pod fixed the issue, but we did not detect it before the application started showing some weird responses.
Ah, great. We had that issue very rarely and were not able to force a reproducible scenario. As stated in the PR: "Under not entirely known yet circumstances..." 😄 I will have a look at it and try to understand and adapt it for this deployment. Thank you!!!
After reading all the comments in the pull request mentioned above and also the issue https://github.com/DandyDeveloper/charts/issues/121, I focused on reproducing the scenario. It took quite a while, but I can now say it's possible to reproduce it reliably.
How to reproduce the split-brain scenario:
First, set up a local k3d-based Kubernetes cluster with 3 worker (agent) nodes using my K3d-setup helper project (see the documentation in the repo):
./k3d-setup.sh devcluster
Start a local HA Redis cluster with 3 pods spread over the available worker nodes using the chart from https://github.com/groundhog2k/helm-charts (I'm sure other charts show the same behavior/problem) with the following values file:
fullnameOverride: redis-test
podDisruptionBudget:
  minAvailable: 2
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: redis-test
              app.kubernetes.io/name: redis-test
          topologyKey: kubernetes.io/hostname
        weight: 100
resources:
  limits:
    memory: "128Mi"
    cpu: "100m"
  requests:
    memory: "128Mi"
    cpu: "100m"
sentinelResources:
  limits:
    memory: "64Mi"
    cpu: "100m"
  requests:
    memory: "64Mi"
    cpu: "100m"
haMode:
  ## Enable high availability deployment mode
  enabled: true
  ## Mandatory Redis HA master group name
  masterGroupName: redistestha
  ## Number of replicas (minimum should be 3)
  replicas: 3
  ## Quorum of sentinels that need to agree that a master node is not available
  quorum: 2
  ## Number of parallel reconfigurations
  parallelSyncs: 1
  ## Number of milliseconds after which the master should be declared unavailable
  downAfterMilliseconds: 30000
  ## Timeout for a failover
  failoverTimeout: 60000
  ## Assumed wait time in seconds until the failover should be finished and before a failover will be forced (should be greater than the value of downAfterMilliseconds)
  failoverWait: 35
  ## Keep old init logs in /data/init.log after a successful initialization (use only for debugging)
  keepOldLogs: true
helm install test groundhog2k/redis -f values.yaml
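With `replicas: 3` and `quorum: 2` in the values above, at least two sentinels must agree that the master is down before it is marked objectively down, and an actual failover additionally requires a majority of all known sentinels. A small Python sketch of that arithmetic (my own illustration of Redis Sentinel's voting rules, not chart code):

```python
def quorum_reached(sentinels_agreeing, quorum):
    # ODOWN: at least `quorum` sentinels must see the master as down.
    return sentinels_agreeing >= quorum

def failover_authorized(total_sentinels, reachable_sentinels):
    # A failover additionally needs a majority of ALL known sentinels,
    # which is why a lone, isolated sentinel must never promote anyone.
    return reachable_sentinels >= total_sentinels // 2 + 1

print(quorum_reached(2, 2))        # True: quorum of 2 is met
print(failover_authorized(3, 2))   # True: 2 of 3 is a majority
print(failover_authorized(3, 1))   # False: a lone sentinel cannot fail over
```

This is exactly why the split brain described below is surprising: an instance that cannot reach the others should never be able to take the master role on its own.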
Wait until all pods are up and running and the health status of the statefulset is OK. After the first setup, the redis-test-0 instance will be the master and redis-test-1 and redis-test-2 will be the two slaves. Let's assume redis-test-0 is running on k3d-devcluster-agent-0, redis-test-1 on k3d-devcluster-agent-1, and redis-test-2 on k3d-devcluster-agent-2.
Simulate an outage of the first worker node (where the Redis master is running) by simply stopping it without properly draining it:
k3d node stop k3d-devcluster-agent-0
Watch the sentinel logs of the two remaining Redis instances until a new master is selected. Let's assume redis-test-2 on k3d-devcluster-agent-2 is now the new master.
Now "fix" the failed K8s node by letting it back into the cluster:
k3d node start k3d-devcluster-agent-0
When the node is up, the pod with redis-test-0 will reinitialize. The logs will show that redis-test-0 rejoined the Redis cluster, now as a slave. Now let's drop the last selected master (redis-test-2 on k3d-devcluster-agent-2):
k3d node stop k3d-devcluster-agent-2
Wait until the cluster has voted and selected a new master. Let's assume redis-test-1 became the new master. Let the previously stopped worker node back into the cluster:
k3d node start k3d-devcluster-agent-2
Wait until the pod reinitializes and look at the logs.
The sentinels of redis-test-0 and redis-test-1 will not show any additional output. The sentinel log of redis-test-2 will show that it declared itself as a new master; the redis-init container shows this too!
There are now 2 masters in the cluster: redis-test-1 and redis-test-2.
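The split brain can be detected from outside by comparing which master each sentinel reports. A minimal Python sketch of that check (my own illustration; in a real cluster you would query each sentinel with `SENTINEL get-master-addr-by-name redistestha`, here the responses are stubbed in):

```python
def distinct_masters(reported_masters):
    """reported_masters maps each sentinel's pod name to the (host, port)
    it currently reports as master. More than one distinct address means
    the cluster is in split brain."""
    return set(reported_masters.values())

# The state observed above: redis-test-2 declared itself master while
# redis-test-0 and redis-test-1 still follow redis-test-1.
views = {
    "redis-test-0": ("redis-test-1", 6379),
    "redis-test-1": ("redis-test-1", 6379),
    "redis-test-2": ("redis-test-2", 6379),
}
masters = distinct_masters(views)
print(len(masters))  # 2 -> split brain
```

A periodic check like this (e.g. as a liveness probe or monitoring alert) would have caught the situation before the application started returning weird responses.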
This was reproducible multiple times. Two things are important:
A) It will only start to happen with the second vote, never with the first one. So two relocations of the Redis master are needed.
B) The Redis instance or cluster node must lose contact with the other nodes/instances without warning (no clean kubectl drain ..., etc.)
It seems to happen only when redis-test-0 (the first instance) WAS NOT the master. (Reminder to myself: needs to be checked again.)
Update on A): Sometimes it can also take 3 relocations of the master.
@tomswinkels / @tim-hanssen: After a few days off I will continue investigating that a bit deeper. I would like to have a solution without a sidecar container and maybe there is a way to better detect this "event" during initialization of a redis instance - and maybe it's a bug we need to address to the Redis folks.
As I found out, it's a DNS resolution problem during startup of the pod when the node was unreachable before. That is now handled during redis-init: when the service endpoint cannot be resolved, this is detected and leads to a restart of the pod after a short wait. Afterwards the service is reachable and normal initialization continues. Before this change, the DNS problem was ignored, the pod assumed it was the only/first instance, and it took the master role.
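A minimal Python sketch of that fixed init behavior (my own illustration, not the chart's actual init script; the service name is hypothetical and the resolver is injectable only so the logic can be tested without a cluster):

```python
import socket
import time

def endpoint_resolvable(host, port=6379, resolver=socket.getaddrinfo):
    """Return True if the service endpoint currently resolves in DNS."""
    try:
        resolver(host, port)
        return True
    except OSError:
        return False

def init_step(host, resolver=socket.getaddrinfo, wait_seconds=5):
    """If the headless service cannot be resolved yet (as happens right
    after a node comes back), wait briefly and exit non-zero so
    Kubernetes restarts the pod, instead of assuming this is the
    only/first instance and taking the master role."""
    if not endpoint_resolvable(host, resolver=resolver):
        time.sleep(wait_seconds)
        return 1  # non-zero exit -> container restart, retry later
    return 0      # endpoint reachable -> continue normal initialization

# Stub resolvers simulating "DNS not ready yet" vs. "DNS works":
def failing_resolver(host, port):
    raise OSError("DNS not ready")

def working_resolver(host, port):
    return [("10.0.0.1", port)]

print(init_step("redis-test-headless", resolver=failing_resolver, wait_seconds=0))  # 1
print(init_step("redis-test-headless", resolver=working_resolver, wait_seconds=0))  # 0
```

The key design point is that a failed DNS lookup is treated as "not ready yet, retry" rather than "I must be the first instance", which is what caused the erroneous self-promotion.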
I will do more tests next week before I finally merge it.
@tomswinkels @tim-hanssen Please have a look at this PR, do your own tests, and give me feedback.
We found that another chart has made a fix for this: https://github.com/DandyDeveloper/charts/pull/149/files
Can you have a look at this?