groundhog2k / helm-charts

Helm charts for open source applications - ready to use for deployment on Kubernetes
MIT License

Split brain fix #1231

Closed tomswinkels closed 1 year ago

tomswinkels commented 1 year ago

We found that other charts have made a fix for that: https://github.com/DandyDeveloper/charts/pull/149/files

Can you have a look at this?

tim-hanssen commented 1 year ago

We experienced an issue where (we think after a cluster upgrade) one of the Redis instances did not join the rest of the sentinel cluster but started its own, running in split brain. Restarting that pod fixed the issue, but we did not detect it before the application started showing some weird responses.
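A quick way to spot an instance that has drifted off into its own cluster is to compare what every sentinel knows about the topology. A minimal sketch, assuming pods named redis-test-0..2, a sentinel container called sentinel and the default sentinel port 26379 (adjust to your release):

# A split-brained instance typically reports slaves=0,sentinels=0 in its
# master0 line, while the healthy instances agree on the full topology.
for pod in redis-test-0 redis-test-1 redis-test-2; do
  echo "--- $pod"
  kubectl exec "$pod" -c sentinel -- redis-cli -p 26379 info sentinel | grep master0
done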

groundhog2k commented 1 year ago

Ah, great. We had that issue very rarely and were not able to force a reproducible scenario. As stated in the PR: "Under not entirely known yet circumstances...." 😄 I will have a look at it and try to understand and adapt it for this deployment. Thank you!!!

groundhog2k commented 1 year ago

After reading all the comments in the pull request mentioned above and also the issue https://github.com/DandyDeveloper/charts/issues/121, I focused on reproducing the scenario - which took quite a while, but I can now say it's possible to reproduce it reliably.

How to reproduce the split-brain scenario:

First, set up a local k3d-based Kubernetes cluster with 3 worker (agent) nodes using my k3d-setup helper project (see the documentation in the repo):

./k3d-setup.sh devcluster
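If you don't want to use the helper project, a plain k3d invocation along these lines should give a comparable cluster (this is an assumption about what the script does; it may set additional options):

# One server (control plane) node and three agent (worker) nodes
k3d cluster create devcluster --servers 1 --agents 3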

Start a local HA Redis cluster with 3 pods spread over the available worker nodes using the chart from https://github.com/groundhog2k/helm-charts (I'm sure other charts show the same behavior/problem) with the following values file:

fullnameOverride: redis-test

podDisruptionBudget:
  minAvailable: 2

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: redis-test
              app.kubernetes.io/name: redis-test
          topologyKey: kubernetes.io/hostname
        weight: 100

resources:
  limits:
    memory: "128Mi"
    cpu: "100m"
  requests:
    memory: "128Mi"
    cpu: "100m"
sentinelResources:
  limits:
    memory: "64Mi"
    cpu: "100m"
  requests:
    memory: "64Mi"
    cpu: "100m"

haMode:
  ## Enable high availability deployment mode
  enabled: true
  ## Mandatory redis HA-master group name
  masterGroupName: redistestha
  ## Number of replicas (minimum should be 3)
  replicas: 3
  ## Quorum of sentinels that need to agree that a master node is not available
  quorum: 2
  ## Number of parallel reconfigurations
  parallelSyncs: 1
  ## Number of milliseconds after which the master should be declared unavailable
  downAfterMilliseconds: 30000
  ## Timeout for a failover
  failoverTimeout: 60000
  ## Assumed wait time in seconds until the failover should be finished and before a failover will be forced (should be greater than the value of downAfterMilliseconds)
  failoverWait: 35
  ## Keep old init logs in /data/init.log after a successful initialization (use only for debugging)
  keepOldLogs: true

helm install test groundhog2k/redis -f values.yaml

Wait until all pods are up and running and the health status of the StatefulSet is OK. After the first setup, the redis-test-0 instance will be the master and -1 and -2 will be the two slaves. Let's assume redis-test-0 is running on k3d-devcluster-agent-0, redis-test-1 on k3d-devcluster-agent-1 and redis-test-2 on k3d-devcluster-agent-2.
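To verify this, a sketch like the following lists the pods with their nodes and asks every instance for its replication role (the container name redis and the labels from the values file above are assumptions, adjust them to the actual chart defaults):

# Show the pods and the node each one landed on
kubectl get pods -o wide -l app.kubernetes.io/instance=redis-test

# Ask each instance for its replication role (master or slave)
for pod in redis-test-0 redis-test-1 redis-test-2; do
  echo -n "$pod: "
  kubectl exec "$pod" -c redis -- redis-cli info replication | grep '^role'
done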

Simulate an outage of the first worker node (where the Redis master is running) by simply stopping it without properly draining it:

k3d node stop k3d-devcluster-agent-0

Watch the sentinel logs of the two remaining Redis instances until a new master is selected. Let's assume redis-test-2 on k3d-devcluster-agent-2 is now the new master.
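To follow the failover, the sentinel container logs of the surviving pods can be tailed and watched for +sdown/+odown/+switch-master events (the container name sentinel is an assumption):

kubectl logs -f redis-test-1 -c sentinel &
kubectl logs -f redis-test-2 -c sentinel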

Now "fix" the failed K8s node by letting it back into the cluster

k3d node start k3d-devcluster-agent-0

When the node is back up, the redis-test-0 pod will reinitialize. The logs will show that redis-test-0 re-joined the Redis cluster as a slave. Now let's drop the last selected master (redis-test-2 on k3d-devcluster-agent-2):

k3d node stop k3d-devcluster-agent-2

Wait until the cluster has voted and selected a new master. Let's assume redis-test-1 is now the new master. Let the previously stopped worker node back into the cluster:

k3d node start k3d-devcluster-agent-2

Wait until the pod reinitializes and look at the logs.

The sentinels of redis-test-0 and redis-test-1 will not show any additional output. The sentinel log of redis-test-2 will show that it declared itself as a new master. The redis-init container shows this too!

There are now 2 masters in the cluster: redis-test-1 and redis-test-2.
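The split brain can be confirmed with a simple role count - more than one instance reporting role:master means the cluster is split (pod and container names as assumed above):

# Count how many instances currently claim the master role
masters=0
for pod in redis-test-0 redis-test-1 redis-test-2; do
  role=$(kubectl exec "$pod" -c redis -- redis-cli info replication | grep '^role')
  echo "$pod: $role"
  case "$role" in role:master*) masters=$((masters+1)) ;; esac
done
echo "masters found: $masters"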

This was reproducible multiple times. Two things are important:

A) It only starts to happen with the second vote, never at the first one, so at least two relocations of the Redis master are needed.

B) The Redis instance or cluster node must lose contact with the other nodes/instances without warning (no clean kubectl drain ... etc.)

It seems to happen only when redis-test-0 (the first instance) WAS NOT the master. (Reminder to myself: needs to be checked again)


Update on A): It can sometimes also take 3 relocations of the master.

@tomswinkels / @tim-hanssen: After a few days off I will continue investigating this a bit deeper. I would like to have a solution without a sidecar container, and maybe there is a way to better detect this "event" during initialization of a Redis instance - and maybe it's a bug we need to raise with the Redis folks.

groundhog2k commented 1 year ago

As I found out, it's a DNS resolution problem during startup of the pod when the node was unreachable before. This is now handled during redis-init: when the service endpoint cannot be resolved, this is detected and leads to a restart of the pod after a short wait. Afterwards the service is reachable and normal initialization continues. Before this change, the DNS problem was ignored, the pod assumed it was the only/first instance, and it took over the master role.
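For illustration only (not the actual chart code), the check added during redis-init could look roughly like this shell fragment; the service name and timings are placeholders:

# If the service endpoint cannot be resolved yet, wait briefly and exit
# non-zero so Kubernetes restarts the pod, instead of letting it assume it is
# the first/only instance and take over the master role.
SERVICE_HOST="redis-test-headless"   # placeholder for the real service name
if ! getent hosts "$SERVICE_HOST" >/dev/null 2>&1; then   # assumes getent is available in the image
  echo "$SERVICE_HOST not resolvable yet - restarting pod" >> /data/init.log
  sleep 10
  exit 1
fi
# ...continue with normal master discovery and initialization...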

I will do more tests next week before I finally merge it.

@tomswinkels @tim-hanssen Please have a look at this PR, do your own tests and give me feedback.