Describe the bug
Deployment architecture:
We are running a two-replica deployment (one master, one replica) in the cloud on Kubernetes (StatefulSet).
Each pod in our deployment runs on a dedicated instance (VM).
We have three pods:
node-0
node-1
sentinel-0
Both node-0 (master) and node-1 (replica) run a Redis server and a Sentinel process; sentinel-0 runs only a Sentinel process.
We have a total of 3 Sentinels with a quorum of 2.
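A minimal sketch of how the topology can be checked from each pod; the master name "mymaster" and Sentinel port 26379 are assumptions, the real values are in the attached configs:

```sh
# Ask each Sentinel which master it sees and whether quorum can be reached.
# "mymaster" and port 26379 are assumptions; see the attached sentinel.conf.
for pod in node-0 node-1 sentinel-0; do
  echo "== $pod =="
  kubectl exec "$pod" -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
  kubectl exec "$pod" -- redis-cli -p 26379 SENTINEL ckquorum mymaster
done
```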
In our pipeline we run a test that does the following (a rough shell sketch of these steps follows the list):
1 – Restart node-0 (the master) with kubectl delete pod node-0 --grace-period=60, then check that the data persists and that node-1 became master.
2 – Restart sentinel-0 and check that the data persists.
3 – Restart node-1 (the master at this point) and check that the data persisted and that node-0 became master.
At this point node-0 is master and node-1 is replica.
4 – Stop all Kubernetes nodes (VMs); each pod runs on its own node.
5 – Start all nodes and check that the data persists.
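A rough shell sketch of steps 1-5; the pod names, the master name "mymaster" and the ports are assumptions, and steps 4-5 are cloud-provider specific, so they are only indicated:

```sh
kubectl delete pod node-0 --grace-period=60                       # step 1: restart current master
kubectl exec node-1 -- redis-cli INFO replication | grep ^role    # after failover, expect role:master

kubectl delete pod sentinel-0 --grace-period=60                   # step 2: restart the standalone Sentinel

kubectl delete pod node-1 --grace-period=60                       # step 3: restart the new master
kubectl exec node-0 -- redis-cli INFO replication | grep ^role    # after failover, expect role:master

# step 4: stop the three VMs via the cloud provider / node pool
# step 5: start the VMs again, then re-check the data and the reported master on every pod
```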
During steps 1-4 we run a while loop that continuously reads and writes, to verify zero downtime.
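A minimal sketch of such a loop, assuming the test runner can reach the pods directly and that the master is named "mymaster" in sentinel.conf:

```sh
i=0
while true; do
  # Ask a Sentinel for the current master (returns IP on line 1, port on line 2).
  read -r MASTER_IP MASTER_PORT < <(redis-cli -h sentinel-0 -p 26379 \
      SENTINEL get-master-addr-by-name mymaster | xargs)
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" SET "test:$i" "$i" || echo "write $i failed"
  redis-cli -h "$MASTER_IP" -p "$MASTER_PORT" GET "test:$i"
  i=$((i + 1))
  sleep 1
done
```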
Steps 1-4 go as expected. However, after starting all instances again, which come up in this order: sentinel-0 first, then node-0 and node-1 at roughly the same time (sometimes node-1 starts before node-0 and vice versa), we hit a split brain: sentinel-0 says that node-1 is the master, while node-0 and node-1 both say that node-0 is the master. We waited to see whether the views eventually converge, but they do not.
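The disagreement shows up with something like the following (master name and ports are assumptions); the expected outputs reflect what we observe after step 5:

```sh
kubectl exec sentinel-0 -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
#   -> returns node-1's address
kubectl exec node-0 -- redis-cli INFO replication | grep -E '^(role|master_host)'
#   -> role:master
kubectl exec node-1 -- redis-cli INFO replication | grep -E '^(role|master_host)'
#   -> role:slave, master_host pointing at node-0
```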
To reproduce
Follow steps 1 to 5.
Expected behavior
The expected behavior is eventual consistency: if the Sentinels and the Redis servers disagree about which node is the master, they should eventually converge on a single master.
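A sketch of the convergence check we would expect to eventually pass (it never does in our case); the master name and port are assumptions:

```sh
# Poll all three Sentinels until they report the same master address, or give up.
for attempt in $(seq 1 60); do
  addrs=$(for pod in node-0 node-1 sentinel-0; do
            kubectl exec "$pod" -- redis-cli -p 26379 \
              SENTINEL get-master-addr-by-name mymaster | head -1
          done | sort -u)
  if [ "$(echo "$addrs" | wc -l)" -eq 1 ]; then
    echo "converged on master $addrs"
    break
  fi
  sleep 10
done
```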
Additional information
The config files node.conf and sentinel.conf for the instances:
node_and_sentinel.docx
The Sentinel logs before stopping the VMs: before-node-0.log before-node-1.log before-sentinel-0.log
The Sentinel logs after starting the VMs: after-node-0.log after-node-1.log after-sentinel-0.log