Closed by brokoli18 3 years ago
Hi, this looks to be the same issue I'm having: https://github.com/redis/redis/issues/8786.
Do you know whether this is recent? I've only ever run this setup on 6.2.1.
I am running 6.0.6 at the moment. I think the older versions I ran had the same problem, but I'm not sure about newer ones. Our issues are the same, so I will close this issue so that all discussion can happen in one place.
I am not sure if this is a bug or expected behaviour so I am raising this as a question.
I have Redis deployed in a 3-node cluster, with each node running redis-server and redis-sentinel on their default ports. The configuration management is fully automated and the services are deployed with a base config.
Recently we have been having issues with failover: even though 1 node in the cluster is offline, the sentinels refuse to elect a new master and keep postponing the election. After drilling down into the issue, I can see that this is caused by duplicate known-sentinel entries appearing in the sentinel config:
Those entries refer to the same sentinel, which had its config reset/changed by the config management system (wiping its ID). When it comes back, the other sentinels create a new entry for it but also keep the old one, changing its port.
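For anyone wanting to check their own nodes for this, here is a minimal sketch that scans a sentinel config for hosts recorded under more than one run ID. It assumes the usual `sentinel known-sentinel <master-name> <ip> <port> <runid>` line format. `find_duplicate_sentinels` is a hypothetical helper (not part of Redis), and the addresses, master name and shortened run IDs in the sample are made up for illustration. It groups by IP rather than IP and port, since the stale entry ends up with a different port.

```python
from collections import defaultdict


def find_duplicate_sentinels(config_text):
    """Return {(master, ip): [(port, runid), ...]} for hosts that appear
    under more than one run ID in the given sentinel config text."""
    seen = defaultdict(list)
    for line in config_text.splitlines():
        parts = line.split()
        # Expected shape: sentinel known-sentinel <master> <ip> <port> <runid>
        if len(parts) == 6 and parts[:2] == ["sentinel", "known-sentinel"]:
            _, _, master, ip, port, runid = parts
            seen[(master, ip)].append((int(port), runid))
    # Keep only hosts that were recorded with at least two distinct run IDs.
    return {
        key: entries
        for key, entries in seen.items()
        if len({runid for _, runid in entries}) > 1
    }


# Hypothetical sample config: 10.0.0.3 is listed twice with different run IDs.
SAMPLE = """\
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel known-sentinel mymaster 10.0.0.2 26379 aaaa1111
sentinel known-sentinel mymaster 10.0.0.3 0 bbbb2222
sentinel known-sentinel mymaster 10.0.0.3 26379 cccc3333
"""

if __name__ == "__main__":
    for (master, ip), entries in find_duplicate_sentinels(SAMPLE).items():
        print(f"{master}: {ip} is listed {len(entries)} times: {entries}")
```

To run it against a real node, read the text from wherever your sentinel config lives and pass it to the function. It only reports the duplicates and does not modify the file.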
I understand that sentinels will not remove already discovered members when they go offline, in order to preserve the topology: https://github.com/redis/redis/issues/1972. However, this case is slightly different, as the same sentinel (same IP and port) rejoins the cluster. Is there a reason why the existing entry is not replaced/edited instead of an entirely new line being added?
If there is a reason, is this behaviour documented anywhere? I can't find a reference to it in the official sentinel docs (https://redis.io/topics/sentinel). I imagine other people would run afoul of this issue if their sentinels are deployed with a default/minimal configuration (https://bugs.launchpad.net/kolla-ansible/+bug/1788179).