tongdun / td-redis-operator

A powerful cloud-native redis-operator, proven in large-scale production use, supporting distributed clusters, active/standby switching, and other cache cluster solutions ...
Apache License 2.0
505 stars 89 forks

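(Failure-scenario test) After data is written to a Redis cluster and a Redis pod is restarted, the new pod still has slot information, so the operator judges that the pod is already in the cluster and the pod cannot rejoin the Redis cluster #26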

Open styshoo opened 11 months ago

styshoo commented 11 months ago

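Test steps:
1. Create a Redis cluster.
2. Write some data to the Redis cluster.
3. Restart some of the pods. The restarted pods are judged to have slot information (by the podInCluster function), so they are not added back to the cluster.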

The Redis log from a restarted pod is as follows:

1:M 25 Jul 2023 02:28:25.269 * DB loaded from disk: 0.000 seconds
1:M 25 Jul 2023 02:28:25.269 # I have keys for unassigned slot 3300. Taking responsibility for it.
1:M 25 Jul 2023 02:28:25.272 * Ready to accept connections

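Although nodes.conf is not persisted, Redis loads the persisted data at startup, finds keys belonging to a slot it considers unassigned, and writes that slot into the new nodes.conf: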

# cat /home/admin/redis/nodes.conf 
9ee910e301fef3ba741011d2e472379daa262f9a :6379@16379 myself,master - 0 0 0 connected 3300
vars currentEpoch 0 lastVoteEpoch 0
styshoo commented 11 months ago

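The original code logic assumes that a restarted pod has empty slot data, so podInCluster judges that the pod has not joined the cluster, which triggers the corresponding redis meet and adds the node to the cluster. In this scenario, however, slots is not empty, so the pod is never added back to the cluster. In addition, podInCluster is also used when adding the pod to the Endpoint.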
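One possible direction, sketched below in Go, is to decide membership from how many nodes the pod knows about rather than from whether it owns slots. This is only an illustration under stated assumptions, not the operator's actual code: `nodeView`, `parseClusterNodes`, `slotBasedInCluster`, and `isIsolated` are hypothetical names, and the sketch assumes the operator can read the pod's `CLUSTER NODES` output (which has the same line format as nodes.conf). It also ignores nodes in handshake or failed states, which a real check would need to consider.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nodeView is a hypothetical summary of what one node reports about the cluster.
type nodeView struct {
	KnownNodes int  // number of node lines, including the "myself" line
	OwnsSlots  bool // whether the "myself" line lists at least one slot
}

// parseClusterNodes reads CLUSTER NODES output or nodes.conf content.
// A node line has 8 fixed fields (id, address, flags, master id, ping-sent,
// pong-recv, config-epoch, link-state) followed by zero or more slot entries.
func parseClusterNodes(out string) nodeView {
	var v nodeView
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and the shorter "vars currentEpoch ..." line.
		if len(fields) < 8 {
			continue
		}
		v.KnownNodes++
		for _, flag := range strings.Split(fields[2], ",") {
			if flag == "myself" && len(fields) > 8 {
				v.OwnsSlots = true
			}
		}
	}
	return v
}

// slotBasedInCluster mirrors the behavior described in this issue:
// a pod that owns any slot is treated as already in the cluster.
func slotBasedInCluster(v nodeView) bool {
	return v.OwnsSlots
}

// isIsolated is the assumed alternative: a pod that knows no node other
// than itself still needs a meet, whether or not it claims slots.
func isIsolated(v nodeView) bool {
	return v.KnownNodes <= 1
}

func main() {
	// nodes.conf content taken from the restarted pod in this issue.
	restarted := "9ee910e301fef3ba741011d2e472379daa262f9a :6379@16379 myself,master - 0 0 0 connected 3300\n" +
		"vars currentEpoch 0 lastVoteEpoch 0\n"

	v := parseClusterNodes(restarted)
	fmt.Println("slot-based check says in cluster:", slotBasedInCluster(v))
	fmt.Println("peer-count check says needs meet:", isIsolated(v))
}
```

For the nodes.conf shown in this issue, the slot-based check reports the pod as in the cluster, while the peer-count check reports that it still needs a meet, which matches the observed failure.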