redisson / redisson

Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis or Valkey based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Bloom filter, Spring, Tomcat, Scheduler, JCache API, Hibernate, RPC, local cache...
https://redisson.pro
Apache License 2.0

After a master/replica failover on one Redis cluster node, Redisson reports "Node for slot: 1888 hasn't been discovered yet" #3346

Closed: hyuans closed this issue 3 years ago.

hyuans commented 3 years ago

After a master/replica failover occurred on one node of the Redis cluster, the following errors started to appear.

error.log shows:

    org.redisson.client.RedisNodeNotFoundException: Node for slot: 1888 hasn't been discovered yet. Check cluster slots coverage using CLUSTER NODES command. Increase value of retryAttempts and/or retryInterval settings.
        at org.redisson.connection.MasterSlaveConnectionManager.createNodeNotFoundFuture(MasterSlaveConnectionManager.java:578)
        at org.redisson.connection.MasterSlaveConnectionManager.connectionReadOp(MasterSlaveConnectionManager.java:562)
        at org.redisson.command.RedisExecutor.getConnection(RedisExecutor.java:648)
        at org.redisson.command.RedisExecutor.execute(RedisExecutor.java:116)
        at org.redisson.command.RedisExecutor$2.run(RedisExecutor.java:244)
        at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
        at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
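The exception names only the slot (1888), not the key that was being accessed. As a reminder of how a key is mapped to a slot, here is a minimal sketch of the key-to-slot function described in the Redis Cluster specification: CRC16 (XMODEM variant, polynomial 0x1021, initial value 0) of the key, modulo 16384, where only the contents of a non-empty `{...}` hash tag are hashed if one is present. The class name `KeySlot` is hypothetical; this is an illustration of the algorithm, not Redisson's own code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating HASH_SLOT = CRC16(key) mod 16384.
final class KeySlot {
    static final int SLOT_COUNT = 16384;

    // Bitwise CRC16-XMODEM (poly 0x1021, init 0); slower than a table, easy to check.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = ((crc << 1) ^ 0x1021) & 0xFFFF;
                } else {
                    crc = (crc << 1) & 0xFFFF;
                }
            }
        }
        return crc;
    }

    // If the key has a non-empty "{...}" hash tag (first '{' and the first '}'
    // after it), only the tag is hashed, so keys sharing a tag share a slot.
    static int slot(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                bytes = key.substring(open + 1, close).getBytes(StandardCharsets.UTF_8);
            }
        }
        return crc16(bytes) % SLOT_COUNT;
    }
}
```

With this function, any key for which `KeySlot.slot(key)` returns 1888 is routed to whichever master the client currently believes owns slot 1888; the error means the client had no master registered for it at that moment.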

info.log shows (this recurs every few minutes):

    2021-01-15 11:19:16,155 [redisson-netty-5-5] INFO o.r.cluster.ClusterConnectionManager.shutdownEntry(ClusterConnectionManager.java:254) - /172.28.9.49:9004 master and related slaves: [addr=redis://172.28.9.48:9001] removed
    2021-01-15 11:19:16,155 [redisson-netty-5-5] INFO o.r.cluster.ClusterConnectionManager.checkSlotsMigration(ClusterConnectionManager.java:701) - 5461 slots removed from redis://172.28.9.49:9004
    2021-01-15 11:19:16,677 [AsyncResolver-bootstrap-executor-0] INFO c.n.d.s.r.aws.ConfigClusterResolver.getClusterEndpoints(ConfigClusterResolver.java:43) - Resolving eureka endpoints via configuration
    2021-01-15 11:19:46,256 [redisson-netty-5-3] INFO o.r.c.p.MasterPubSubConnectionPool.lambda$run$0(ConnectionPool.java:167) - 1 connections initialized for /172.28.9.49:9004
    2021-01-15 11:19:46,264 [redisson-netty-5-6] INFO o.r.c.pool.MasterConnectionPool.lambda$run$0(ConnectionPool.java:167) - 24 connections initialized for /172.28.9.49:9004
    2021-01-15 11:19:46,267 [redisson-netty-5-26] INFO o.r.c.pool.PubSubConnectionPool.lambda$run$0(ConnectionPool.java:167) - 1 connections initialized for /172.28.9.48:9001
    2021-01-15 11:19:46,290 [redisson-netty-5-16] INFO o.r.cluster.ClusterConnectionManager.lambda$null$5(ClusterConnectionManager.java:325) - slaves: [redis://172.28.9.48:9001] added for slot ranges: [[0-5460]]
    2021-01-15 11:19:46,291 [redisson-netty-5-16] INFO o.r.cluster.ClusterConnectionManager.lambda$null$5(ClusterConnectionManager.java:332) - master: redis://172.28.9.49:9004 added for slot ranges: [[0-5460]]
    2021-01-15 11:19:46,291 [redisson-netty-5-16] INFO o.r.c.pool.SlaveConnectionPool.lambda$run$0(ConnectionPool.java:167) - 100 connections initialized for /172.28.9.48:9001
    2021-01-15 11:19:51,317 [redisson-netty-5-5] INFO o.r.cluster.ClusterConnectionManager.shutdownEntry(ClusterConnectionManager.java:254) - /172.28.9.49:9004 master and related slaves: [addr=redis://172.28.9.48:9001] removed
    2021-01-15 11:19:51,318 [redisson-netty-5-5] INFO o.r.cluster.ClusterConnectionManager.checkSlotsMigration(ClusterConnectionManager.java:701) - 5461 slots removed from redis://172.28.9.49:9004
    2021-01-15 11:19:56,326 [redisson-netty-5-4] INFO o.r.c.p.MasterPubSubConnectionPool.lambda$run$0(ConnectionPool.java:167) - 1 connections initialized for /172.28.9.49:9004
    2021-01-15 11:19:56,334 [redisson-netty-5-7] INFO o.r.c.pool.MasterConnectionPool.lambda$run$0(ConnectionPool.java:167) - 24 connections initialized for /172.28.9.49:9004
    2021-01-15 11:19:56,338 [redisson-netty-5-3] INFO o.r.c.pool.PubSubConnectionPool.lambda$run$0(ConnectionPool.java:167) - 1 connections initialized for /172.28.9.48:9001
    2021-01-15 11:19:56,365 [redisson-netty-5-17] INFO o.r.cluster.ClusterConnectionManager.lambda$null$5(ClusterConnectionManager.java:325) - slaves: [redis://172.28.9.48:9001] added for slot ranges: [[0-5460]]
    2021-01-15 11:19:56,367 [redisson-netty-5-17] INFO o.r.cluster.ClusterConnectionManager.lambda$null$5(ClusterConnectionManager.java:332) - master: redis://172.28.9.49:9004 added for slot ranges: [[0-5460]]
    2021-01-15 11:19:56,367 [redisson-netty-5-17] INFO o.r.c.pool.SlaveConnectionPool.lambda$run$0(ConnectionPool.java:167) - 100 connections initialized for /172.28.9.48:9001
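The info.log excerpt shows the same cycle repeating: the entry for master 172.28.9.49:9004 (with replica 172.28.9.48:9001) is removed together with "5461 slots", then re-added for slot range [0-5460] a few seconds later. The count is just the size of that inclusive range: 5460 - 0 + 1 = 5461. The sketch below illustrates how such a "slots removed" figure can be derived by diffing two topology snapshots, assuming each snapshot is modeled as a map from master address to a `BitSet` of owned slots. The class and method names are hypothetical, and this is not Redisson's `checkSlotsMigration` implementation.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a topology snapshot maps a master address to its slots (0..16383).
final class SlotDiff {

    // BitSet holding the inclusive slot range [from, to].
    static BitSet range(int from, int to) {
        BitSet slots = new BitSet(16384);
        slots.set(from, to + 1); // BitSet.set's upper bound is exclusive
        return slots;
    }

    // Single-master snapshot, enough for this illustration.
    static Map<String, BitSet> snapshot(String master, BitSet slots) {
        Map<String, BitSet> topology = new HashMap<>();
        topology.put(master, slots);
        return topology;
    }

    // Slots that 'master' owned in 'before' but does not own in 'after'.
    static BitSet removedSlots(Map<String, BitSet> before, Map<String, BitSet> after, String master) {
        BitSet removed = (BitSet) before.getOrDefault(master, new BitSet()).clone();
        removed.andNot(after.getOrDefault(master, new BitSet()));
        return removed;
    }
}
```

If a topology scan temporarily stops listing redis://172.28.9.49:9004 as a master, `removedSlots(...).cardinality()` for that address is 5461, matching the log line; when the next scan lists it again, the diff is empty and the range is re-added.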

Output of CLUSTER NODES for the Redis cluster:

    d167ab629fd9e804b7b94d250b60634239899a77 172.28.9.48:9002@19002 myself,slave 3aa09cb227f90313f452c399c015767e2735dc19 0 1610699693000 2 connected
    496cff30335016523fc8af38b3d2a64608e65afc 172.28.9.49:9003@19003 master - 0 1610699694293 3 connected 5461-10922
    ef77423d42bde0d88f123f6bfdd235138b4ba864 172.28.9.48:9001@19001 slave b301296b5218c22332f77b1da719e5bb985170c2 0 1610699694000 9 connected
    b301296b5218c22332f77b1da719e5bb985170c2 172.28.9.49:9004@19004 master - 0 1610699695295 9 connected 0-5460
    eadbd9cc11b85bcf289b8a5963a99dcfff4b292a 172.28.9.50:9006@19006 slave 496cff30335016523fc8af38b3d2a64608e65afc 0 1610699693285 3 connected
    3aa09cb227f90313f452c399c015767e2735dc19 172.28.9.50:9005@19005 master - 0 1610699696298 5 connected 10923-16383
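In this output slot 1888 lies in range 0-5460, which is served by master 172.28.9.49:9004 (its replica is 172.28.9.48:9001), and the three masters together cover all 16384 slots (5461 + 5462 + 5461). So from Redis's point of view the slot was covered when this snapshot was taken. The error text suggests checking slot coverage with CLUSTER NODES; the sketch below does that check programmatically. It is a minimal parser under stated assumptions: it reads only the address, flags and slot fields, handles plain `start-end` and single-slot entries, and skips migrating/importing markers such as `[slot->-node]`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical CLUSTER NODES checker. Line format:
// <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> <slot> ...
final class ClusterNodesCheck {

    // One inclusive slot range served by one master.
    static final class SlotRange {
        final String address;
        final int from;
        final int to;

        SlotRange(String address, int from, int to) {
            this.address = address;
            this.from = from;
            this.to = to;
        }
    }

    static List<SlotRange> parse(String clusterNodes) {
        List<SlotRange> ranges = new ArrayList<>();
        for (String line : clusterNodes.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 8 || !f[2].contains("master")) {
                continue; // blank line or replica (replicas own no slots)
            }
            String address = f[1].split("@")[0]; // drop the cluster-bus port
            for (int i = 8; i < f.length; i++) {
                if (f[i].startsWith("[")) {
                    continue; // migrating/importing marker, ignored here
                }
                String[] bounds = f[i].split("-");
                int from = Integer.parseInt(bounds[0]);
                int to = bounds.length > 1 ? Integer.parseInt(bounds[1]) : from;
                ranges.add(new SlotRange(address, from, to));
            }
        }
        return ranges;
    }

    // Address of the master serving 'slot', or null if no master covers it.
    static String ownerOf(List<SlotRange> ranges, int slot) {
        for (SlotRange r : ranges) {
            if (slot >= r.from && slot <= r.to) {
                return r.address;
            }
        }
        return null;
    }

    // Number of distinct slots (out of 16384) assigned to some master.
    static int coveredSlots(List<SlotRange> ranges) {
        BitSet covered = new BitSet(16384);
        for (SlotRange r : ranges) {
            covered.set(r.from, r.to + 1);
        }
        return covered.cardinality();
    }
}
```

Fed the output above, `ownerOf(ranges, 1888)` yields `172.28.9.49:9004` and `coveredSlots(ranges)` yields 16384, which is consistent with the report that all Redis nodes were still up.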

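Steps to reproduce or test case

The errors above appear only after a network fluctuation causes a master/replica failover on one node. Afterwards, running ps -ef | grep redis on the Linux hosts where Redis is installed showed that all Redis nodes were still running; none had actually gone down.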

After I downgraded Redisson to 3.13.6, the errors stopped and everything returned to normal. Redisson 3.10.6 does not produce the error either.

Redis version 4.0.14

Redisson version 3.14.1

Redisson configuration

    import java.util.ArrayList;
    import java.util.List;

    import org.redisson.Redisson;
    import org.redisson.api.RedissonClient;
    import org.redisson.client.codec.StringCodec;
    import org.redisson.config.ClusterServersConfig;
    import org.redisson.config.Config;

    // The enclosing class and method names are hypothetical wrappers; the body is the reporter's configuration.
    public class RedissonClusterConfig {

        public RedissonClient redissonClient() {
            String redissonNodes = "172.28.9.48:9001,172.28.9.48:9002,172.28.9.49:9003,172.28.9.49:9004,172.28.9.50:9005,172.28.9.50:9006";
            String redissonPassword = "";
            Integer masterConnectionPoolSize = 200;
            Integer slaveConnectionPoolSize = 400;
            Integer slaveConnectionMinimumIdleSize = 100;
            Integer connectTimeout = 10000; // milliseconds
            Integer timeout = 10000;        // milliseconds

            // Build "redis://host:port" seed addresses from the comma-separated list.
            String[] hosts = redissonNodes.split(",");
            List<String> nodeList = new ArrayList<String>();
            for (String node : hosts) {
                nodeList.add("redis://" + node);
            }
            Config config = new Config();
            config.setCodec(new StringCodec());
            ClusterServersConfig clusterServersConfig = config.useClusterServers().addNodeAddress(nodeList.toArray(new String[nodeList.size()]));
            if (redissonPassword != null && !"".equals(redissonPassword)) {
                clusterServersConfig.setPassword(redissonPassword);
            }
            clusterServersConfig.setMasterConnectionPoolSize(masterConnectionPoolSize); // maximum number of connections in the connection pool for master nodes
            clusterServersConfig.setSlaveConnectionPoolSize(slaveConnectionPoolSize);   // maximum number of connections in the connection pool for slave nodes
            clusterServersConfig.setSlaveConnectionMinimumIdleSize(slaveConnectionMinimumIdleSize);
            clusterServersConfig.setConnectTimeout(connectTimeout);
            clusterServersConfig.setTimeout(timeout);
            return Redisson.create(config);
        }
    }

mrniko commented 3 years ago

If you start Redisson right after the Redis cluster starts, wait a couple of seconds for the cluster topology to update in Redis.
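Instead of sleeping for a fixed "couple of seconds", a startup routine can poll a readiness condition until it holds or a deadline passes, for example "CLUSTER NODES reports all 16384 slots covered". The sketch below is a generic, library-agnostic polling helper; its name and the condition you pass in are hypothetical, and it does not call any Redisson API. It targets the startup scenario described in this comment, not the mid-run failover reported in the issue.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a readiness condition until it holds or a deadline passes.
final class TopologyWait {

    // Returns true as soon as 'condition' holds; false on timeout or interruption.
    static boolean awaitCondition(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline reached without the condition holding
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

Comparing `System.nanoTime()` values by subtraction rather than with `>` keeps the deadline check correct even if the nanosecond counter overflows. The wait may overshoot the timeout by up to one poll interval.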

xrayw commented 3 years ago

@mrniko I'm hitting the same issue, and it never recovers by itself until I restart the system.

mrniko commented 3 years ago

This is related to https://github.com/redisson/redisson/issues/3635. Please update to 3.16.0.