redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

Clients unable to recover Cluster failover issue when connecting with multiple Redis clusters #1398

Open umang92 opened 4 years ago

umang92 commented 4 years ago

Bug Report

Current Behavior

We have the following setup. Client code creates two RedisClusterClient objects to connect with different Redis Clusters at the same time.

We are using two AWS ElastiCache Redis clusters (cluster mode enabled), each with 6 shards and 1 replica node per shard.

We then initiate a failover of one shard in one of the two Redis Clusters. As soon as the failover starts, the client application begins getting RedisCommandTimeoutException, which is expected. However, it is unable to recover from these errors: they keep showing up in large numbers even after 15-20 minutes, and the system recovers only after the client process is restarted.

We have tested the exact same scenario with the client application connecting to a single Redis Cluster. In this case, the client is able to recover from the RedisCommandTimeoutExceptions within 1 minute of initiating the failover.

We are using the following Redis commands in our setup:

a. rpush
b. lpop
c. pexpire

I am providing the client code in the Input code section. Please note that while moving from a single Redis Cluster to multiple Redis clusters, no client code changes were made.

#### Input Code

```java
// startup code
RedisURI.Builder builder = RedisURI.Builder.redis(redisDetails.getHost(), redisDetails.getPort());
logger.info("Enable SSL");
builder = builder.withSsl(Boolean.TRUE);
builder = builder.withPassword(properties.getRedisPassword());
RedisURI redisURI = builder.withTimeout(Duration.ofSeconds(5)).build();

redisClusterClient = RedisClusterClient.create(redisURI);
if (redisClusterClient == null) {
    logger.info("Could not create Redis connection.");
    throw new Exception("Could not create redis connection.");
} else
    logger.info("Redis connection created successfully.");

ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(properties.getRedisTopologyRefreshInterval()))
        .build(); // periodic refresh interval is set to 15 seconds

redisClusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build());

GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMaxTotal(properties.getMaxRedisConnections());
poolConfig.setMaxIdle(properties.getMaxRedisConnections());
poolConfig.setMinIdle(properties.getMinRedisConnections());

// K and V stand for StreamRedisCodec's key and value types, which the original post does not show
GenericObjectPool<StatefulRedisClusterConnection<K, V>> pool = ConnectionPoolSupport
        .createGenericObjectPool(() -> redisClusterClient.connect(new StreamRedisCodec()), poolConfig);

// per request code
try {
    StatefulRedisClusterConnection connection = pool.borrowObject();
    connection.sync().rpush(......)
} catch (Exception ex) {
    throw new IOException(ex);
}
```

#### Expected behavior/code

#### Environment

- Lettuce version: 5.3.0.RELEASE
- Redis version: AWS ElastiCache Redis 5.0.6

#### Possible Solution

#### Additional context

mp911de commented 4 years ago

Each RedisClusterClient has its own topology refresh and manages its own set of connections. You'd probably need to enable debug logs to trace topology updates, and check whether the topology state of each client reflects the most recent changes.
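
One way to check whether a client's topology view caught up with a failover is to snapshot it before and after and compare the two. The sketch below is only an illustration and uses plain JDK types so it runs on its own: `TopologyDiff` and `diff` are hypothetical names, and the snapshot format (node ID mapped to a role string) is an assumption. With Lettuce, such a snapshot could presumably be filled by iterating `redisClusterClient.getPartitions()` and recording each node's `getNodeId()` and `getFlags()`, once per RedisClusterClient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopologyDiff {

    /**
     * Compares two snapshots of one client's view of the cluster.
     * Keys are node IDs; values are a free-form description such as "MASTER".
     * (Hypothetical helper, not part of Lettuce.)
     */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> changes = new ArrayList<>();
        // TreeMap gives a stable, sorted order for the report.
        for (Map.Entry<String, String> e : new TreeMap<>(before).entrySet()) {
            String now = after.get(e.getKey());
            if (now == null) {
                changes.add("removed " + e.getKey() + " (" + e.getValue() + ")");
            } else if (!now.equals(e.getValue())) {
                changes.add("changed " + e.getKey() + ": " + e.getValue() + " -> " + now);
            }
        }
        for (Map.Entry<String, String> e : new TreeMap<>(after).entrySet()) {
            if (!before.containsKey(e.getKey())) {
                changes.add("added " + e.getKey() + " (" + e.getValue() + ")");
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Made-up snapshots of one shard before and after a failover.
        Map<String, String> before = new TreeMap<>();
        before.put("node-a", "MASTER");
        before.put("node-b", "REPLICA");

        Map<String, String> after = new TreeMap<>();
        after.put("node-a", "REPLICA");
        after.put("node-b", "MASTER");

        diff(before, after).forEach(System.out::println);
    }
}
```

An empty diff several refresh intervals after the failover would suggest that this particular client never picked up the new topology, which is the state worth comparing between the two clients.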

tishun commented 3 months ago

@umang92 is this issue still relevant?