redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

Redis GetConnection stuck waiting and not timing out #2422

Open stefanklas opened 1 year ago

stefanklas commented 1 year ago

Bug Report

We have seen an issue where a thread got stuck waiting for the getConnection future to complete. It looks like it is waiting for the connectClusterAsync method to complete but is not timing out.

https://github.com/lettuce-io/lettuce-core/blob/2ad862f5a1db860d57236c21c473cfd9aefebfea/src/main/java/io/lettuce/core/cluster/RedisClusterClient.java#L399

Current Behavior

Stack trace

```java
"pool-10-thread-1" id=39 state=WAITING
    - waiting on <0x5ad3b62d> (a java.util.concurrent.CompletableFuture$Signaller)
    - locked <0x5ad3b62d> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.base@11.0.17.0.2/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@11.0.17.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.17.0.2/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1796)
    at java.base@11.0.17.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
    at java.base@11.0.17.0.2/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1823)
    at java.base@11.0.17.0.2/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1998)
    at app//io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:367)
    at app//io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:403)
    at app//io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:378)
    at app//com.oracle.pic.apigw.state.redis.RedisConnectionHandler.lambda$getClusterConnection$0(RedisConnectionHandler.java:159)
    at app//com.oracle.pic.apigw.state.redis.RedisConnectionHandler$$Lambda$1153/0x00000008407b4840.apply(Unknown Source)
    at java.base@11.0.17.0.2/java.util.HashMap.compute(HashMap.java:1229)
    at app//com.oracle.pic.apigw.state.redis.RedisConnectionHandler.getClusterConnection(RedisConnectionHandler.java:156)
    at app//com.oracle.pic.apigw.state.redis.RedisConnectionHandler.getCommands(RedisConnectionHandler.java:202)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator.handleRotationForProcess(RedisCredentialRotator.java:87)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator.lambda$handleRotation$1(RedisCredentialRotator.java:71)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator$$Lambda$1391/0x00000008403c9840.call(Unknown Source)
    at app//com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
    at app//com.github.rholder.retry.Retryer.call(Retryer.java:160)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator.handleRotation(RedisCredentialRotator.java:71)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator.lambda$start$0(RedisCredentialRotator.java:45)
    at app//com.oracle.pic.apigw.state.redis.RedisCredentialRotator$$Lambda$1389/0x00000008403ca040.run(Unknown Source)
    at java.base@11.0.17.0.2/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base@11.0.17.0.2/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base@11.0.17.0.2/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base@11.0.17.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@11.0.17.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.17.0.2/java.lang.Thread.run(Thread.java:834)
```

Input Code

Setting up the Lettuce cluster client:

    redisURI.setPassword(passwordSupplier.get());
    clusterClient = RedisClusterClient.create(redisURI);
    clusterClient.setOptions(ClusterClientOptions.builder()
        .autoReconnect(true)
        .sslOptions(sslOptions)
        .topologyRefreshOptions(ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(false)
            .dynamicRefreshSources(false)
            .build())
        .validateClusterNodeMembership(false)
        .build());
Attempting to create the cluster connection:

    if (Objects.isNull(value)) {
        log.info("Creating new cluster connection for key '{}'", key);
        value = this.getClusterClient().connect();
    }

Expected behavior/code

We would expect the connection either to succeed or to time out if there is an issue. Instead, the thread appears to remain stuck waiting for the future to complete.
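To illustrate the expectation: the stack trace shows the thread parked in the untimed `CompletableFuture.get()`, which returns only once the future completes. The sketch below uses only the JDK and a future that is never completed as a stand-in for the stuck connection future. The class and method names (`BoundedWaitSketch`, `getOrTimeout`) are hypothetical and not part of Lettuce.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWaitSketch {

    /**
     * Waits at most timeoutMillis for the future instead of blocking indefinitely.
     * On timeout the future is cancelled (for a CompletableFuture this completes it
     * with a CancellationException; it does not interrupt any running work) and the
     * TimeoutException is rethrown to the caller.
     */
    static <T> T getOrTimeout(CompletableFuture<T> future, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a connection future that nothing ever completes.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        try {
            getOrTimeout(neverCompletes, 200);
            System.out.println("connected");
        } catch (TimeoutException e) {
            // With the untimed get() this point would never be reached.
            System.out.println("timed out");
        }
    }
}
```

A bounded wait like this surfaces the hang to the caller (here, the retry logic) rather than parking the scheduler thread forever. It does not explain why the future never completes.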

We are not setting any timeoutOptions, so we assume the defaults will apply. Or do we need to set a timeout option explicitly (e.g. .timeoutOptions(TimeoutOptions.enabled()))?
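For reference, here is a configuration sketch of where the timeout-related settings live, assuming Lettuce 6.x on the classpath. As far as the Lettuce documentation describes them, `TimeoutOptions` applies to command timeouts, while the TCP connect timeout is set through `SocketOptions`. Whether either setting would have unblocked the thread in the stack trace above is not established in this issue. The class name, method name, and the 5 and 10 second values are illustrative assumptions. The caller-side bound via `connectAsync(...)` is a possible workaround, not a confirmed fix.

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.TimeoutOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.codec.StringCodec;

import java.time.Duration;
import java.util.concurrent.TimeUnit;

class ClusterTimeoutConfigSketch {

    // Hypothetical helper; timeout values are placeholders, not recommendations.
    static StatefulRedisClusterConnection<String, String> connectWithBounds(RedisURI redisURI)
            throws Exception {
        RedisClusterClient clusterClient = RedisClusterClient.create(redisURI);
        clusterClient.setOptions(ClusterClientOptions.builder()
                // Bounds TCP connection establishment.
                .socketOptions(SocketOptions.builder()
                        .connectTimeout(Duration.ofSeconds(5))
                        .build())
                // Bounds individual commands, not the connect call itself.
                .timeoutOptions(TimeoutOptions.enabled(Duration.ofSeconds(5)))
                .build());

        // Caller-side bound: wait on the async connect future with an explicit timeout
        // rather than calling the blocking connect().
        return clusterClient.connectAsync(StringCodec.UTF8)
                .get(10, TimeUnit.SECONDS);
    }
}
```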

Environment

Possible Solution

Additional context

tishun commented 3 months ago

Quite possibly the same as the issue described in #2905

Roiocam commented 1 month ago

Can you provide the stack trace showing what happened in the lettuce-xxEventLoop thread?