Open coney opened 3 years ago
Thanks for reporting the issue. Connection validation is synchronous, whereas reactive connections use a non-blocking API. What happens here is that the I/O thread is blocked and cannot proceed with connection validation or creation. Connection validation isn't necessary for Lettuce because Lettuce automatically reconnects disconnected connections. If a connection is truly broken, it is either because the Redis server is down or because of a network partition. Neither scenario can be recovered from on the client side.
Therefore, please disable `validateConnection` and make sure to enable early connection initialization to prevent blocking of the event loop thread.
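A minimal configuration sketch of that advice, assuming "early connection initialization" refers to `LettuceConnectionFactory#setEagerInitialization` (available since Spring Data Redis 2.2). The class name `RedisConfig`, the bean method name, and the `localhost:6379` standalone endpoint are placeholders that do not come from this issue; the reporter's stack trace suggests a cluster setup, for which a `RedisClusterConfiguration` would be passed instead.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
class RedisConfig { // hypothetical class name

    @Bean
    LettuceConnectionFactory redisConnectionFactory() {
        // Placeholder endpoint; replace with the real standalone or cluster configuration.
        RedisStandaloneConfiguration server = new RedisStandaloneConfiguration("localhost", 6379);
        LettuceConnectionFactory factory = new LettuceConnectionFactory(server);

        // Skip the synchronous validation step when the shared connection is obtained.
        factory.setValidateConnection(false);

        // Create the shared connection during factory initialization rather than
        // lazily on first use (assumed to be the "early initialization" meant above).
        factory.setEagerInitialization(true);
        return factory;
    }
}
```

With this setup the shared connection should already exist before the first reactive request asks for it, so the event loop thread is not the one that has to create it.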
Bug Report
`LettuceConnectionFactory.SharedConnection#resetConnection` hangs forever and causes a deadlock.
Current Behavior
I have enabled `validateConnection` for the Lettuce connection factory, and occasionally my service can't serve any incoming requests. The thread dump shows that all the HTTP threads are waiting for the connection.
Stack trace
```
// http threads, take one for example
"reactor-http-epoll-6" #126 daemon prio=5 os_prio=0 cpu=16164.68ms elapsed=26788.53s allocated=1510M defined_classes=693 tid=0x0000560e1cfc1000 nid=0x168b waiting for monitor entry [0x00007fdb977c2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1295)
    - waiting to lock <0x000000070a63d728> (a java.lang.Object)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
    at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
    at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$773/0x00000008007edc40.get(Unknown Source)
    at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:224)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
```
All the HTTP threads are waiting for a lock that is held by the thread below:
```
"lettuce-epollEventLoop-5-1" #31 daemon prio=5 os_prio=0 cpu=7049.44ms elapsed=26823.40s allocated=1441M defined_classes=171 tid=0x0000560e1dd67000 nid=0x13de waiting on condition [0x00007fdbb8753000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
    - parking to wait for <0x00000007197dec70> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/Unknown Source)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.8/Unknown Source)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.8/Unknown Source)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.8/Unknown Source)
    at java.util.concurrent.CompletableFuture.join(java.base@11.0.8/Unknown Source)
    at org.springframework.data.redis.connection.lettuce.LettuceFutureUtils.join(LettuceFutureUtils.java:68)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.release(LettuceConnectionProvider.java:74)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.release(LettuceConnectionFactory.java:1596)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.resetConnection(LettuceConnectionFactory.java:1360)
    - locked <0x000000070a63d728> (a java.lang.Object)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.validateConnection(LettuceConnectionFactory.java:1346)
    - locked <0x000000070a63d728> (a java.lang.Object)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1302)
    - locked <0x000000070a63d728> (a java.lang.Object)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
    at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
    at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$773/0x00000008007edc40.get(Unknown Source)
```
Input Code
Our application uses WebFlux to handle API requests, but I found that the Lettuce connection factory uses `synchronized` to protect `getConnection`:
``` java
// org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.SharedConnection#getConnection
@Nullable
StatefulConnection
```
Expected behavior/code
Resetting the connection should complete within a bounded time and should not cause a deadlock.
Environment
redis relevant configuration:
Possible Solution
`org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider#release` seems to wait for the future forever. Maybe a timeout could partially avoid this situation? I still don't know why `release` hangs.
Additional context
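To illustrate the timeout idea raised under Possible Solution, here is a JDK-only sketch. It is not Spring Data Redis or Lettuce code: `BoundedJoinSketch`, `joinWithTimeout`, and the single-threaded executor standing in for the event loop are hypothetical names and stand-ins chosen for this example. It reproduces the same shape as the thread dump, where the event loop thread blocks on a future that only the same thread could complete, and shows that a bounded wait gives up instead of parking forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedJoinSketch {

    // Waits for the future, but gives up after the timeout instead of blocking indefinitely.
    public static <T> T joinWithTimeout(CompletableFuture<T> future, long timeout, TimeUnit unit)
            throws TimeoutException {
        try {
            return future.get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Returns true if the bounded wait timed out while running on the "event loop" thread.
    public static boolean releaseOnEventLoopTimesOut() throws Exception {
        // A single thread stands in for the event loop in this sketch.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = eventLoop.submit(() -> {
                CompletableFuture<Void> closeFuture = new CompletableFuture<>();
                // The completion is queued on the same single thread, so it cannot
                // run while that thread is blocked waiting for closeFuture.
                eventLoop.execute(() -> closeFuture.complete(null));
                try {
                    joinWithTimeout(closeFuture, 200, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true; // an unbounded join() would never return here
                }
            });
            return result.get(5, TimeUnit.SECONDS);
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + releaseOnEventLoopTimesOut());
    }
}
```

A timeout of this kind only turns the hang into an error after a delay. The underlying problem, blocking the thread that must complete the future, remains.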
stacktrace.zip
Reference
The original issue was posted on https://github.com/lettuce-io/lettuce-core/issues/1861