redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.36k stars 960 forks source link

AbstractRedisClient#shutdown() never ends sometimes #2278

Closed ikeyat closed 3 months ago

ikeyat commented 1 year ago

Bug Report

Our application accesses Redis Cluster with Lettuce 6.1.5.RELEASE. AbstractRedisClient#shutdown() never ends sometimes, so the threads are never released.

It looks similar to #1768 that I reported last year, but the stack trace is a little different.

Current Behavior

The following is a thread dump of calling shutdown.
``` { "threadName": "http-nio-8080-exec-3", "threadId": 39, "blockedTime": -1, "blockedCount": 1055, "waitedTime": -1, "waitedCount": 15738, "lockName": "java.util.concurrent.CompletableFuture$Signaller@5ffdf4c", "lockOwnerId": -1, "lockOwnerName": null, "daemon": true, "inNative": false, "suspended": false, "threadState": "WAITING", "priority": 5, "stackTrace": [ { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "Unsafe.java", "lineNumber": -2, "className": "jdk.internal.misc.Unsafe", "nativeMethod": true }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "LockSupport.java", "lineNumber": 194, "className": "java.util.concurrent.locks.LockSupport", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "block", "fileName": "CompletableFuture.java", "lineNumber": 1796, "className": "java.util.concurrent.CompletableFuture$Signaller", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "managedBlock", "fileName": "ForkJoinPool.java", "lineNumber": 3128, "className": "java.util.concurrent.ForkJoinPool", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "waitingGet", "fileName": "CompletableFuture.java", "lineNumber": 1823, "className": "java.util.concurrent.CompletableFuture", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "get", "fileName": "CompletableFuture.java", "lineNumber": 1998, "className": "java.util.concurrent.CompletableFuture", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "shutdown", "fileName": "AbstractRedisClient.java", "lineNumber": 504, "className": "io.lettuce.core.AbstractRedisClient", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "shutdown", "fileName": "AbstractRedisClient.java", "lineNumber": 472, "className": "io.lettuce.core.AbstractRedisClient", "nativeMethod": false }, ....omitted ```
Lettuce threads.
``` { "threadName": "lettuce-epollEventLoop-4-1", "threadId": 52, "blockedTime": -1, "blockedCount": 0, "waitedTime": -1, "waitedCount": 1, "lockName": null, "lockOwnerId": -1, "lockOwnerName": null, "daemon": true, "inNative": true, "suspended": false, "threadState": "RUNNABLE", "priority": 5, "stackTrace": [ { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "epollWait", "fileName": "Native.java", "lineNumber": -2, "className": "io.netty.channel.epoll.Native", "nativeMethod": true }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "epollWait", "fileName": "Native.java", "lineNumber": 209, "className": "io.netty.channel.epoll.Native", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "epollWait", "fileName": "Native.java", "lineNumber": 202, "className": "io.netty.channel.epoll.Native", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "epollWaitNoTimerChange", "fileName": "EpollEventLoop.java", "lineNumber": 294, "className": "io.netty.channel.epoll.EpollEventLoop", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "EpollEventLoop.java", "lineNumber": 351, "className": "io.netty.channel.epoll.EpollEventLoop", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "SingleThreadEventExecutor.java", "lineNumber": 997, "className": "io.netty.util.concurrent.SingleThreadEventExecutor$4", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "ThreadExecutorMap.java", "lineNumber": 74, "className": "io.netty.util.internal.ThreadExecutorMap$2", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "FastThreadLocalRunnable.java", "lineNumber": 30, "className": "io.netty.util.concurrent.FastThreadLocalRunnable", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "run", "fileName": "Thread.java", "lineNumber": 829, "className": "java.lang.Thread", "nativeMethod": false } ], "lockedMonitors": [], "lockedSynchronizers": [], "lockInfo": null }, { "threadName": "lettuce-eventExecutorLoop-1-1", "threadId": 54, "blockedTime": -1, "blockedCount": 0, "waitedTime": -1, "waitedCount": 13203, "lockName": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11f0c6f4", "lockOwnerId": -1, "lockOwnerName": null, "daemon": true, "inNative": false, "suspended": false, "threadState": "WAITING", "priority": 5, "stackTrace": [ { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "Unsafe.java", "lineNumber": -2, "className": "jdk.internal.misc.Unsafe", "nativeMethod": true }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "LockSupport.java", "lineNumber": 194, "className": "java.util.concurrent.locks.LockSupport", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "await", "fileName": "AbstractQueuedSynchronizer.java", "lineNumber": 2081, "className": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "take", "fileName": "LinkedBlockingQueue.java", "lineNumber": 433, "className": "java.util.concurrent.LinkedBlockingQueue", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "takeTask", "fileName": "SingleThreadEventExecutor.java", "lineNumber": 244, "className": "io.netty.util.concurrent.SingleThreadEventExecutor", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "DefaultEventExecutor.java", "lineNumber": 64, "className": "io.netty.util.concurrent.DefaultEventExecutor", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "SingleThreadEventExecutor.java", "lineNumber": 997, "className": "io.netty.util.concurrent.SingleThreadEventExecutor$4", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "ThreadExecutorMap.java", "lineNumber": 74, "className": "io.netty.util.internal.ThreadExecutorMap$2", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "FastThreadLocalRunnable.java", "lineNumber": 30, "className": "io.netty.util.concurrent.FastThreadLocalRunnable", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "run", "fileName": "Thread.java", "lineNumber": 829, "className": "java.lang.Thread", "nativeMethod": false } ], "lockedMonitors": [], "lockedSynchronizers": [], "lockInfo": { "className": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject", "identityHashCode": 300992244 } }, { "threadName": "lettuce-eventExecutorLoop-1-2", "threadId": 55, "blockedTime": -1, "blockedCount": 0, "waitedTime": -1, "waitedCount": 13190, "lockName": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@46b0062b", "lockOwnerId": -1, "lockOwnerName": null, "daemon": true, "inNative": false, "suspended": false, "threadState": "WAITING", "priority": 5, "stackTrace": [ { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "Unsafe.java", "lineNumber": -2, "className": "jdk.internal.misc.Unsafe", "nativeMethod": true }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "park", "fileName": "LockSupport.java", "lineNumber": 194, "className": "java.util.concurrent.locks.LockSupport", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "await", "fileName": "AbstractQueuedSynchronizer.java", "lineNumber": 2081, "className": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "take", "fileName": "LinkedBlockingQueue.java", "lineNumber": 433, "className": "java.util.concurrent.LinkedBlockingQueue", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "takeTask", "fileName": "SingleThreadEventExecutor.java", "lineNumber": 244, "className": "io.netty.util.concurrent.SingleThreadEventExecutor", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "DefaultEventExecutor.java", "lineNumber": 64, "className": "io.netty.util.concurrent.DefaultEventExecutor", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "SingleThreadEventExecutor.java", "lineNumber": 997, "className": "io.netty.util.concurrent.SingleThreadEventExecutor$4", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "ThreadExecutorMap.java", "lineNumber": 74, "className": "io.netty.util.internal.ThreadExecutorMap$2", "nativeMethod": false }, { "classLoaderName": null, "moduleName": null, "moduleVersion": null, "methodName": "run", "fileName": "FastThreadLocalRunnable.java", "lineNumber": 30, "className": "io.netty.util.concurrent.FastThreadLocalRunnable", "nativeMethod": false }, { "classLoaderName": null, "moduleName": "java.base", "moduleVersion": "11.0.16.1", "methodName": "run", "fileName": "Thread.java", "lineNumber": 829, "className": "java.lang.Thread", "nativeMethod": false } ], "lockedMonitors": [], "lockedSynchronizers": [], "lockInfo": { "className": "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject", "identityHashCode": 1185941035 } }, ```

Environment

mp911de commented 1 year ago

The threads do not indicate a deadlock so I'm not able to tell from these details what's going on.

ikeyat commented 1 year ago

Is there any other input for this investigation?

I will replace shutdown() with shutdownAsync() as a workaround, if I cannot find any solution...

yangbodong22011 commented 1 year ago

From https://github.com/lettuce-io/lettuce-core/issues/1768, I have the same problem, Lettuce version: 6.2.2.RELEASE, the jstack show:

"pool-1-thread-5" #33 prio=5 os_prio=0 cpu=2066.98ms elapsed=683933.65s tid=0x0000ffff896c2800 nid=237244 waiting on condition  [0x0000fffefc3fd000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park0(java.base@11.0.18.17-AJDK/Native Method)
    - parking to wait for  <0x0000000086b18400> (a java.util.concurrent.CompletableFuture$Signaller)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.18.17-AJDK/Unsafe.java:2339)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18.17-AJDK/LockSupport.java:194)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18.17-AJDK/CompletableFuture.java:1796)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18.17-AJDK/ForkJoinPool.java:3128)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18.17-AJDK/CompletableFuture.java:1823)
    at java.util.concurrent.CompletableFuture.get(java.base@11.0.18.17-AJDK/CompletableFuture.java:1998)
    at io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:521)
    at io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:484)
    at io.lettuce.core.AbstractRedisClient.close(AbstractRedisClient.java:489)
    at base.LettuceCommandTestBase.tearDown(LettuceCommandTestBase.java:47)
    at jdk.internal.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18.17-AJDK/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.18.17-AJDK/Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:24)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:292)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:127)
    at org.junit.runner.JUnitCore.runClasses(JUnitCore.java:76)
    at Main$1.run(Main.java:104)
    at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.18.17-AJDK/Executors.java:515)
    at java.util.concurrent.FutureTask.run(java.base@11.0.18.17-AJDK/FutureTask.java:264)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.18.17-AJDK/ScheduledThreadPoolExecutor.java:304)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18.17-AJDK/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18.17-AJDK/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.18.17-AJDK/Thread.java:991)
mp911de commented 1 year ago

Please include the full stack trace. There must be other threads involved blocking something on what the shutdown synchronizes.

yangbodong22011 commented 1 year ago

@mp911de full stack trace: jstack.log

mp911de commented 1 year ago

Interesting, there are no live netty threads anymore. Did you observe any error log output?

yangbodong22011 commented 1 year ago

Interesting, there are no live netty threads anymore. Did you observe any error log output?

There are no other errors, the cpu is normal, the thread status in my thread pool is WAITING.

Unniboy commented 1 year ago

using 6.1.8.RELEASE and hitting the similar trace as @yangbodong2201.(using vertx-core 3.9.4 - not sure if this matters)

Application has hundreds of Vertx worker_pool threads and this issue is seen intermittently. The application code is creating, connecting and closing separate RedisClient for each of the event instead of reusing the object throughout jvm lifetime. We are updating the application to reuse the object that would most likely help us not hitting this.

I find this interesting because other worker_pool threads go through the same creating, connecting and closing RedisClient on user events but the shutdown process goes to completion without the stack trace.

WARNING: Thread Thread[worker_pool-203,5,main]=Thread[worker_pool-203,5,main] has been blocked for 87372 ms, time limit is 60000 msio.vertx.core.VertxException: Thread blocked
    at java.base@13.0.2/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@13.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@13.0.2/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1867)
    at java.base@13.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
    at java.base@13.0.2/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1894)
    at java.base@13.0.2/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2069)
    at app//io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:504)
    at app//io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:472)
    at app//...application_code_traces...
    at app//io.vertx.core.eventbus.impl.HandlerRegistration.deliver(HandlerRegistration.java:278)
    at app//io.vertx.core.eventbus.impl.HandlerRegistration.handle(HandlerRegistration.java:264)
    at app//io.vertx.core.eventbus.impl.EventBusImpl$InboundDeliveryContext.next(EventBusImpl.java:567)
    at app//io.vertx.core.eventbus.impl.EventBusImpl.lambda$deliverToHandler$4(EventBusImpl.java:527)
    at app//io.vertx.core.eventbus.impl.EventBusImpl$$Lambda$1236/0x0000000800f8c040.handle(Unknown Source)
    at app//io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
    at app//io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
    at app//io.vertx.core.impl.WorkerContext$$Lambda$155/0x0000000800d0c040.run(Unknown Source)
    at app//io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
    at app//io.vertx.core.impl.TaskQueue$$Lambda$137/0x0000000800cea040.run(Unknown Source)
    at java.base@13.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@13.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base@13.0.2/java.lang.Thread.run(Thread.java:830)
seabamirum commented 9 months ago

This is happening to me as well. The thread dump has the shutdownhook in a WAITING state, with several lettuce threads afterwards in a TIMED_WAITING state.

"SpringApplicationShutdownHook" #23 [8271] prio=5 os_prio=0 cpu=35.43ms elapsed=11.42s tid=0x00007f7574167590 nid=8271 waiting on condition  [0x00007f7526dfc000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@19.0.2/Native Method)
    - parking to wait for  <0x00000000892083c0> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(java.base@19.0.2/LockSupport.java:221)
    at java.util.concurrent.CompletableFuture$Signaller.block(java.base@19.0.2/CompletableFuture.java:1864)
    at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@19.0.2/ForkJoinPool.java:3745)
    at java.util.concurrent.ForkJoinPool.managedBlock(java.base@19.0.2/ForkJoinPool.java:3690)
    at java.util.concurrent.CompletableFuture.waitingGet(java.base@19.0.2/CompletableFuture.java:1898)
    at java.util.concurrent.CompletableFuture.get(java.base@19.0.2/CompletableFuture.java:2072)
    at io.lettuce.core.AbstractRedisClient.shutdown(AbstractRedisClient.java:529)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.stop(LettuceConnectionFactory.java:922)
    at org.springframework.context.SmartLifecycle.stop(SmartLifecycle.java:117)

"lettuce-epollEventLoop-16-2" #57 [8272] daemon prio=5 os_prio=0 cpu=37.17ms elapsed=11.39s tid=0x00007f757c06ed50 nid=8272 waiting on condition  [0x00007f7527ffe000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep0(java.base@19.0.2/Native Method)

...
yangbodong22011 commented 9 months ago

FYI: eventually I used r.shutdownAsync() instead of r.close() and the problem went away.