Closed: KP-EV closed this issue 8 months ago
Please attach a thread dump for further diagnosis.
The offending thread has this trace:
"lettuce-nioEventLoop-4-2" #80 [146] daemon prio=5 os_prio=0 cpu=17071.84ms elapsed=63349.49s tid=0x000078f8b8051170 nid=146 waiting on condition [0x000078f9b41fd000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21.0.2/Native Method)
- parking to wait for <0x00000000d09834f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@21.0.2/LockSupport.java:221)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@21.0.2/AbstractQueuedSynchronizer.java:754)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@21.0.2/AbstractQueuedSynchronizer.java:990)
at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@21.0.2/ReentrantLock.java:153)
at java.util.concurrent.locks.ReentrantLock.lock(java.base@21.0.2/ReentrantLock.java:322)
at java.lang.ref.ReferenceQueue.poll(java.base@21.0.2/ReferenceQueue.java:182)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:295)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:277)
at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
at java.util.concurrent.ForkJoinPool.execute(java.base@21.0.2/ForkJoinPool.java:2847)
at java.lang.VirtualThread.submitRunContinuation(java.base@21.0.2/VirtualThread.java:240)
at java.lang.VirtualThread.unpark(java.base@21.0.2/VirtualThread.java:734)
at java.lang.System$2.unparkVirtualThread(java.base@21.0.2/System.java:2660)
at jdk.internal.misc.VirtualThreads.unpark(java.base@21.0.2/VirtualThreads.java:93)
at java.util.concurrent.locks.LockSupport.unpark(java.base@21.0.2/LockSupport.java:179)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.signalNext(java.base@21.0.2/AbstractQueuedSynchronizer.java:645)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(java.base@21.0.2/AbstractQueuedSynchronizer.java:1060)
at java.util.concurrent.locks.ReentrantLock.unlock(java.base@21.0.2/ReentrantLock.java:494)
at java.lang.ref.ReferenceQueue.poll(java.base@21.0.2/ReferenceQueue.java:186)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:235)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.mapGet(VirtualFieldImplementationsGenerator.java:295)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.realGet(VirtualFieldImplementationsGenerator.java)
at io.opentelemetry.javaagent.bootstrap.field.VirtualFieldImpl$java$lang$Runnable$io$opentelemetry$javaagent$bootstrap$executors$PropagatedContext.get(VirtualFieldImplementationsGenerator.java:277)
at io.opentelemetry.javaagent.bootstrap.executors.ExecutorAdviceHelper.attachContextToTask(ExecutorAdviceHelper.java:51)
at java.util.concurrent.ForkJoinPool.execute(java.base@21.0.2/ForkJoinPool.java:2847)
at java.lang.VirtualThread.submitRunContinuation(java.base@21.0.2/VirtualThread.java:240)
at java.lang.VirtualThread.unpark(java.base@21.0.2/VirtualThread.java:734)
at java.lang.System$2.unparkVirtualThread(java.base@21.0.2/System.java:2660)
at jdk.internal.misc.VirtualThreads.unpark(java.base@21.0.2/VirtualThreads.java:93)
at java.util.concurrent.locks.LockSupport.unpark(java.base@21.0.2/LockSupport.java:179)
at java.util.concurrent.CompletableFuture$Signaller.tryFire(java.base@21.0.2/CompletableFuture.java:1848)
at java.util.concurrent.CompletableFuture.postComplete(java.base@21.0.2/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete(java.base@21.0.2/CompletableFuture.java:2179)
at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:122)
at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63)
at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:745)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:680)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:597)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@21.0.2/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.2/Thread.java:1583)
Some OpenTelemetry instrumentation blocks the event loop thread, so this is out of scope for the driver and out of scope for Spring Data.
Generally, there is proper tracing support via Micrometer Tracing that doesn't require instrumentation by a Java agent.
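For anyone who wants spans for Redis commands without the Java agent, here is a minimal sketch of wiring Lettuce's Micrometer Tracing integration into the client resources that Spring Boot uses for the connection factory. This assumes Lettuce 6.3+ and Spring Boot 3.x; the configuration class, bean name, and the "redis" service name are illustrative:

    import io.lettuce.core.tracing.MicrometerTracing;
    import io.micrometer.observation.ObservationRegistry;
    import org.springframework.boot.autoconfigure.data.redis.ClientResourcesBuilderCustomizer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    class RedisTracingConfig {

        // Registers Lettuce's built-in Micrometer Tracing support on the
        // ClientResources builder, so Redis command spans are recorded without
        // a Java agent instrumenting the Netty event loop.
        @Bean
        ClientResourcesBuilderCustomizer lettuceMicrometerTracing(ObservationRegistry observationRegistry) {
            return builder -> builder.tracing(new MicrometerTracing(observationRegistry, "redis"));
        }
    }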
Thanks for your help, I'll take a look.
FYI: disabling the Lettuce instrumentation of the OpenTelemetry agent via OTEL_INSTRUMENTATION_LETTUCE_ENABLED: 'false' worked fine. We haven't had this issue anymore. Thank you for your help :)
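For reference, the OpenTelemetry Java agent accepts the same switch as a JVM system property (OTEL_* environment variables map to otel.* properties):

    OTEL_INSTRUMENTATION_LETTUCE_ENABLED=false      # environment variable, as above
    -Dotel.instrumentation.lettuce.enabled=false    # equivalent java command-line property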
Hi,
we use spring-boot-starter-data-redis:3.2.2 with a single Redis instance and configured these timeouts:
But we still notice that sometimes, after the application has run for a day, the Redis connection completely dies and any call using the @Cacheable annotation hangs forever.
We are looking into making the connection more reliable via this thread: https://github.com/lettuce-io/lettuce-core/issues/1428 (see the sketch after this question).
But I'm also wondering why these three timeouts are not taking effect and why the connection still hangs forever. All other calls work fine, and after a restart the application works again.
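For context, the linked lettuce-core issue points toward enabling TCP keepalive and a command timeout so dead connections fail fast instead of hanging. A minimal sketch under those assumptions (Lettuce 6.1+ for KeepAliveOptions; the class and bean names are illustrative, and the fine-grained keepalive settings take effect only with the epoll native transport or Java 11+ NIO):

    import java.time.Duration;

    import io.lettuce.core.ClientOptions;
    import io.lettuce.core.SocketOptions;
    import io.lettuce.core.TimeoutOptions;
    import org.springframework.boot.autoconfigure.data.redis.LettuceClientConfigurationBuilderCustomizer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    class RedisReliabilityConfig {

        @Bean
        LettuceClientConfigurationBuilderCustomizer lettuceReliability() {
            return builder -> builder.clientOptions(ClientOptions.builder()
                    // Fail commands after a fixed timeout instead of hanging forever.
                    .timeoutOptions(TimeoutOptions.enabled(Duration.ofSeconds(2)))
                    // Probe idle connections so half-open TCP connections are
                    // detected and closed rather than silently swallowing commands.
                    .socketOptions(SocketOptions.builder()
                            .keepAlive(SocketOptions.KeepAliveOptions.builder()
                                    .enable()
                                    .idle(Duration.ofSeconds(30))
                                    .interval(Duration.ofSeconds(10))
                                    .count(3)
                                    .build())
                            .build())
                    .build());
        }
    }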