Closed sstamm closed 1 year ago
Hi @sstamm, thanks for reporting this!
It would be helpful if you can provide trace log for that as described in https://loki4j.github.io/loki-logback-appender/docs/tracing#enabling-tracing-mode (you would need to upgrade to v1.4.0)
As a workaround, could you please try to set drainOnStop to false, e.g.:
<appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
<drainOnStop>false</drainOnStop>
...
</appender>
Is it possible to do this with logback? Like:
<logger name="AsyncBufferPipeline" level="TRACE" additivity="false">
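A complete version of that logger block might look like the sketch below. This is an assumption, not a confirmed config: logback matches loggers by fully qualified class name, so a short name like AsyncBufferPipeline would likely not match; the FQCN com.github.loki4j.client.pipeline.AsyncBufferPipeline (visible in the stack trace below) and the TRACE_CONSOLE appender name are used here for illustration:

```xml
<!-- Hypothetical sketch: route Loki4j's internal TRACE output to a
     dedicated console appender, assuming the pipeline logs under its
     fully qualified class name. -->
<appender name="TRACE_CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <pattern>%d{HH:mm:ss.SSS} %-5level %logger - %msg%n</pattern>
  </encoder>
</appender>

<logger name="com.github.loki4j.client.pipeline.AsyncBufferPipeline"
        level="TRACE" additivity="false">
  <appender-ref ref="TRACE_CONSOLE"/>
</logger>
```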
This happens randomly on distributed workers, so I then have to specify a target per worker.
This should be addressed in v1.4.1. Feel free to re-open in case the issue still remains.
Still exists with 1.4.1:
Thread dump at 0:03.689.780
* Thread group "main":
Thread "HttpClient-1-SelectorManager":
at sun.nio.ch.WEPoll.wait(long, long, int, int)
at sun.nio.ch.WEPollSelectorImpl.doSelect(java.util.function.Consumer, long)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.util.function.Consumer, long)
at sun.nio.ch.SelectorImpl.select(long)
at jdk.internal.net.http.HttpClientImpl$SelectorManager.run()
Thread "Logback shutdown hook [default]":
at jdk.internal.misc.Unsafe.park(boolean, long)
at java.util.concurrent.locks.LockSupport.parkNanos(long)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.waitSendQueueLessThan(int, long) (line: 370)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.waitSendQueueIsEmpty(long) (line: 164)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.stop() (line: 144)
at com.github.loki4j.logback.Loki4jAppender.stop() (line: 167)
at ch.qos.logback.core.spi.AppenderAttachableImpl.detachAndStopAllAppenders() (line: 107)
at ch.qos.logback.classic.Logger.detachAndStopAllAppenders() (line: 209)
at ch.qos.logback.classic.Logger.recursiveReset() (line: 333)
at ch.qos.logback.classic.LoggerContext.reset() (line: 362)
at ch.qos.logback.classic.LoggerContext.stop() (line: 342)
at ch.qos.logback.core.hook.ShutdownHookBase.stop() (line: 40)
at ch.qos.logback.core.hook.DefaultShutdownHook.run() (line: 62)
at java.lang.Thread.run()
Thread "loki4j-encoder-0":
at jdk.internal.misc.Unsafe.park(boolean, long)
at java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.encodeStep(com.github.loki4j.client.batch.LogRecordBatch) (line: 214)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runEncodeLoop() (line: 195)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$4() (line: 122)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline$$Lambda$205.0x0000000800d451d8.run()
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()
at java.lang.Thread.run()
Thread "loki4j-scheduler-0":
at jdk.internal.misc.Unsafe.park(boolean, long)
at java.util.concurrent.locks.LockSupport.park()
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block()
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.util.concurrent.ForkJoinPool$ManagedBlocker)
at java.util.concurrent.ForkJoinPool.managedBlock(java.util.concurrent.ForkJoinPool$ManagedBlocker)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take()
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take()
at java.util.concurrent.ThreadPoolExecutor.getTask()
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()
at java.lang.Thread.run()
Thread "loki4j-sender-0":
at jdk.internal.misc.Unsafe.park(boolean, long)
at java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep() (line: 269)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop() (line: 205)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3() (line: 119)
at com.github.loki4j.client.pipeline.AsyncBufferPipeline$$Lambda$204.0x0000000800d44fb8.run()
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()
at java.lang.Thread.run()
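The dump above shows the Logback shutdown hook parked inside waitSendQueueIsEmpty while the sender and encoder threads sit idle, so the drain condition is never satisfied and the JVM cannot exit. A minimal standalone sketch (not Loki4j's actual code; class and field names here are hypothetical) of why an unbounded drain wait hangs, and how bounding the wait with a deadline lets shutdown proceed:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Illustrative sketch only: a stop() that parks until an unsent-events
// counter reaches zero hangs the shutdown hook forever if the sender
// thread can no longer drain the queue. A deadline-bounded wait returns
// regardless, at the cost of abandoning the remaining events.
public class BoundedDrain {
    private final AtomicInteger unsentEvents = new AtomicInteger(5);

    /** Wait for the queue to drain, but give up after timeoutMs. */
    public boolean stop(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (unsentEvents.get() > 0) {
            if (System.nanoTime() >= deadline) {
                return false; // timed out; abandon remaining events
            }
            LockSupport.parkNanos(1_000_000L); // re-check every 1 ms
        }
        return true;
    }

    public static void main(String[] args) {
        BoundedDrain d = new BoundedDrain();
        // The simulated sender is stuck, so the queue never drains;
        // stop() must still return once the deadline passes.
        boolean drained = d.stop(50);
        System.out.println(drained ? "drained" : "timed out");
    }
}
```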
@sstamm Ok, let's continue digging into this.
Have you tried to set drainOnStop to false as I suggested earlier?
If setting drainOnStop to false helps, could you please enable metrics (https://loki4j.github.io/loki-logback-appender/docs/performance), set drainOnStop back to true, and see if the loki4j.unsent.events metric is 0 for a hanging process?
If setting drainOnStop to false does not help, could you please elaborate more on your Linux vs Windows usage? You run the app on Linux as frequently as on Windows and see no issue on Linux, correct?
Ready to re-open this issue again if any details on how to reproduce it are provided.
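For reference, metrics reporting can be switched on in the appender config; the sketch below assumes a metricsEnabled flag on the appender and a Micrometer registry on the classpath, per the performance docs linked above (flag name is an assumption, check the docs for the exact property):

```xml
<appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
  <!-- Assumed flag: exposes counters such as loki4j.unsent.events -->
  <metricsEnabled>true</metricsEnabled>
  ...
</appender>
```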
Randomly, the JVM does not terminate, with the following stack trace. Using 1.4.0-rc2. I only see the problem on Windows Server 2012R2; the same app on Linux does not show it. Also, just removing the Loki appender (but still using Logback with a RollingFileAppender) did not cause any hang.