qos-ch / logback

The reliable, generic, fast and flexible logging framework for Java.
http://logback.qos.ch
Other
2.98k stars 1.28k forks source link

AsyncAppender hangs intermittently #776

Open long-forgotten opened 7 months ago

long-forgotten commented 7 months ago

Hello team,

Recently we leverage AsyncAppender to reduce latency as logging. It does help to reduce the latency in case of large amounts of logging events. However, we spot a hang issue. Though, for now, it occurs only once, we're afraid it's a bug of logback or JDK and it might happen again.

Below is the thread stack, almost all threads of our process are blocked on this lock and we fail to locate the thread owning this lock(it's weird that jstack doesn't show which thread holds that lock, we guess the thread holding the lock might crash for somehow reason, but we have no way to prove that)

"msgWorkTP-582450565-6-thread-1" #642 prio=5 os_prio=0 tid=0x00007efec4094000 nid=0x4ca7 waiting on condition [0x00007efe841f9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park0(Native Method)
    - parking to wait for  <0x00000007463ab208> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at sun.misc.Unsafe.park(Unsafe.java:1038)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:176)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:842)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:876)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1207)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at java.util.concurrent.ArrayBlockingQueue.remainingCapacity(ArrayBlockingQueue.java:468)
    at ch.qos.logback.core.AsyncAppenderBase.isQueueBelowDiscardingThreshold(AsyncAppenderBase.java:170)
    at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:162)
    at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:85)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:426)
    at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:386)
    at ch.qos.logback.classic.Logger.info(Logger.java:584)

Our configuration is below

<!-- neverBlock true means possible loss of logging events in case of full queue -->
<appender name="ASYNC_MODULE_APPENDER" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="MODULE_APPENDER" />
    <queueSize>1024</queueSize>
    <discardingThreshold>0</discardingThreshold>
    <neverBlock>true</neverBlock>
</appender>

And module configuration of MODULE_APPENDER is below

<appender name="MODULE_APPENDER" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${user.home}/logs/sandbox/platform-test-support-sandbox-module.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <FileNamePattern>${user.home}/logs/sandbox/platform-test-support-sandbox-module.log.%d{yyyy-MM-dd}.%i</FileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <MaxHistory>5</MaxHistory>
        <totalSizeCap>500MB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %msg%n</pattern>
        <charset>UTF-8</charset>
    </encoder>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>INFO</level>
    </filter>
</appender>

The original issue is https://jira.qos.ch/browse/LOGBACK-1741 since Jira services is migrated here, so I move this issue here as well since we're still encountering this problem intermittently

kangmingchen commented 2 months ago

you can reset neverblock=true,and try again