kdgregory / log4j-aws-appenders

Appenders for Log4J 1.2.x, Log4J 2.x, and Logback that write to AWS destinations.
Apache License 2.0
67 stars 18 forks source link

Synchronized Attribute not Working #180

Closed RyanPoorman closed 1 year ago

RyanPoorman commented 1 year ago

Possibly an issue with my shaded jar, but when trying to run synchronized I get the following error:

ERROR StatusConsoleListener unable to append event
 com.kdgregory.logging.common.util.RetryManager2$TimeoutException: describe did not complete by 2023-05-02T22:23:05.576260Z (now 2023-05-02T22:23:05.954186Z)
        at com.kdgregory.logging.common.util.RetryManager2.invoke(RetryManager2.java:114)
        at com.kdgregory.logging.common.util.RetryManager2.invoke(RetryManager2.java:128)
        at com.kdgregory.logging.aws.cloudwatch.CloudWatchLogWriter.retrieveSequenceToken(CloudWatchLogWriter.java:278)
        at com.kdgregory.logging.aws.cloudwatch.CloudWatchLogWriter.lambda$sendBatch$0(CloudWatchLogWriter.java:141)
        at com.kdgregory.logging.common.util.RetryManager2.invoke(RetryManager2.java:99)
        at com.kdgregory.logging.common.util.RetryManager2.invoke(RetryManager2.java:128)
        at com.kdgregory.logging.aws.cloudwatch.CloudWatchLogWriter.sendBatch(CloudWatchLogWriter.java:137)
        at com.kdgregory.logging.aws.internal.AbstractLogWriter.processBatch(AbstractLogWriter.java:326)
        at com.kdgregory.logging.aws.internal.AbstractLogWriter.addMessage(AbstractLogWriter.java:228)
        at com.kdgregory.log4j2.aws.internal.AbstractAppender.internalAppend(AbstractAppender.java:362)
        at com.kdgregory.log4j2.aws.internal.AbstractAppender.append(AbstractAppender.java:219)
        at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
        at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
        at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:560)
        at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
        at org.apache.logging.log4j.core.Logger.log(Logger.java:163)
        at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2168)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2122)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2105)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1980)
        at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1946)
        at org.apache.logging.log4j.spi.AbstractLogger.log(AbstractLogger.java:1615)
        at com.demo.App.main(App.java:18)

This does not happen when running async.

Below is a snippet of the log4j2.xml

<CloudWatchAppender name="ACCESS_CLOUDWATCH">
    <logGroup>/dummy/values/executable_jar/executed</logGroup>
    <logStream>${env:APPLICATION_NAME:-LOGS}-{date}-{hostname}-${awslogs:pid}</logStream>
    <PatternLayout pattern="%m%n" />
    <retentionPeriod>${env:CW_LOG_RETENTION_PERIOD:-7}</retentionPeriod>
    <synchronous>true</synchronous>
</CloudWatchAppender>
kdgregory commented 1 year ago

@RyanPoorman

The most likely reason for a timeout -- especially when it's retrieveSequenceToken() that times out -- is a network connectivity problem: running in a subnet that cannot connect to AWS.

Since you say this only happens when synchronous mode is enabled, the other option is a large number of concurrent processes starting up at the same time, which could exceed the AWS quota of 5 requests per second for this operation.

I will take a look over the weekend to see if I can reproduce using my example app, in case there is a bug hiding in that code.

However, I generally recommend against using synchronous mode, as significantly decreases logging performance.

kdgregory commented 1 year ago

@RyanPoorman

I tried running the example in synchronous mode this morning and had no problems.

Thinking about the initialization process, I don't think that it's a network connectivity issue (unless intermittent): to get to that point in the code you would need to retrieve and/or create the log group and log stream. So it seems like it's just taking a long time, more than the 60 second initialization timeout.

One possible fix for the symptom is to set the initializationTimeout to a larger value, such as 120000 (120 seconds). See the docs for more information.

However, that wouldn't affect the root cause. Can you provide more information about how you're running this? I'm assuming a Lambda, and wonder if (1) it has a low memory allocation, so gets a low CPU percentage, and (2) you're initializing the logging system inside the function handler. Although if you're running in a Lambda, why not use the built-in CloudWatch logging?

kdgregory commented 1 year ago

One thing that would be helpful is to see the entire debug log from the appender. Per the troubleshooting guide, set status="debug" in the logging config.

RyanPoorman commented 1 year ago

Thanks for getting back to me so quickly! I have attached the debug below and will provide some additional details as well:

DEBUG StatusConsoleListener null null initializing configuration XmlConfiguration[location=replaced_value/log4j2.xml] DEBUG StatusConsoleListener PluginManager 'Core' found 134 plugins DEBUG StatusConsoleListener PluginManager 'Level' found 0 plugins DEBUG StatusConsoleListener PluginManager 'Lookup' found 18 plugins DEBUG StatusConsoleListener Building Plugin[name=CustomLevel, class=org.apache.logging.log4j.core.config.CustomLevelConfig]. DEBUG StatusConsoleListener PluginManager 'TypeConverter' found 26 plugins DEBUG StatusConsoleListener createLevel(name="CUSTOM_LEVEL", intLevel="250") DEBUG StatusConsoleListener Creating CustomLevel(name='CUSTOM_LEVEL', intValue=250) DEBUG StatusConsoleListener Building Plugin[name=CustomLevels, class=org.apache.logging.log4j.core.config.CustomLevels]. DEBUG StatusConsoleListener createCustomLevels(={CustomLevel[name=CUSTOM_LEVEL, intLevel=250]}) DEBUG StatusConsoleListener Building Plugin[name=layout, class=org.apache.logging.log4j.core.layout.PatternLayout]. DEBUG StatusConsoleListener PatternLayout$Builder(pattern="%m%n", PatternSelector=null, Configuration(replaced_value/log4j2.xml), Replace=null, charset="null", alwaysWriteExceptions="null", disableAnsi="null", noConsoleNoAnsi="null", header="null", footer="null") DEBUG StatusConsoleListener PluginManager 'Converter' found 45 plugins DEBUG StatusConsoleListener Building Plugin[name=appender, class=com.kdgregory.log4j2.aws.CloudWatchAppender]. DEBUG StatusConsoleListener CloudWatchAppender$CloudWatchAppenderBuilder(name="CLOUDWATCH", logGroup="/debugging/appender", logStream="LOGS-{date}-{hostname}-43482", retentionPeriod="7", dedicatedWriter="null", PatternLayout(%m%n), Filter=null, Configuration(replaced_value/log4j2.xml), synchronous="true", batchDelay="null", truncateOversizeMessages="null", discardThreshold="null", discardAction="null", assumedRole="null", clientFactory="null", clientRegion="null", clientEndpoint="null", useShutdownHook="null", initializationTimeout="null", enableBatchLogging="null") DEBUG StatusConsoleListener Building Plugin[name=appenders, class=org.apache.logging.log4j.core.config.AppendersPlugin]. DEBUG StatusConsoleListener createAppenders(={com.kdgregory.log4j2.aws.CloudWatchAppender with name CloudWatchAppender}) DEBUG StatusConsoleListener Building Plugin[name=AppenderRef, class=org.apache.logging.log4j.core.config.AppenderRef]. DEBUG StatusConsoleListener createAppenderRef(ref="CLOUDWATCH", level="null", Filter=null) DEBUG StatusConsoleListener Building Plugin[name=logger, class=org.apache.logging.log4j.core.config.LoggerConfig]. DEBUG StatusConsoleListener LoggerConfig$Builder(additivity="false", level="CUSTOM_LEVEL", levelAndRefs="null", name="com.replaced.value.wrapper", includeLocation="null", ={CLOUDWATCH}, ={}, Configuration(replaced_value/log4j2.xml), Filter=null) DEBUG StatusConsoleListener Building Plugin[name=loggers, class=org.apache.logging.log4j.core.config.LoggersPlugin]. DEBUG StatusConsoleListener createLoggers(={com.replaced.value.wrapper}) WARN StatusConsoleListener No Root logger was configured, creating default ERROR-level Root logger with Console appender DEBUG StatusConsoleListener Starting OutputStreamManager SYSTEM_OUT.false.false-2 DEBUG StatusConsoleListener Configuration XmlConfiguration[location=replaced_value/log4j2.xml] initialized DEBUG StatusConsoleListener Starting configuration XmlConfiguration[location=replaced_value/log4j2.xml] DEBUG StatusConsoleListener Log4j2 ConfigurationScheduler starting 1 threads DEBUG StatusConsoleListener log writer starting (thread: main) DEBUG StatusConsoleListener checking for existence of CloudWatch log group: /debugging/appender SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/Users/poormanr/.m2/repository/replaced-jar-name.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/Users/poormanr/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.20.0/log4j-slf4j-impl-2.20.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. DEBUG StatusConsoleListener PluginManager 'Lookup' found 18 plugins DEBUG StatusConsoleListener PluginManager 'Converter' found 45 plugins DEBUG StatusConsoleListener Starting OutputStreamManager SYSTEM_OUT.false.false-3 DEBUG StatusConsoleListener Starting LoggerContext[name=Default, org.apache.logging.log4j.core.LoggerContext@5524cca1]... DEBUG StatusConsoleListener Reconfiguration started for context[name=Default] at URI null (org.apache.logging.log4j.core.LoggerContext@5524cca1) with optional ClassLoader: null DEBUG StatusConsoleListener Using configurationFactory org.apache.logging.log4j.core.config.ConfigurationFactory$Factory@5032714f DEBUG StatusConsoleListener PluginManager 'Lookup' found 18 plugins DEBUG StatusConsoleListener Closing BufferedInputStream java.io.BufferedInputStream@353352b6 DEBUG StatusConsoleListener Watching configuration 'replaced_value/log4j2.xml' for lastModified Fri May 05 07:49:44 CDT 2023 (1683290984299) DEBUG StatusConsoleListener null null initializing configuration XmlConfiguration[location=replaced_value/log4j2.xml] DEBUG StatusConsoleListener PluginManager 'Core' found 134 plugins DEBUG StatusConsoleListener PluginManager 'Level' found 0 plugins DEBUG StatusConsoleListener PluginManager 'Lookup' found 18 plugins DEBUG StatusConsoleListener Building Plugin[name=CustomLevel, class=org.apache.logging.log4j.core.config.CustomLevelConfig]. DEBUG StatusConsoleListener createLevel(name="CUSTOM_LEVEL", intLevel="250") DEBUG StatusConsoleListener Creating CustomLevel(name='CUSTOM_LEVEL', intValue=250) DEBUG StatusConsoleListener Building Plugin[name=CustomLevels, class=org.apache.logging.log4j.core.config.CustomLevels]. DEBUG StatusConsoleListener createCustomLevels(={CustomLevel[name=CUSTOM_LEVEL, intLevel=250]}) DEBUG StatusConsoleListener Building Plugin[name=layout, class=org.apache.logging.log4j.core.layout.PatternLayout]. DEBUG StatusConsoleListener PatternLayout$Builder(pattern="%m%n", PatternSelector=null, Configuration(replaced_value/log4j2.xml), Replace=null, charset="null", alwaysWriteExceptions="null", disableAnsi="null", noConsoleNoAnsi="null", header="null", footer="null") DEBUG StatusConsoleListener PluginManager 'Converter' found 45 plugins DEBUG StatusConsoleListener Building Plugin[name=appender, class=com.kdgregory.log4j2.aws.CloudWatchAppender]. DEBUG StatusConsoleListener CloudWatchAppender$CloudWatchAppenderBuilder(name="CLOUDWATCH", logGroup="/debugging/appender", logStream="LOGS-{date}-{hostname}-43482", retentionPeriod="7", dedicatedWriter="null", PatternLayout(%m%n), Filter=null, Configuration(replaced_value/log4j2.xml), synchronous="true", batchDelay="null", truncateOversizeMessages="null", discardThreshold="null", discardAction="null", assumedRole="null", clientFactory="null", clientRegion="null", clientEndpoint="null", useShutdownHook="null", initializationTimeout="null", enableBatchLogging="null") DEBUG StatusConsoleListener Building Plugin[name=appenders, class=org.apache.logging.log4j.core.config.AppendersPlugin]. DEBUG StatusConsoleListener createAppenders(={com.kdgregory.log4j2.aws.CloudWatchAppender with name CloudWatchAppender}) DEBUG StatusConsoleListener Building Plugin[name=AppenderRef, class=org.apache.logging.log4j.core.config.AppenderRef]. DEBUG StatusConsoleListener createAppenderRef(ref="CLOUDWATCH", level="null", Filter=null) DEBUG StatusConsoleListener Building Plugin[name=logger, class=org.apache.logging.log4j.core.config.LoggerConfig]. DEBUG StatusConsoleListener LoggerConfig$Builder(additivity="false", level="CUSTOM_LEVEL", levelAndRefs="null", name="com.replaced.value.wrapper", includeLocation="null", ={CLOUDWATCH}, ={}, Configuration(replaced_value/log4j2.xml), Filter=null) DEBUG StatusConsoleListener Building Plugin[name=loggers, class=org.apache.logging.log4j.core.config.LoggersPlugin]. DEBUG StatusConsoleListener createLoggers(={com.replaced.value.wrapper}) WARN StatusConsoleListener No Root logger was configured, creating default ERROR-level Root logger with Console appender DEBUG StatusConsoleListener Starting OutputStreamManager SYSTEM_OUT.false.false-4 DEBUG StatusConsoleListener Configuration XmlConfiguration[location=replaced_value/log4j2.xml] initialized DEBUG StatusConsoleListener Starting configuration XmlConfiguration[location=replaced_value/log4j2.xml] DEBUG StatusConsoleListener Log4j2 ConfigurationScheduler starting 1 threads DEBUG StatusConsoleListener log writer starting (thread: main) DEBUG StatusConsoleListener checking for existence of CloudWatch log group: /debugging/appender DEBUG StatusConsoleListener creating CloudWatch log group: /debugging/appender DEBUG StatusConsoleListener setting retention period to: 7 DEBUG StatusConsoleListener checking for existence of CloudWatch log stream: LOGS-20230505-c02xg275jg5m-43482 DEBUG StatusConsoleListener creating CloudWatch log stream: LOGS-20230505-c02xg275jg5m-43482 DEBUG StatusConsoleListener log writer initialization complete (thread: main) DEBUG StatusConsoleListener Started configuration XmlConfiguration[location=replaced_value/log4j2.xml] OK. DEBUG StatusConsoleListener Shutting down OutputStreamManager SYSTEM_OUT.false.false-3 DEBUG StatusConsoleListener OutputStream closed DEBUG StatusConsoleListener Shut down OutputStreamManager SYSTEM_OUT.false.false-3, all resources released: true DEBUG StatusConsoleListener Appender DefaultConsole-3 stopped with status true DEBUG StatusConsoleListener Stopped org.apache.logging.log4j.core.config.DefaultConfiguration@1a75e76a OK DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7 DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=StatusLogger DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=ContextSelector DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=Appenders,name=DefaultConsole-1 DEBUG StatusConsoleListener org.apache.logging.log4j.core.util.SystemClock does not support precise timestamps. DEBUG StatusConsoleListener Reconfiguration complete for context[name=Default] at URI replaced_value/log4j2.xml (org.apache.logging.log4j.core.LoggerContext@5524cca1) with optional ClassLoader: null DEBUG StatusConsoleListener Shutdown hook enabled. Registering a new one. DEBUG StatusConsoleListener LoggerContext[name=Default, org.apache.logging.log4j.core.LoggerContext@5524cca1] started OK. DEBUG StatusConsoleListener using existing CloudWatch log group: /debugging/appender DEBUG StatusConsoleListener checking for existence of CloudWatch log stream: LOGS-20230505-c02xg275jg5m-43482 DEBUG StatusConsoleListener using existing CloudWatch log stream: LOGS-20230505-c02xg275jg5m-43482 DEBUG StatusConsoleListener log writer initialization complete (thread: main) DEBUG StatusConsoleListener Started configuration XmlConfiguration[location=replaced_value/log4j2.xml] OK. DEBUG StatusConsoleListener Shutting down OutputStreamManager SYSTEM_OUT.false.false-1 DEBUG StatusConsoleListener OutputStream closed DEBUG StatusConsoleListener Shut down OutputStreamManager SYSTEM_OUT.false.false-1, all resources released: true DEBUG StatusConsoleListener Appender DefaultConsole-1 stopped with status true DEBUG StatusConsoleListener Stopped org.apache.logging.log4j.core.config.DefaultConfiguration@29626d54 OK DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7 DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=StatusLogger DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=ContextSelector DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=Loggers,name=com.replaced.value.wrapper DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=Appenders,name=CLOUDWATCH DEBUG StatusConsoleListener Registering MBean org.apache.logging.log4j2:type=251a69d7,component=Appenders,name=DefaultConsole-2 DEBUG StatusConsoleListener Reconfiguration complete for context[name=251a69d7] at URI replaced_value/log4j2.xml (org.apache.logging.log4j.core.LoggerContext@4d098f9b) with optional ClassLoader: null DEBUG StatusConsoleListener Shutdown hook enabled. Registering a new one. DEBUG StatusConsoleListener LoggerContext[name=251a69d7, org.apache.logging.log4j.core.LoggerContext@4d098f9b] started OK.

Process finished with exit code 0

RyanPoorman commented 1 year ago

The eventual goal is to deploy it in a AWS Lambda, but the issue occurs when deploying as a executable jar and in the unit tests. The log group and log stream will get created, but the message is never received.

RyanPoorman commented 1 year ago

One possibly important bit of additional information is that the appender is being included as a part of a library that I then use as a dependency. It is built with a maven shade plugin. So it looks like this: -> appender used in my library as a dependency -> My Library used as a dependency in a sample app

My shaded plugin looks as follows:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
            <configuration>
                <source>11</source>
                <target>11</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <configuration>
                <createDependencyReducedPom>false</createDependencyReducedPom>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer
                            implementation="io.github.edwgiz.log4j.maven.plugins.shade.transformer.Log4j2PluginCacheFileTransformer">
                    </transformer>
                </transformers>

            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
            <dependencies>
                <dependency>
                    <groupId>io.github.edwgiz</groupId>
                    <artifactId>log4j-maven-shade-plugin-extensions</artifactId>
                    <version>2.20.0</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>
kdgregory commented 1 year ago

@RyanPoorman

I was able to reproduce by running on a t2.nano instance.

What's happening is that there's a 2 second timeout for sending a batch, which includes retrieving the sequence token if necessary. On a low-CPU instance this retrieve takes more than 2 seconds (there could be many reasons, ranging from time to set up the retry Lambda, or processing the request and response).

I want to think about the right way to solve this: either increasing the send timeout, or adding a separate timeout for retrieve. And I need to think about whether this affects other writers. I don't particularly like increasing the overall send timeout, because it could mean that the writer gets backed-up irrecoverably. Maybe I make it configurable (like the initialization timeout), although that will take longer to implement.

Will let you know when I release an update.

kdgregory commented 1 year ago

After some more research, here's what's happening:

  1. This is a newly created log stream, which means that its sequence token is null.
  2. For some reason that I don't understand, Log4J reconfigures itself after the initial configuration. I think that this may be caused by disabling the root logger, but am not sure. In any case, the practical effect is that on the second time through, the appender thinks that it's an existing stream. This causes it to attempt to retrieve the sequence token when it sends a batch of events.
  3. There's a retry loop when retrieving the sequence token. Because it's a new stream, this retry loop runs until timeout (because the sequence token is always null), and then fails.
  4. sendBatch() doesn't catch unexpected exceptions. When running in asynchronous mode, there's an uncaught exception handler attached to the thread, so it happens silently. In syncronous mode, the exception bubbles up to the caller. In either case, the batch of events is lost (which is bad).

The kicker, however, is that sequence tokens are no longer required when writing to CloudWatch Logs. This change happened when I was working on the 3.1.0 release. At the time, I wanted to get the release out, and it's a pretty major change, so I decided not to implement.

For now, I'm going to take the easy way out again, and remove the retry loop when retrieving the sequence token. I'll also add a general exception handler.

RyanPoorman commented 1 year ago

@kdgregory Thank you for looking into this! Your analysis makes complete sense

kdgregory commented 1 year ago

@RyanPoorman

Version 3.1.1 has been released, and should be available from Maven Central later today.

Thanks for reporting this with enough details that I could track down the problem.