microsoft / ApplicationInsights-Java

Application Insights for Java
http://aka.ms/application-insights

TRACE log level causes an application to fail #3524

Open progxaker opened 5 months ago

progxaker commented 5 months ago

Expected behavior

Don't fail the main application. Probably use the WARN log level instead.

Actual behavior

Enabling the JVM agent with the TRACE log level causes the Java application to fail.

To Reproduce

  1. Connect a JVM agent (-javaagent:/path/to/application-insights-3.4.17.jar).
  2. Provide a connection string.
  3. Set an environment variable APPLICATIONINSIGHTS_SELF_DIAGNOSTICS_LEVEL to TRACE.
  4. Run a Java application.

System information


Logs

...
2024-01-26 17:09:03.339Z ERROR c.m.applicationinsights.agent - Error related to the execution of [keytool, -list, -rfc, -keystore, /opt/java/jdk-17.0.9+9-jre/lib/security/cacerts]
java.io.IOException: Cannot run program "keytool": error=2, No such file or directory
        at java.base/java.lang.ProcessBuilder.start(Unknown Source)
        at java.base/java.lang.ProcessBuilder.start(Unknown Source)
        at com.microsoft.applicationinsights.agent.internal.init.AppInsightsCertificate.executeWithoutException(AppInsightsCertificate.java:96)
        at com.microsoft.applicationinsights.agent.internal.init.AppInsightsCertificate.loadCertificates(AppInsightsCertificate.java:83)
        at com.microsoft.applicationinsights.agent.internal.init.AppInsightsCertificate.isInJavaKeystore(AppInsightsCertificate.java:74)
        at com.microsoft.applicationinsights.agent.internal.init.FirstEntryPoint.init(FirstEntryPoint.java:131)
        at io.opentelemetry.javaagent.tooling.AgentStarterImpl.start(AgentStarterImpl.java:95)
        at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:53)
        at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:47)
        at io.opentelemetry.javaagent.bootstrap.AgentInitializer.execute(AgentInitializer.java:64)
        at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:46)
        at io.opentelemetry.javaagent.OpenTelemetryAgent.startAgent(OpenTelemetryAgent.java:57)
        at io.opentelemetry.javaagent.OpenTelemetryAgent.premain(OpenTelemetryAgent.java:45)
        at com.microsoft.applicationinsights.agent.Agent.premain(Agent.java:38)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
        at java.base/java.lang.ProcessImpl.<init>(Unknown Source)
        at java.base/java.lang.ProcessImpl.start(Unknown Source)
        ... 20 common frames omitted
2024-01-26 17:09:03.340Z TRACE c.m.applicationinsights.agent - Application Insights root certificate in the Java keystore: false
...
trask commented 5 months ago

hi @progxaker! the error you mention is only logged and shouldn't cause any other issues. can you provide the full applicationinsights.log, and also include what exception your application startup is failing with? thx
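For context, a minimal sketch of why a missing binary surfaces as a logged exception rather than a crash: `ProcessBuilder.start()` throws `IOException` when the executable cannot be found, so a wrapper in the spirit of the agent's `executeWithoutException` can catch it and continue. The binary name below is deliberately fake; in the log above the command was `keytool`, which evidently wasn't available in that container.

```java
import java.io.IOException;

// Sketch of the "log and continue" pattern around ProcessBuilder: starting a
// process whose executable is not on the PATH throws IOException (error=2),
// so the caller can swallow it instead of failing the whole JVM.
public class MissingBinaryDemo {

    // Returns true if the command could be started, false if the binary is missing.
    static boolean canRun(String... command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.destroy(); // we only care whether start() succeeded
            return true;
        } catch (IOException e) {
            // "Cannot run program ...: error=2, No such file or directory" lands here
            return false;
        }
    }

    public static void main(String[] args) {
        // hypothetical binary name, guaranteed not to exist
        System.out.println(canRun("definitely-not-a-real-binary-xyz"));
    }
}
```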

progxaker commented 5 months ago

Hi @trask. My bad, that error confused me, and it's not related to the main problem.

shouldn't cause any other issues

That's what I thought too, but the main code doesn't execute.

can you provide the full applicationinsights.log

applicationinsights.log

what exception your application startup is failing with?

No exceptions at all :/ Latest logs:

2024-01-30 12:33:23.463Z INFO  c.m.applicationinsights.agent - Java version: 17.0.10, vendor: Red Hat, Inc., home: /usr/lib/jvm/java-17-openjdk-17.0.10.0.7-2.el8.x86_64
/usr/lib/jvm/java-17/bin/java -cp /tmp/spark/conf/:/tmp/spark/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --master local[4] --conf spark.jars.ivy=/tmp/.ivy --class SimpleApp target/simple-project-1.0.jar
2024-01-30 12:33:23.513Z DEBUG i.o.s.m.export.PeriodicMetricReader - No metric data to export - skipping export.

I've prepared a Dockerfile to reproduce the problem more easily:

SimpleApp.java

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "/tmp/spark/README.md"; // Should be some file on your system
    SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
    Dataset<String> logData = spark.read().textFile(logFile).cache();
    long length = logData.count();
    System.out.println("Lines: " + length);
    spark.stop();
  }
}
```
pom.xml

```xml
<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```
Dockerfile

```dockerfile
FROM registry.access.redhat.com/ubi8/openjdk-17:1.18-2.1705573234
USER 0
WORKDIR /tmp/
RUN microdnf -y install gzip procps
RUN curl -fsSLo /tmp/spark.tgz https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
RUN tar -C /tmp/ -xzf spark.tgz
RUN mv /tmp/spark-3.5.0-bin-hadoop3 /tmp/spark/
RUN curl -fsSLo /tmp/spark/jars/applicationinsights-agent.jar https://repo1.maven.org/maven2/com/microsoft/azure/applicationinsights-agent/3.4.19/applicationinsights-agent-3.4.19.jar
WORKDIR /tmp/project/
COPY SimpleApp.java ./src/main/java/SimpleApp.java
COPY pom.xml ./pom.xml
RUN mvn package
ENV APPLICATIONINSIGHTS_SELF_DIAGNOSTICS_LEVEL="TRACE"
ENV JAVA_TOOL_OPTIONS="-javaagent:/tmp/spark/jars/applicationinsights-agent.jar"
ENV CLASSPATH="/tmp/spark/jars/"
ENV APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=00000000-0000-0000-0000-0FEEDDADBEEF;IngestionEndpoint=http://host.testcontainers.internal:6060/;LiveEndpoint=http://host.testcontainers.internal:6060/"
CMD ["/tmp/spark/bin/spark-submit", "--class", "SimpleApp", "--master", "local[4]", "--conf", "spark.jars.ivy=/tmp/.ivy", "target/simple-project-1.0.jar"]
```
progxaker commented 4 months ago

While I'm trying to solve a higher-priority problem via the CET, I tried printing the OpenTelemetry logs while bypassing your configuration: https://github.com/microsoft/ApplicationInsights-Java/blob/80f6eafba12454c9b59cc6f9e474e15d3c33c4fd/agent/agent-tooling/src/main/java/com/microsoft/applicationinsights/agent/internal/init/LoggingLevelConfigurator.java#L72-L76 and found that if the loggerLevel value is changed to Level.TRACE, the problem is reproduced; if I set it to Level.INFO, the problem is solved. I hope this may be helpful.

Reproduces the problem:

```diff
- loggerLevel = getAtLeastInfoLevel(level);
+ loggerLevel = Level.TRACE;
```

Solves it:

```diff
- loggerLevel = getAtLeastInfoLevel(level);
+ loggerLevel = Level.INFO;
```

UPD: If I add -Dotel.sdk.disabled=true, the problem also goes away.
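The getAtLeastInfoLevel call replaced in the diffs above suggests the agent floors third-party loggers at INFO while letting DEBUG/TRACE pass through when self-diagnostics requests them. A rough, hypothetical model of that clamp using plain ints (the real method lives in LoggingLevelConfigurator; the name and semantics here are my reading, not the actual implementation):

```java
// Hypothetical model of an "at least INFO" level clamp. Smaller int = more
// verbose, mirroring logback's ordering (TRACE < DEBUG < INFO < WARN < ERROR).
public class LevelFloor {
    static final int TRACE = 0, DEBUG = 10, INFO = 20, WARN = 30, ERROR = 40;

    // Guarantee at least INFO verbosity: WARN/ERROR are lowered to INFO,
    // while DEBUG/TRACE pass through unchanged -- which is how a TRACE
    // self-diagnostics level could reach the OpenTelemetry SDK loggers.
    static int atLeastInfo(int requested) {
        return Math.min(requested, INFO);
    }

    public static void main(String[] args) {
        System.out.println(atLeastInfo(TRACE) == TRACE); // TRACE passes through
        System.out.println(atLeastInfo(ERROR) == INFO);  // ERROR is raised to INFO verbosity
    }
}
```

Under this reading, hardcoding Level.INFO removes the TRACE passthrough, which would match the observation that INFO makes the problem disappear.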

progxaker commented 4 months ago

I kept digging deeper into it.

TL;DR: The problem occurs when the logger name starts with io.opentelemetry.sdk.metrics but goes no deeper than io.opentelemetry.sdk.metrics.internal.

I got a list of the configured loggers.

Code block

```java
// Originally published at https://mailman.qos.ch/pipermail/logback-user/2008-November/000751.html
// and modified by me to return a string.
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import org.slf4j.LoggerFactory;

String findNamesOfConfiguredAppenders() {
    LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
    String strList = "";
    for (Logger log : lc.getLoggerList()) {
        strList = strList + ";" + log.getName();
    }
    return strList;
}
```

added the function before https://github.com/microsoft/ApplicationInsights-Java/blob/80f6eafba12454c9b59cc6f9e474e15d3c33c4fd/agent/agent-tooling/src/main/java/com/microsoft/applicationinsights/agent/internal/init/LoggingLevelConfigurator.java#L75

```diff
+ System.out.println("Loggers: " + findNamesOfConfiguredAppenders());
  loggerLevel = getDefaultLibraryLevel(level);
```

and got all the loggers from a log file:

```
# grep 'Loggers: ' /tmp/temp.log | uniq | sed 's/;/\n/g'
Loggers:
ROOT
com
com.azure
com.azure.core
com.azure.core.util
com.azure.core.util.CoreUtils
com.azure.monitor
com.azure.monitor.opentelemetry
com.azure.monitor.opentelemetry.exporter
com.azure.monitor.opentelemetry.exporter.implementation
com.azure.monitor.opentelemetry.exporter.implementation.utils
com.azure.monitor.opentelemetry.exporter.implementation.utils.HostName
com.azure.monitor.opentelemetry.exporter.implementation.utils.PropertyHelper
io
io.grpc
io.grpc.Context
io.netty
io.netty.handler
io.netty.handler.ssl
io.opentelemetry
io.opentelemetry.exporter
io.opentelemetry.exporter.logging
io.opentelemetry.javaagent
io.opentelemetry.javaagent.tooling
io.opentelemetry.javaagent.tooling.VersionLogger
io.opentelemetry.sdk
io.opentelemetry.sdk.autoconfigure
io.opentelemetry.sdk.metrics
io.opentelemetry.sdk.metrics.internal
io.opentelemetry.sdk.metrics.internal.state
io.opentelemetry.sdk.metrics.internal.state.DeltaMetricStorage
io.opentelemetry.sdk.resources
oshi
reactor
reactor.netty
reactor.util
```

and started experimenting with io.opentelemetry.sdk.autoconfigure, io.opentelemetry.sdk.metrics, and io.opentelemetry.sdk.resources. As a result, I can set Level.ALL for the autoconfigure and resources loggers, but if I set it for the metrics one, the problem is reproduced.

LoggingLevelConfigurator.java

```diff
@@ -33,6 +36,13 @@ public class LoggingLevelConfigurator {
     updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.exporter.logging"));
     updateLoggerLevel(
         loggerContext.getLogger("io.opentelemetry.sdk.metrics.internal.state.DeltaMetricStorage"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.autoconfigure"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.resources"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.extension"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.metrics"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.metrics.internal"));
+    //updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk.metrics.internal.state"));
+    updateLoggerLevel(loggerContext.getLogger("io.opentelemetry.sdk"));
     updateLoggerLevel(loggerContext.getLogger("io.opentelemetry"));
     updateLoggerLevel(loggerContext.getLogger("muzzleMatcher"));
     updateLoggerLevel(
@@ -69,10 +80,26 @@ public class LoggingLevelConfigurator {
       loggerLevel = level;
     } else if (name.equals("io.opentelemetry.sdk.metrics.internal.state.DeltaMetricStorage")) {
       loggerLevel = getDeltaMetricStorageLoggerLevel(level);
+    } else if (name.startsWith("io.opentelemetry.sdk.autoconfigure")) {
+      loggerLevel = Level.ALL;
+    } else if (name.startsWith("io.opentelemetry.sdk.resources")) {
+      loggerLevel = Level.ALL;
+    } else if (name.startsWith("io.opentelemetry.sdk.extension")) {
+      loggerLevel = Level.ALL;
+    //} else if (name.startsWith("io.opentelemetry.sdk.metrics.internal.state")) {
+    //  loggerLevel = Level.INFO;
+    } else if (name.startsWith("io.opentelemetry.sdk.metrics.internal")) {
+      loggerLevel = Level.ERROR;
+    } else if (name.startsWith("io.opentelemetry.sdk.metrics")) {
+      loggerLevel = Level.ALL;
+    } else if (name.startsWith("io.opentelemetry.sdk")) {
+      loggerLevel = Level.ALL;
     } else if (name.startsWith("io.opentelemetry")) {
       // OpenTelemetry instrumentation debug log has lots of things that look like errors
       // which has been confusing customers, so only enable it when user configures "trace" level
-      loggerLevel = getDefaultLibraryLevel(level);
+      //loggerLevel = getDefaultLibraryLevel(level);
+      loggerLevel = Level.ALL;
     } else if (name.equals("com.azure.core.implementation.jackson.MemberNameConverterImpl")) {
       // never want to log at trace or debug, as it logs confusing stack trace that
       // looks like error but isn't
```
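The if/else chain above dispatches on the most specific logger-name prefix first, so the order of the startsWith checks is load-bearing. An illustrative, standalone sketch of that ordering (not the agent's code; the class and level strings here are made up for the demo, with levels taken from the experiment above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Most-specific-prefix-first dispatch: "io.opentelemetry.sdk.metrics.internal"
// must be checked before the broader "io.opentelemetry.sdk.metrics" and
// "io.opentelemetry" prefixes, or the narrower rule would never fire.
public class PrefixDispatch {
    // LinkedHashMap preserves insertion order, i.e. the check order.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("io.opentelemetry.sdk.metrics.internal", "ERROR");
        RULES.put("io.opentelemetry.sdk.metrics", "ALL");
        RULES.put("io.opentelemetry", "ALL");
    }

    static String levelFor(String loggerName) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (loggerName.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "DEFAULT";
    }

    public static void main(String[] args) {
        System.out.println(levelFor("io.opentelemetry.sdk.metrics.internal.state")); // ERROR
        System.out.println(levelFor("io.opentelemetry.sdk.metrics"));                // ALL
        System.out.println(levelFor("io.opentelemetry.exporter.logging"));           // ALL
    }
}
```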
trask commented 4 months ago

hi @progxaker, can you provide a runnable repro? (as a github repository would be easiest for us to consume and try it out)

progxaker commented 4 months ago

Hi @trask. I've consolidated the shared files (LoggingLevelConfigurator.java and the application files) in the progxaker/reproduce-trace-issues repository.

P.S. I tested the workaround for v3.4.19, but it doesn't work for v3.5.0.

progxaker commented 4 months ago

As for v3.5.0, the problem occurs not only with the TRACE log level, but also with DEBUG.

trask commented 3 months ago

hi @progxaker, I tried to run your repro locally but got an error b/c I'm running on Windows. is there any chance you could simplify the repro, maybe along the lines of https://github.com/trask/apache-spark-extension-test? that would also make it easier for us to run in Intellij under a debugger (instead of having to remote debug inside of a docker image). thanks!

progxaker commented 3 months ago

Hi @trask. I hope you're doing well.

Sorry for the long delay, been busy implementing OpenTelemetry (Python) into a project.

TL;DR: The problem is reproduced when setting the JAVA_TOOL_OPTIONS variable with the value -javaagent:<absolute-path-to-the-agent>.


  1. I tried to reproduce the problem on Windows as you asked, and the agent and application worked correctly. But in my attempts to run Spark I dropped the JAVA_TOOL_OPTIONS variable and used the spark.driver.extraJavaOptions parameter instead. Both approaches gave the error below, but the final form I captured came from spark.driver.extraJavaOptions (I removed the tmpdir parameter and ran from the folder containing the JVM agent).
InvalidPathException exception

```
# spark-submit --conf "spark.driver.extraJavaOptions='-Djava.io.tmpdir=C:\Users\progxaker\Downloads\temp\ -javaagent:C:\Users\progxaker\Downloads\applicationinsights-agent-3.5.1.jar'" --class SimpleApp --master local[4] C:\Users\progxaker\Downloads\simple-project-1.0.jar
Exception in thread "main" java.nio.file.InvalidPathException: Illegal char <:> at index 31: C:UsersprogxakerDownloadstemp -javaagent:C:UsersDownloadsapplicationinsights-agent-3.5.1.jar\spark-fb7f738b-cffc-44a3-9e04-2b76598eaf30
        at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
        at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
        at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
        at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
        at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:229)
        at java.base/java.io.File.toPath(File.java:2329)
        at org.apache.spark.network.util.JavaUtils.createDirectory(JavaUtils.java:396)
        at org.apache.spark.util.SparkFileUtils.createDirectory(SparkFileUtils.scala:95)
        at org.apache.spark.util.SparkFileUtils.createDirectory$(SparkFileUtils.scala:94)
        at org.apache.spark.util.Utils$.createDirectory(Utils.scala:94)
        at org.apache.spark.util.Utils$.createTempDir(Utils.scala:241)
        at org.apache.spark.util.SparkFileUtils.createTempDir(SparkFileUtils.scala:103)
        at org.apache.spark.util.SparkFileUtils.createTempDir$(SparkFileUtils.scala:102)
        at org.apache.spark.util.Utils$.createTempDir(Utils.scala:94)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:372)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
  2. Tried using the Java 11 Docker image. The application failed.
  3. Tried using the original OpenJDK Docker image. The application failed.
  4. Tried using the Ubuntu Docker image and installing OpenJDK. The application failed.
  5. Tried using the Ubuntu Docker image and installing the Temurin JDK, since I noticed that's what was installed on Windows. The application failed.
  6. I tried to reproduce the problem in a Linux VM. It worked as expected.
  7. And at this point I noticed that I hadn't set JAVA_TOOL_OPTIONS on the Linux VM... Once I did, the problem was reproduced.
  8. Tried it in my Docker image: if I unset the variable everything works, but if I set it again, it doesn't.
  9. Tried it on Windows: it works with the Spark parameter and without it, but if I set JAVA_TOOL_OPTIONS, for some reason (and only on Windows) I get the following error:
    # SET JAVA_TOOL_OPTIONS=-javaagent:applicationinsights-agent-3.5.1.jar
    # spark-submit --class SimpleApp --master local[4] C:\Users\progxaker\Downloads\simple-project-1.0.jar
    Picked up JAVA_TOOL_OPTIONS: "-javaagent:applicationinsights-agent-3.5.1.jar"
    '2024-04-01' is not recognized as an internal or external command,
    operable program or batch file.

As a result, the problem is reproducible on


I'll prepare a Windows version by the end of the day (~5 p.m. UTC).

progxaker commented 3 months ago

https://github.com/progxaker/reproduce-trace-issues/#windows

progxaker commented 1 month ago

Hello @trask. Have you had time to look at this?

progxaker commented 2 weeks ago

Hello @trask. Are there any updates?