open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.98k stars 869 forks

Application startup fails when multiple Java agents are passed on the command line and the OpenTelemetry agent is placed last #12534

Open pepeshore opened 3 weeks ago

pepeshore commented 3 weeks ago

Describe the bug

When starting an application like this:

    java -javaagent:/path/to/another/agent.jar -javaagent:/path/to/opentelemetry-javaagent.jar -jar /path/to/app.jar

in some cases an exception like the following is thrown:

    java.lang.NoClassDefFoundError: io/opentelemetry/javaagent/instrumentation/internal/classloader/BootDelegationInstrumentation$Holder
        at com.taobao.csp.ahas.starter.SandboxClassLoader.loadClass(SandboxClassLoader.java:44)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
        at com.taobao.csp.ahas.starter.initializer.LogbackInitializer.init(LogbackInitializer.java:44)
        at com.taobao.csp.ahas.starter.AgentLauncher.launch(AgentLauncher.java:79)
        at com.taobao.csp.ahas.starter.AgentStarter.premain(AgentStarter.java:41)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:491)
        at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:503)
    Caused by: java.lang.ClassNotFoundException: io.opentelemetry.javaagent.instrumentation.internal.classloader.BootDelegationInstrumentation$Holder
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
        ... 11 more

Different cases produce different stack traces, but all of them report that io/opentelemetry/javaagent/instrumentation/internal/classloader/BootDelegationInstrumentation$Holder cannot be loaded.

Steps to reproduce

I can't provide the code for the other agent or for app.jar because they are not open source, but I have identified the cause of the bug.

When another agent is placed before the OpenTelemetry agent, it initializes first. This causes a large number of classes to be loaded, such as those from the Netty and OkHttp frameworks, and some of them are exactly the classes the OpenTelemetry agent needs to instrument. By the time the OpenTelemetry agent initializes, these classes are already loaded, so it has to retransform them. Normally it would only need to retransform some internal JDK classes, but in this scenario it must also retransform the framework classes mentioned above.
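To see which framework classes the first agent has already loaded by the time the OpenTelemetry agent starts, a diagnostic along these lines could be run at the end of the first agent's premain. This is a minimal sketch: the class name PreloadedClassReport and its methods are hypothetical, and the prefixes io.netty. and okhttp3. are assumptions based on the frameworks mentioned above. Instrumentation#getAllLoadedClasses is the standard java.lang.instrument call.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of OpenTelemetry or of the other agent.
public final class PreloadedClassReport {

    // Returns, in input order, the names of the classes whose fully qualified
    // name starts with one of the given package prefixes.
    public static List<String> matching(Class<?>[] loaded, List<String> prefixes) {
        List<String> hits = new ArrayList<>();
        for (Class<?> c : loaded) {
            for (String prefix : prefixes) {
                if (c.getName().startsWith(prefix)) {
                    hits.add(c.getName());
                    break; // report each class at most once
                }
            }
        }
        return hits;
    }

    // Assumed usage from an agent's premain: every class reported here was loaded
    // before any later agent ran, so a later agent can only instrument it by
    // retransforming it.
    public static void premain(String args, Instrumentation inst) {
        List<String> hits = matching(inst.getAllLoadedClasses(), List.of("io.netty.", "okhttp3."));
        System.err.println("Framework classes already loaded: " + hits.size());
        hits.forEach(name -> System.err.println("  " + name));
    }
}
```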

ByteBuddy handles this at a low level through the java.lang.instrument.Instrumentation#retransformClasses method, which is invoked once with all the classes that need to be retransformed. The problem is that if retransforming any single class fails, the whole call fails and the other classes in that call are not retransformed either. In this particular scenario, retransforming io.netty.channel.DefaultChannelPipeline failed with a VerifyError, so the system class loader (jdk.internal.loader.BuiltinClassLoader) was never retransformed.
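The all-or-nothing effect of a single batched call can be sketched as a small model. It is a toy, assuming the contract stated in the Instrumentation#retransformClasses javadoc (if the call throws, no class in it has been retransformed); BatchRetransformModel is a hypothetical name and no real bytecode is transformed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy model, not real JVM behaviour: it only mimics the documented contract of
// Instrumentation#retransformClasses for a call that throws.
public final class BatchRetransformModel {

    // Simulates one retransformClasses(...) call over the whole batch.
    // 'fails' decides which class names would hit an error such as VerifyError.
    // Returns the names of the classes that end up retransformed.
    public static Set<String> retransformBatch(List<String> batch, Predicate<String> fails) {
        Set<String> staged = new LinkedHashSet<>();
        for (String name : batch) {
            if (fails.test(name)) {
                // The call fails as a whole: nothing from this batch is applied.
                return new LinkedHashSet<>();
            }
            staged.add(name);
        }
        return staged; // every class in the batch was transformed successfully
    }
}
```

With a batch of DefaultChannelPipeline and BuiltinClassLoader where only the Netty class fails, the model returns an empty set, so BuiltinClassLoader is left untouched even though its own transformation would have succeeded.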

In summary, the OpenTelemetry agent successfully instruments com.taobao.csp.ahas.starter.SandboxClassLoader through io.opentelemetry.javaagent.instrumentation.internal.classloader.BootDelegationInstrumentation, but for the reason above, jdk.internal.loader.BuiltinClassLoader is never instrumented. When the instrumented loadClass method of SandboxClassLoader is called, it needs the class BootDelegationInstrumentation$Holder. Because SandboxClassLoader was itself loaded by the system class loader (jdk.internal.loader.BuiltinClassLoader), Holder is looked up through that class loader. Since that class loader was not instrumented, it cannot load BootDelegationInstrumentation$Holder, and the exception above is thrown.
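The delegation path can be modelled as follows. This is a hypothetical toy (DelegationModel and ToyLoader are illustrative names, not OpenTelemetry or JDK classes) that encodes two points: the JVM resolves a class reference through the defining loader of the class that contains it, and, per the analysis above, the system class loader can supply BootDelegationInstrumentation$Holder only once it has been instrumented.

```java
// Toy model of the failure path; nothing here is OpenTelemetry or JDK code.
public final class DelegationModel {

    static final String HOLDER =
        "io.opentelemetry.javaagent.instrumentation.internal.classloader.BootDelegationInstrumentation$Holder";

    // Simplified stand-in for a class loader. Per the issue, a loader can supply
    // HOLDER only after the OpenTelemetry agent has instrumented it.
    static final class ToyLoader {
        final String name;
        final boolean instrumented;

        ToyLoader(String name, boolean instrumented) {
            this.name = name;
            this.instrumented = instrumented;
        }

        String loadClass(String className) throws ClassNotFoundException {
            if (instrumented && className.equals(HOLDER)) {
                return name; // report which loader supplied the class
            }
            throw new ClassNotFoundException(className);
        }
    }

    // The instrumented SandboxClassLoader.loadClass refers to HOLDER. That reference
    // is resolved through the loader that defined SandboxClassLoader (the system
    // class loader), so only the system loader's state decides the outcome.
    public static String resolveFromSandbox(boolean systemLoaderInstrumented)
            throws ClassNotFoundException {
        ToyLoader systemLoader = new ToyLoader("BuiltinClassLoader", systemLoaderInstrumented);
        return systemLoader.loadClass(HOLDER);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("normal case: " + resolveFromSandbox(true));
        try {
            resolveFromSandbox(false);
        } catch (ClassNotFoundException e) {
            System.out.println("bad case: " + e);
        }
    }
}
```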

In my opinion, if retransforming the system class loader (jdk.internal.loader.BuiltinClassLoader) does not take effect, that is a significant problem that can cause many unexpected behaviors, because almost all other instrumentation depends on it working correctly. At worst, if it fails, the other instrumentation should not be applied either; otherwise, as in this case, the application can fail to start.

Expected behavior

No exception is thrown.

Actual behavior

A ClassNotFoundException is thrown (surfacing as a NoClassDefFoundError).

Javaagent or library instrumentation version

2.9.0

Environment

JDK: OS:

Additional context

No response

pepeshore commented 3 weeks ago

In the normal case, SandboxClassLoader and BuiltinClassLoader are both instrumented by the OpenTelemetry Java agent: [screenshot]

When SandboxClassLoader.loadClass is invoked, the reference to BootDelegationInstrumentation$Holder on line 5 triggers loading through BuiltinClassLoader.loadClass, and line 42 returns the expected value.

In the bad case, only SandboxClassLoader is instrumented by the OpenTelemetry Java agent: [screenshot]

When SandboxClassLoader.loadClass is invoked, the reference to BootDelegationInstrumentation$Holder on line 5 again triggers loading through BuiltinClassLoader.loadClass, but this time no class named BootDelegationInstrumentation$Holder can be found.

pepeshore commented 3 weeks ago

I can fix this case in two ways:

  1. Place the OpenTelemetry Java agent first, before the other agent.
  2. In the hook method io.opentelemetry.javaagent.tooling.AgentInstaller.RedefinitionLoggingListener.onError, retransform the classes one by one so that a single failure does not make all of them fail.

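Option 2 can be sketched as a helper that retries classes individually after a batch fails. OneByOneRetransform and Retransformer are hypothetical names, not an existing OpenTelemetry or Byte Buddy API, and how the helper would be called from RedefinitionLoggingListener.onError is not shown. The Retransformer indirection lets the loop run without an attached agent; with a real agent it wraps Instrumentation#retransformClasses.

```java
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "retransform one by one"; not an OpenTelemetry or Byte Buddy API.
public final class OneByOneRetransform {

    // Abstraction over the retransform call so the logic can be tested without an agent.
    @FunctionalInterface
    public interface Retransformer {
        void retransform(Class<?> type) throws UnmodifiableClassException;
    }

    // Retransforms each class in its own call and returns the classes that still failed.
    public static List<Class<?>> retransformEach(List<Class<?>> batch, Retransformer retransformer) {
        List<Class<?>> failed = new ArrayList<>();
        for (Class<?> type : batch) {
            try {
                retransformer.retransform(type);
            } catch (UnmodifiableClassException | RuntimeException | LinkageError e) {
                // VerifyError is a LinkageError: record this class and keep going.
                failed.add(type);
            }
        }
        return failed;
    }

    // Wiring for a real agent: one retransformClasses call per class.
    public static List<Class<?>> retransformEachUsing(List<Class<?>> batch, Instrumentation inst) {
        return retransformEach(batch, type -> inst.retransformClasses(type));
    }
}
```

One retransformClasses call per class means a class that fails (DefaultChannelPipeline in this report) is skipped while the rest of the batch still receives its instrumentation; the trade-off is more calls into the JVM.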
trask commented 3 weeks ago

hi @pepeshore! I'd recommend going with (1) if that resolves your issue

you can see our general policy on supporting OpenTelemetry Java agent alongside other Java agents at https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1534#issuecomment-812939452

laurit commented 3 weeks ago

As far as I understand, the root cause of the failure is the VerifyError. Did you investigate why that happens?