friscoMad closed this issue 2 months ago.
That might be a bug in the JVM, too, if it is recreating a class that was already loaded. Have you tried with the most recent JVM build?
I tested it today with 22.0.2+9 (the latest version on Adoptium) and I get the same error.
I will try to look into this. To debug it, the batch size needs to be reduced so that multiple classes are not retransformed in the same batch. Otherwise, we cannot identify the class in question.
This would need to be added here: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/2af49b4c044db526dd0741eccfe0bdc07baeda71/javaagent-tooling/src/main/java/io/opentelemetry/javaagent/tooling/AgentInstaller.java#L147
If you, for example, added .with(RedefinitionStrategy.BatchAllocator.ForFixedSize.ofSize(1))
and ran it again, it would fail only for the class in question.
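The batch-size-1 idea can be sketched with the JDK alone. This is not Byte Buddy's BatchAllocator code. The Retransformer interface is a hypothetical stand-in for Instrumentation.retransformClasses so the logic can run without an agent. Classes are retransformed in fixed-size batches, and every failing batch is logged. With a size of 1, each failure names a single class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: retransform classes in fixed-size batches and collect the failing batches. */
public final class BatchedRetransform {

    /**
     * Hypothetical stand-in for Instrumentation::retransformClasses.
     * In a real agent: BatchedRetransform.Retransformer r = instrumentation::retransformClasses;
     */
    @FunctionalInterface
    public interface Retransformer {
        void retransform(Class<?>... classes) throws Exception;
    }

    private BatchedRetransform() {}

    /** Returns every batch whose retransformation threw, in the order the batches were tried. */
    public static List<List<Class<?>>> failingBatches(
            List<Class<?>> classes, int batchSize, Retransformer retransformer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        List<List<Class<?>>> failures = new ArrayList<>();
        for (int start = 0; start < classes.size(); start += batchSize) {
            List<Class<?>> batch =
                    new ArrayList<>(classes.subList(start, Math.min(start + batchSize, classes.size())));
            try {
                retransformer.retransform(batch.toArray(new Class<?>[0]));
            } catch (Exception | LinkageError e) { // VerifyError is a LinkageError, not an Exception
                System.err.println("Exception while retransforming " + batch.size()
                        + " classes: " + batch + " (" + e + ")");
                failures.add(batch);
            }
        }
        return failures;
    }
}
```

With one large batch, a single VerifyError marks every class in that batch as suspect. With a batch size of 1, only the class the JVM actually rejected is reported. This matches the single-class log line posted further down in this thread.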
If you also run with -Dnet.bytebuddy.dump=/some/folder,
you will find the dumped class files in the specified folder. If you can attach the results (before and after) for the class that is actually failing to this ticket, I will be able to understand the problem.
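The following minimal sketch shows what a before-and-after dump contains. It wraps a transformer and writes both versions of every class it changes. DumpingTransformer is a hypothetical name, and this is not how Byte Buddy implements net.bytebuddy.dump. The "-original.class" file naming is also an assumption made for illustration.

```java
import java.io.IOException;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.IllegalClassFormatException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.ProtectionDomain;

/** Hypothetical sketch: dumps class file bytes before and after a delegate transformer runs. */
public final class DumpingTransformer implements ClassFileTransformer {
    private final ClassFileTransformer delegate;
    private final Path dumpDir;

    public DumpingTransformer(ClassFileTransformer delegate, Path dumpDir) {
        this.delegate = delegate;
        this.dumpDir = dumpDir;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer)
            throws IllegalClassFormatException {
        // Assumes the delegate implements this five-argument variant of transform.
        byte[] transformed = delegate.transform(
                loader, className, classBeingRedefined, protectionDomain, classfileBuffer);
        // A null result means "unchanged", so there is nothing to compare.
        if (transformed != null && className != null) {
            String base = className.replace('/', '.');
            dump(base + "-original.class", classfileBuffer);
            dump(base + ".class", transformed);
        }
        return transformed;
    }

    private void dump(String fileName, byte[] bytes) {
        try {
            Files.createDirectories(dumpDir);
            Files.write(dumpDir.resolve(fileName), bytes);
        } catch (IOException e) {
            // Do not throw: an exception from transform has the same effect as returning null.
            System.err.println("Could not dump " + fileName + ": " + e);
        }
    }
}
```

It would be registered with something like instrumentation.addTransformer(new DumpingTransformer(delegate, Path.of("/some/folder")), true), where delegate is the transformer under investigation. The IOException is caught on purpose. Per the ClassFileTransformer contract, an exception that escapes transform is treated like returning null, which would silently drop the transformation.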
This is the class that is failing:
[otel.javaagent 2024-08-13 18:46:56:173 +0200] [Attach Listener] DEBUG io.opentelemetry.javaagent.tooling.AgentInstaller$RedefinitionLoggingListener - Exception while retransforming 1 classes: [class io.netty.channel.DefaultChannelPipeline]
java.lang.VerifyError
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method)
at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:225)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at net.bytebuddy.utility.Invoker$Dispatcher.invoke(Unknown Source)
at net.bytebuddy.utility.dispatcher.JavaDispatcher$Dispatcher$ForNonStaticMethod.invoke(JavaDispatcher.java:1032)
at net.bytebuddy.utility.dispatcher.JavaDispatcher$ProxiedInvocationHandler.invoke(JavaDispatcher.java:1162)
at net.bytebuddy.agent.builder.$Proxy29.retransformClasses(Unknown Source)
at net.bytebuddy.agent.builder.AgentBuilder$RedefinitionStrategy$Collector$ForRetransformation.doApply(AgentBuilder.java:8414)
at net.bytebuddy.agent.builder.AgentBuilder$RedefinitionStrategy$Collector.apply(AgentBuilder.java:8229)
at net.bytebuddy.agent.builder.AgentBuilder$RedefinitionStrategy.apply(AgentBuilder.java:5926)
at net.bytebuddy.agent.builder.AgentBuilder$Default.doInstall(AgentBuilder.java:11540)
at net.bytebuddy.agent.builder.AgentBuilder$Default.installOn(AgentBuilder.java:11440)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:195)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:102)
at io.opentelemetry.javaagent.tooling.AgentStarterImpl.start(AgentStarterImpl.java:99)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:53)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:47)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.execute(AgentInitializer.java:68)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:46)
at io.opentelemetry.javaagent.OpenTelemetryAgent.startAgent(OpenTelemetryAgent.java:57)
at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:49)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:560)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallAgentmain(InstrumentationImpl.java:582)
I have uploaded the class and its inner classes (original and transformed) exactly as they appear in the dump folder.
Let me know if I missed anything.
Thanks a lot. I will have a look!
There is no error in the class file, but the class file seems to become too large, which is likely what makes the instrumentation fail. This cannot easily be avoided, but it is strange, because in that case it should happen to all users of OpenTelemetry. Do you know if others experience the same problem?
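The "class file becomes too big" hypothesis can be checked against the dumped files by comparing them with the JVMS limits. constant_pool_count is a u2, and each method's code_length must be below 65536 bytes. The helper below is hypothetical (ClassFileSizeCheck and its methods are illustrative names). It is a minimal sketch that walks the class file format with java.io only and reports both values.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports class file sizes relevant to JVMS limits. */
public final class ClassFileSizeCheck {
    /** JVMS 4.7.3: code_length must be less than 65536. */
    public static final int MAX_CODE_LENGTH = 65535;

    public static final class Report {
        public final int constantPoolCount;            // stored as a u2, so at most 65535
        public final Map<String, Integer> codeLengths; // name + descriptor -> code_length
        Report(int constantPoolCount, Map<String, Integer> codeLengths) {
            this.constantPoolCount = constantPoolCount;
            this.codeLengths = Collections.unmodifiableMap(codeLengths);
        }
        public int maxCodeLength() {
            return codeLengths.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        }
    }

    private ClassFileSizeCheck() {}

    public static Report analyze(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.skipBytes(4); // minor_version, major_version
            int cpCount = in.readUnsignedShort();
            String[] utf8 = new String[cpCount];
            for (int i = 1; i < cpCount; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                      // Utf8: u2 length + modified UTF-8
                    case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;                            // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;                // Long/Double take two slots
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            in.skipBytes(6);                          // access_flags, this_class, super_class
            in.skipBytes(2 * in.readUnsignedShort()); // interfaces
            int fieldCount = in.readUnsignedShort();
            for (int f = 0; f < fieldCount; f++) {
                in.skipBytes(6);                      // access_flags, name_index, descriptor_index
                skipAttributes(in);
            }
            Map<String, Integer> codeLengths = new LinkedHashMap<>();
            int methodCount = in.readUnsignedShort();
            for (int m = 0; m < methodCount; m++) {
                in.skipBytes(2);                      // access_flags
                String method = utf8[in.readUnsignedShort()] + utf8[in.readUnsignedShort()];
                int attributeCount = in.readUnsignedShort();
                for (int a = 0; a < attributeCount; a++) {
                    String attribute = utf8[in.readUnsignedShort()];
                    int length = in.readInt();
                    if ("Code".equals(attribute)) {
                        in.skipBytes(4);              // max_stack, max_locals
                        codeLengths.put(method, in.readInt());
                        in.skipBytes(length - 8);     // bytecode, exception table, nested attributes
                    } else {
                        in.skipBytes(length);
                    }
                }
            }
            return new Report(cpCount, codeLengths);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skipAttributes(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        for (int a = 0; a < count; a++) {
            in.skipBytes(2);            // attribute_name_index
            in.skipBytes(in.readInt()); // attribute_length, then the body
        }
    }

    /** Loads the class file of a (possibly nested) class via its own resource lookup. */
    public static byte[] classBytes(Class<?> type) {
        String name = type.getName();
        try (InputStream in = type.getResourceAsStream(name.substring(name.lastIndexOf('.') + 1) + ".class")) {
            if (in == null) throw new IllegalStateException("no class file for " + name);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the dumps, analyze(Files.readAllBytes(path)) on the original and the transformed DefaultChannelPipeline files would show whether any method or the constant pool gets close to its limit after instrumentation.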
I think most users follow the suggested path of adding the OpenTelemetry agent on the command line. We needed to attach it later because of interactions with another agent that prefers runtime attachment (BlockHound), so our usage is not what most people do. There is support for runtime attachment via this, and it seems they are considering making it more visible in this issue, but at this point I think it is a minority use case. I also find it interesting that it fails only if some objects are created before the agent is attached at runtime; otherwise, it seems to work as expected. Probably classes instrumented at load time end up different from retransformed classes.
Then I think this is likely an agent-interaction issue. If some agents support retransformation and others do not, retransformation can result in verification errors because of changes in the class file structure.
It is a real shame that verification errors are not displayed. I will raise a ticket to implement this.
Sorry if I did not explain it clearly: the error happens with just a single agent (OpenTelemetry), as reproduced in the example.
The official OpenTelemetry docs only describe loading the agent when the application is launched. Without runtime attachment, there is no way to have pre-existing objects whose classes must be retransformed, so everything works as expected.
In our application, we needed runtime attachment as a workaround for issues caused by interactions between multiple agents, which is why we stumbled on this error and most people have not. Runtime attachment is currently only semi-supported in OpenTelemetry, through a contrib repo, but there has been some discussion about making it official and moving the code into the main repo.
All of this is to explain why I may be the first to report this issue. Anyway, what I understand from you is that it is not fixable on the Byte Buddy or OpenTelemetry side.
Would moving the attachment point earlier in the lifecycle, before these objects are created, help? Or is the only way to fix the issue to load the agent via the command line?
OpenTelemetry uses decoration when instrumenting types: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/2b42bc24c6cdc5cf456d32e36ba597e99b6a6c8d/javaagent-tooling/src/main/java/io/opentelemetry/javaagent/tooling/AgentInstaller.java#L145
This means that the agent cannot technically change a class file's signature, so the verification error should not be caused by a signature change. Another possibility would be an actual error in the class file, but after looking at it, it seems valid. A third would be that another agent changes something, but no other agent is present. Finally, the JVM might not resolve the class back properly, so that it only appears as if there were a signature change. In that case it is a VM bug, which is of course also possible.
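The reasoning about signature changes can also be checked mechanically. According to the java.lang.instrument.Instrumentation documentation, retransformation must not add, remove or rename fields or methods, change method signatures, or change modifiers. Diffing the declared members of the original and transformed dumps therefore shows whether the shape changed. ClassShapeDiff is a hypothetical helper that uses only java.io. It covers fields and methods only and ignores class-level changes such as the superclass or interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares the declared fields and methods of two class files. */
public final class ClassShapeDiff {
    private ClassShapeDiff() {}

    /** Entries like "method 0x0001 toString()Ljava/lang/String;" for every declared field and method. */
    public static Set<String> members(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.skipBytes(4); // minor_version, major_version
            int cpCount = in.readUnsignedShort();
            String[] utf8 = new String[cpCount];
            for (int i = 1; i < cpCount; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;
                    case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break; // Long/Double take two slots
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            in.skipBytes(6);                          // access_flags, this_class, super_class
            in.skipBytes(2 * in.readUnsignedShort()); // interfaces
            Set<String> members = new TreeSet<>();
            readMembers(in, utf8, "field", members);
            readMembers(in, utf8, "method", members);
            return members;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void readMembers(DataInputStream in, String[] utf8, String kind, Set<String> out)
            throws IOException {
        int count = in.readUnsignedShort();
        for (int i = 0; i < count; i++) {
            int access = in.readUnsignedShort();
            String name = utf8[in.readUnsignedShort()];
            String descriptor = utf8[in.readUnsignedShort()];
            out.add(String.format("%s 0x%04x %s%s", kind, access, name, descriptor));
            int attributes = in.readUnsignedShort();
            for (int a = 0; a < attributes; a++) {
                in.skipBytes(2);            // attribute_name_index
                in.skipBytes(in.readInt()); // attribute body
            }
        }
    }

    /** "-" marks members only in the original, "+" members only in the transformed class file. */
    public static Set<String> difference(byte[] original, byte[] transformed) {
        Set<String> before = members(original);
        Set<String> after = members(transformed);
        Set<String> diff = new TreeSet<>();
        for (String m : before) if (!after.contains(m)) diff.add("- " + m);
        for (String m : after) if (!before.contains(m)) diff.add("+ " + m);
        return diff;
    }

    /** Loads the class file of a (possibly nested) class via its own resource lookup. */
    public static byte[] classBytes(Class<?> type) {
        String name = type.getName();
        try (InputStream in = type.getResourceAsStream(name.substring(name.lastIndexOf('.') + 1) + ".class")) {
            if (in == null) throw new IllegalStateException("no class file for " + name);
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An empty difference between an original dump and its transformed counterpart would support the argument that decoration left the class shape untouched. In that case the remaining explanations are a class file error or the JVM itself.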
I will see if I can find time to debug this, but these errors are tricky, and normally there is nothing to be done about them.
Can you ignore specific classes in the agent? Maybe you can suppress the instrumentation of this particular class.
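This is one possible way to suppress instrumentation of a single class. It assumes the agent's documented otel.javaagent.exclude-classes property (comma-separated class names, with * wildcards) is also honored when the agent is attached at runtime. Runtime attachment runs the agent inside the same JVM, so the property can be set as a system property right before attaching. OtelExclusions is a hypothetical helper name, and the attach call itself is left out.

```java
import java.util.StringJoiner;

/** Hypothetical helper: configures OpenTelemetry agent class exclusions before a runtime attach. */
public final class OtelExclusions {
    /** Agent property for suppressing instrumentation of specific classes (assumed to apply at runtime attach). */
    public static final String EXCLUDE_CLASSES = "otel.javaagent.exclude-classes";

    private OtelExclusions() {}

    /** Appends the given class patterns to any existing value and returns the resulting property value. */
    public static String excludeClasses(String... patterns) {
        StringJoiner value = new StringJoiner(",");
        String existing = System.getProperty(EXCLUDE_CLASSES, "").trim();
        if (!existing.isEmpty()) value.add(existing);
        for (String pattern : patterns) value.add(pattern);
        System.setProperty(EXCLUDE_CLASSES, value.toString());
        return value.toString();
    }

    public static void main(String[] args) {
        // Must run before the agent starts, since the agent reads its configuration at startup.
        excludeClasses("io.netty.channel.DefaultChannelPipeline");
        // Attach the agent next (e.g. via opentelemetry-runtime-attach), before any Netty objects exist.
    }
}
```

Excluding io.netty.channel.DefaultChannelPipeline may also disable whatever Netty instrumentation targets that class. If the property works as assumed, this trades some tracing coverage for a successful attach.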
As a workaround, I am moving the attach point earlier in the lifecycle, so that no object that needs to be instrumented is created before the agent is attached, and that seems to solve the problem. I have not found a way to properly configure what the OTel agent should skip, and attaching as early as possible makes sense anyway.
I might have found a bug while walking through all the ASM visitors; possibly, this is the cause. If you build Byte Buddy from master and then build OpenTelemetry against it, you can try it yourself. Otherwise, it will be released at some point soon.
I can confirm that with the new version the verification error is still present, but it no longer causes class-not-found errors. I have tested it with Java 11, and it works with that version.
I found this issue while working with the OpenTelemetry agent. After some debugging, it seems related to runtime attachment in a specific scenario (when some of the instrumented objects have already been created), so I am inclined to think the error is more related to Byte Buddy, but it may also be a problem with the OTel agent's usage pattern.
reproducer.zip
In the reproducer, I am using the latest version of the OTel agent, which uses the latest Byte Buddy version. The error appears when an object of a class that should be transformed is created first and the agent is then attached via ByteBuddyAgent.attach(file, pid) (I am using the opentelemetry-runtime-attach library for that).
Interestingly, without the extra OTel agent debug logs, the error surfaces as a class-not-found error in the application, while enabling the OTel agent debug logs reveals the VerifyError raised while transforming classes.
It works fine in other situations, such as attaching before creating any object that needs instrumentation.
Reproduced with Java 11 and Java 17.
I tried to make the reproducer as small as possible, but it still has a lot of dependencies and code, because I needed some target for the agent to instrument. If you think this is indeed an OTel agent issue, let me know and I will report it in their repo.