opentracing-contrib / java-specialagent

Automatic instrumentation for 3rd-party libraries in Java applications with OpenTracing.
Apache License 2.0
185 stars 46 forks source link

Mule 4 isolated classloader structure is unsupported #253

Closed jam01 closed 4 years ago

jam01 commented 4 years ago

Mule 4 makes use of an HTTP Service to provide HTTP functionality. The default implementation uses Grizzly Async HTTP Client. When I directly run some of the service's tests attaching the specialagent, the Grizzly AHC rule is correctly linked and the HTTP filters added.

However when running Mule 4, either standalone or through their own FunctionalTestCases, the rule is not linked. I'm not entirely sure where the issue is. If it's to do with the ClassLoaderAgentRule not catching Mule's MuleArtifactClassLoader (where the HTTP Service lives) or if bytebuddy is not intercepting the AHC method for some reason.

This is more or less what the class loader structure looks like image. And this is the test I've been using repo, though I've been attaching the special agent through my IDEs run command.

PS. If I instead use the old Byteman agent, the rules are linked just fine.

safris commented 4 years ago

Hi @jam01, I cloned the repo you provided, and the MuleITest passes successfully. Can you edit your test to showcase the problem so that it causes a test failure? Otherwise, I don't know what I'm looking for, and don't know what to assert.

jam01 commented 4 years ago

Ah of course. I was just manually checking the logs to see if ahc had been instrumented, but I can add assertions for spans...

jam01 commented 4 years ago

I've updated the repo.

  1. I copied a test from the HttpService repo and added an assertion that looks for the AHC rule span. That passes.
  2. I added the same assertion to the existing MuleITest. That fails.

Again, I run these manually adding -javaagent to the IntelliJ run vm args. Not sure if there's a way to make the AgentRunner work in this context to make that unnecessary.

safris commented 4 years ago

Hi @jam01, I've looked into this issue, and here is my feedback: 1) You cannot import io.opentracing.contrib.specialagent:opentracing-specialagent-util or io.opentracing.contrib.specialagent:opentracing-specialagent in your pom.xml. Doing so will cause unexpected issues, because SpecialAgent is meant to be loaded via the -javaagent argument instead of the -classpath argument. You should remove this from your test. 2) The AsyncHttpClientAgentRule instruments on com.ning.http.client.AsyncHttpClientConfig$Builder. I placed a breakpoint on com.ning.http.client.AsyncHttpClientConfig.Builder.<init>(), and this breakpoint is not hit when I run HttpServiceTest. If it's not hit by the debugger, it means it's not being called during the execution of your test -- hence no span. Does this make sense? Please correct me if I'm overseeing something.

jam01 commented 4 years ago

Thanks @safris

  1. Yes! This must be a copy/paste oversight when bringing in the ot-mock and ot-util artifacts.
  2. This might be a case of "it works on my computer"

image

I'm not sure how I can show you this any other way... screen sharing?? Also, when running the HttpServiceTest there are spans

image

jam01 commented 4 years ago
Allowing instrumentation with "grizzly-ahc" due to "fingerprint.bin match" for:
file:/tmp/opentracing-specialagent14963662873980741414/META-INF/plugins/opentracing-grizzly-ahc-0.1.3.jar,
file:/tmp/opentracing-specialagent14963662873980741414/META-INF/plugins/grizzly-ahc-1.5.1.jar
[grizzly-ahc] Target class loader is: <system>jdk.internal.loader.ClassLoaders$AppClassLoader@9e89d68
[grizzly-ahc] Preload of instrumentation classes deferred
-------> Intercept from: public com.ning.http.client.AsyncHttpClientConfig$Builder()
safris commented 4 years ago

Hi @jam01, I've looked deeper into this. From my investigation so far, everything is working as it should: 1) AHC is passing fingerprint match: Allowing instrumentation with "grizzly-ahc" due to "fingerprint.bin match". 2) AHC Intercept is happening: -------> Intercept from: public com.ning.http.client.AsyncHttpClientConfig$Builder(). 3) TracingRequestFilter is being added as a request filter to the builder. 4) AsyncHandler is being wrapped by AsyncHandlerWrapper. 5) onStatusReceived() is called, and Tags.HTTP_STATUS is set. 6) onHeadersReceived() is called. 7) And finally, onComplete() is called, which thus calls span.finish().

As far as I can see, the intercepts are working correctly. I don't know the full context of "why it is specifically that the test condition you've created is not passing". Could you give me a higher layer explanation of the use-case you've created?

jam01 commented 4 years ago

Right, sorry. So there are two tests, one that tests the HTTPService (more or less copied from the Mulesoft repo) which does not involve any custom classloading. The other is a MuleFunctionalTest which makes use of the HTTPService but that does setup the custom classloading structure.

The first test passes, and the second one fails. So with these two tests I'm trying to showcase my hypothesis that the custom classloading structure is not supported.

I understand that the HttpService actually fails for you, so I've setup Github CI to try it outside of my local, and it does pass, see here: https://github.com/jam01/java-specialagent-test-mule-4.2.2/commit/2eb65db151ad1aace9a1f59cb9cdf79d634a8f85/checks?check_suite_id=344300765

In my head this is a bytebuddy issue, because when I set breakpoints where the Grizzly AHC is instantiated, it's hit with both tests but the rule intercept only works with the HttpService.

jam01 commented 4 years ago

Ah man ok... So turns out that the MuleFunctionalTest issue was to do with JDK 1.8 vs 11. If I use 1.8 then the ahc rule is linked correctly and the spans reported fine, tested that with a local Jaeger instance and it worked.

Unfortunately it is still not working with a standalone version of the mule server. There's no docs about launching an instance directly from java so I've yet to figure out the best way to provide a reproducer...

jam01 commented 4 years ago

OK! So this is not ideal but it's something... https://github.com/jam01/mule-4-agents-tests

To run the "tests" run ./test.sh

There I'm setting up some simple agents with bytebuddy and byteman, without any opentracing stuff. I'm downloading the latest trial version of mule 4 and kicking it off with each agent, one after the other. The agents should both print a message to std out if the rule was triggered. After each agent run I'm grepping the logs to check for that message.

Here you can see the last github ci run... https://github.com/jam01/mule-4-agents-tests/commit/e93fdcfe8b33b09fb14b5587aba906db6fa5c3c5/checks?check_suite_id=356379577

feel free to hit me up on gitter whenever

safris commented 4 years ago

Hi @jam01, thanks for providing this project. You are correct -- there is something different happening here. I was not able to figure this out myself, so I posted a question requesting advice.

jam01 commented 4 years ago

OK cool. Is what you have on the stack post all you changed to debug?

safris commented 4 years ago

I tried a number of other things, specifically trying to get ByteBuddy to properly resolve types that are on the bootstrap class loader. However, I wasn't able to make any tangible progress. I couldn't find the relevant information on how to do this in ByteBuddy's documentation. I'll submit a PR for your repo with my debug changes.

jam01 commented 4 years ago

So adding the fallback locator here https://github.com/jam01/mule-4-agents-tests/commit/bf348e8cb689ffa137f3c87624b6a34f723234d4 makes it work. Got that from Elastic's solution here https://github.com/elastic/apm-agent-java/pull/855/files

Is this something we'd just add to BootLoaderAgent or would you want to approach it differently?

safris commented 4 years ago

Hi @jam01, great work! I will review the changes you made to your repo, and will figure out the best compatible solution for SpecialAgent.

safris commented 4 years ago

Hi @jam01, I have committed the relevant changes to SpecialAgent. When this build completes, please download the development build and try your test again.

jam01 commented 4 years ago
SpecialAgent#linkRule("grizzly:ahc"[61], org.mule.runtime.module.artifact.api.classloader.MuleArtifactClassLoader@5dfe9cfc): compatible = null, RulePath: /tmp/opentracing-specialagent1860855459117825004/META-INF/plugins/grizzly-ahc-1.5.4-SNAPSHOT.jar
ClassFingerprint#compatible[true]("com.ning.http.client.HttpResponseBodyPart")
ClassFingerprint#compatible[true]("com.ning.http.client.RequestBuilder")
ClassFingerprint#compatible[true]("com.ning.http.client.RequestBuilderBase")
ClassFingerprint#compatible[true]("com.ning.http.client.HttpResponseHeaders")
ClassFingerprint#compatible[true]("com.ning.http.client.HttpResponseStatus")
ClassFingerprint#compatible[true]("com.ning.http.client.filter.FilterContext$FilterContextBuilder")
ClassFingerprint#compatible[true]("com.ning.http.client.AsyncHttpClientConfig$Builder")
ClassFingerprint#compatible[true]("com.ning.http.client.filter.FilterContext")
ClassFingerprint#compatible[true]("com.ning.http.client.filter.RequestFilter")
ClassFingerprint#compatible[true]("com.ning.http.client.AsyncHandler")
ClassFingerprint#compatible[true]("com.ning.http.client.Request")
Allowing instrumentation with "grizzly:ahc" due to "fingerprint.bin match" for:
file:/tmp/opentracing-specialagent1860855459117825004/META-INF/plugins/opentracing-grizzly-ahc-0.1.3.jar,
file:/tmp/opentracing-specialagent1860855459117825004/META-INF/plugins/grizzly-ahc-1.5.4-SNAPSHOT.jar
[grizzly:ahc] Target class loader is: org.mule.runtime.module.artifact.api.classloader.MuleArtifactClassLoader@5dfe9cfc
[grizzly:ahc] Preload of instrumentation classes deferred
RuleClassLoader<io.opentracing.contrib.specialagent.RuleClassLoader@78f39b10>#preLoad(org.mule.runtime.module.artifact.api.classloader.MuleArtifactClassLoader@5dfe9cfc)
...
...
-------> Intercept from: public com.ning.http.client.AsyncHttpClientConfig$Builder()

So that works!

There are still a bunch of the same type of errors like this one

Event::onError(org.mule.extension.http.internal.listener.HttpListenerProvider, org.mule.runtime.module.artifact.api.classloader.MuleArtifactClassLoader@1cb6256a, null, false)
java.lang.IllegalStateException: Cannot resolve type description for org.mule.runtime.api.connection.CachedConnectionProvider

I think the root of the issue may be that ByteBuddy uses getResourceAsStream instead of findClass to load classes. I think it's only findClass that's plugged into the class loading strategies. This may or may not be an issue later, but for now at least I can see the Grizzly AHC rule is now working.

safris commented 4 years ago

Hey @jam01, that's great that it works! Regarding the similar exceptions that remain:

With the addition of the fallback LocationStrategy, we only provide a way for ByteBuddy to locate classes that belong to the bootstrap class loader. These other classes for which the exceptions persist are not present in the bootstrap class loader. Simply put, you were right to say: ByteBuddy does not support Mule 4's isolated class loader structure.

ByteBuddy modifies bytecode. To get the bytecode that needs to be modified, ByteBuddy attempts to locate it in the class loader (and its parents) in which the rule is triggered. A class loader provides access to the bytecode of the class via the getResourceAsStream() (and other getResource* methods) -- do distinguish from findClass(), which returns the Class instance instead of the byte[].

You're right to say that "this may be an issue later". For the time being, I'll close this issue, and we can open a new one if another problem arrises.