opentracing-contrib / java-specialagent

Automatic instrumentation for 3rd-party libraries in Java applications with OpenTracing.
Apache License 2.0

SpecialAgent compatibility with WebSphere 8.5.x #609

Open Xitric opened 3 years ago

Xitric commented 3 years ago

We have been trying to get SpecialAgent working in a WebSphere deployment for the last couple of days. We have identified a number of incompatibility issues, for which I will open a pull request once this is all working. However, we seem to have gotten stuck on some classloading issues, particularly in relation to the OSGi classloader used by the application server.

Here is some information on our environment:

First, we noticed that ByteBuddyManager.scanRules() was failing when it came across a plugin for an unsupported version of Java. Since we use Java 7, and some plugins are built for Java 8, this prevented most plugins from loading. Currently, scanning stops as soon as one plugin fails to load, but we believe it should skip that plugin and try loading the rest. Here is the change we made:

Class<?> agentClass;
try {
  agentClass = pluginsClassLoader.loadClass(line);
} catch (final UnsupportedClassVersionError e) {
  if (logger.isLoggable(Level.FINE))
    logger.fine("Skipping rule " + line + " due to incompatible Java version");

  continue;
}
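For context, the skip-and-continue behaviour described above can be sketched as a standalone loop. This is only an illustration under stated assumptions: `RuleScanSketch`, `loadAll`, `simulatedLoader` and the class name `example.TooNewRule` are hypothetical and are not part of SpecialAgent, and the version error is simulated by a stub classloader rather than by a real Java 8 class file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not SpecialAgent code.
final class RuleScanSketch {
  private RuleScanSketch() {
  }

  // Loads each named class and skips any entry that the running JVM rejects
  // because of its class file version, instead of aborting the whole scan.
  static List<Class<?>> loadAll(final ClassLoader loader, final List<String> classNames) {
    final List<Class<?>> loaded = new ArrayList<Class<?>>();
    for (final String name : classNames) {
      try {
        loaded.add(loader.loadClass(name));
      } catch (final UnsupportedClassVersionError e) {
        // Incompatible class file version: skip this entry and keep scanning.
        continue;
      } catch (final ClassNotFoundException e) {
        // Any other failure is rethrown unchecked so that it is not silently hidden.
        throw new IllegalStateException(e);
      }
    }
    return loaded;
  }

  // Stub loader that pretends one class was compiled for a newer Java version.
  static ClassLoader simulatedLoader(final String tooNewClassName) {
    return new ClassLoader(RuleScanSketch.class.getClassLoader()) {
      @Override
      public Class<?> loadClass(final String name) throws ClassNotFoundException {
        if (tooNewClassName.equals(name))
          throw new UnsupportedClassVersionError(name);

        return super.loadClass(name);
      }
    };
  }

  public static void main(final String[] args) {
    final List<String> names = Arrays.asList("java.lang.String", "example.TooNewRule", "java.util.ArrayList");
    System.out.println(loadAll(simulatedLoader("example.TooNewRule"), names));
  }
}
```

Only the version error is skipped here; whether other load failures should also be skipped is a separate design decision that the thread does not settle.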

We also noticed that our environment uses a com.ibm.ws.bootstrap.ExtClassLoader, which can return null from the call to classLoader.getURLs() in ClassLoaderMap. We made this small change:

return classLoader.getURLs() != null && classLoader.getURLs().length > 0 || classLoader.getParent() != null ? null : super.get(NULL);
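The null guard on its own can be sketched as a small helper that reads getURLs() only once. The class and method names (`UrlClassLoaderChecks`, `hasUrls`) are hypothetical and do not exist in SpecialAgent. The null-returning loader below is a stand-in for the behaviour reported for the IBM loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not SpecialAgent code.
final class UrlClassLoaderChecks {
  private UrlClassLoaderChecks() {
  }

  // Returns true only if the loader reports at least one URL.
  // A null result from getURLs() is treated the same as an empty array.
  static boolean hasUrls(final URLClassLoader classLoader) {
    final URL[] urls = classLoader.getURLs();
    return urls != null && urls.length > 0;
  }

  public static void main(final String[] args) {
    // Stand-in for a loader whose getURLs() returns null.
    final URLClassLoader nullUrls = new URLClassLoader(new URL[0], null) {
      @Override
      public URL[] getURLs() {
        return null;
      }
    };

    System.out.println(hasUrls(nullUrls));
    System.out.println(hasUrls(new URLClassLoader(new URL[0], null)));
  }
}
```

Reading the array into a local variable avoids calling getURLs() twice, which matters if an implementation builds a new array on each call.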

With this in place, we can successfully instrument our servlets, but they fail at runtime because some classes are not available to the OSGi BundleLoader:

[09-03-21 12:11:02:439 CET]     FFDC Exception:java.lang.NoClassDefFoundError SourceId:com.ibm.ws.portletcontainer.ext.ExtensionHandler.processGlobalPortletFilter ProbeId:346
java.lang.NoClassDefFoundError: io.opentracing.contrib.specialagent.rule.servlet.FilterAgentIntercept
    at com.ibm.ws.portletcontainer.portletserving.filter.DefaultFilter.init(DefaultFilter.java:38)
    at com.ibm.ws.portletcontainer.portletserving.filter.PortletFilterRegistry.addPortletFilter(PortletFilterRegistry.java:156)
    at com.ibm.ws.portletcontainer.portletserving.filter.PortletFilterRegistry.analyzeFilterDocument(PortletFilterRegistry.java:282)
    at com.ibm.ws.portletcontainer.portletserving.filter.PortletFilterRegistry.registerAllPortletDocumentFilters(PortletFilterRegistry.java:214)
    at com.ibm.ws.portletcontainer.portletserving.filter.PortletFilterRegistry.<init>(PortletFilterRegistry.java:80)
    at com.ibm.ws.portletcontainer.portletserving.filter.PortletFilterRegistry.<clinit>(PortletFilterRegistry.java:68)
    at java.lang.J9VMInternals.initializeImpl(Native Method)
    at java.lang.J9VMInternals.initialize(J9VMInternals.java:235)
    at com.ibm.ws.portletcontainer.registry.portletfilterregistry.GlobalPortletFilterRegistry.addPortletFilter(GlobalPortletFilterRegistry.java:81)
    at com.ibm.ws.portletcontainer.ext.ExtensionHandler.processGlobalPortletFilter(ExtensionHandler.java:345)
    at com.ibm.ws.portletcontainer.ext.ExtensionHandler.initExtensions(ExtensionHandler.java:137)
    at com.ibm.ws.portletcontainer.runtime.PortletContainerComponentImpl.start(PortletContainerComponentImpl.java:150)
    at com.ibm.ws.runtime.component.ContainerHelper.startComponents(ContainerHelper.java:540)
    at com.ibm.wsspi.runtime.component.WsContainer.startComponents(WsContainer.java:203)
    at com.ibm.wsspi.runtime.component.WsContainer.start(WsContainer.java:194)
    at com.ibm.ws.webcontainer.component.WebContainerImpl.start(WebContainerImpl.java:274)
    at com.ibm.ws.runtime.component.ContainerHelper.startComponents(ContainerHelper.java:540)
    at com.ibm.ws.runtime.provisioning.ActivationPlanUtil.startComponents(ActivationPlanUtil.java:397)
    at com.ibm.ws.runtime.provisioning.ActivationPlanUtil.processActivationPlans(ActivationPlanUtil.java:332)
    at com.ibm.ws.runtime.provisioning.ActivationPlanUtil.processSysAppActivationPlan(ActivationPlanUtil.java:165)
    at com.ibm.ws.runtime.component.DeployedApplicationImpl.start(DeployedApplicationImpl.java:816)
    at com.ibm.ws.runtime.component.ApplicationMgrImpl.startApplication(ApplicationMgrImpl.java:795)
    at com.ibm.ws.runtime.component.CompositionUnitMgrImpl$CUInitializer$1.run(CompositionUnitMgrImpl.java:992)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAs(ContextManagerImpl.java:5446)
    at com.ibm.ws.security.auth.ContextManagerImpl.runAsSystem(ContextManagerImpl.java:5662)
    at com.ibm.ws.security.core.SecurityContext.runAsSystem(SecurityContext.java:255)
    at com.ibm.ws.runtime.component.CompositionUnitMgrImpl$CUInitializer.run(CompositionUnitMgrImpl.java:997)
    at com.ibm.wsspi.runtime.component.WsComponentImpl$_AsynchInitializer.run(WsComponentImpl.java:524)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1892)
Caused by: java.lang.ClassNotFoundException: io.opentracing.contrib.specialagent.rule.servlet.FilterAgentIntercept
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:506)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:731)
    ... 29 more

We understand that this is likely because the OSGi classloader does not look for class definitions in the bootstrap classloader, where the SpecialAgent classes are injected. However, we have been unable to resolve this issue thus far. We even tried adding the opentracing packages to the OSGi boot delegation, as recommended for other agents (AppDynamics):

-javaagent:opentracing-specialagent-1.7.5-SNAPSHOT.jar -Dorg.osgi.framework.bootdelegation=META-INF.services,io.opentracing.*,com.ibm.*

However, this seems to have no effect, and the same exact error as above is logged. We are happy to help in resolving this issue, but at this point I think we require help in understanding the root cause of our issues.

Sorry for the long writeup, but I wanted to provide you with as much information about our process as I could. I look forward to hearing from you @malafeev @safris, or someone else!

malafeev commented 3 years ago

@Xitric SpecialAgent is not actively supported anymore. I would suggest looking at OpenTelemetry, specifically https://github.com/open-telemetry/opentelemetry-java-instrumentation

Xitric commented 3 years ago

Unfortunately, that is not an option since they dropped support for Java 7, and we specifically need to monitor a legacy platform. Do you have any other recommendations for alternatives to SpecialAgent? We would like to avoid vendor lock-in for the backend if at all possible.

malafeev commented 3 years ago

No, I don't have any vendor-free recommendation. You can probably create manual OpenTracing instrumentation for your application using existing OpenTracing libraries (https://github.com/opentracing-contrib).

cptkng23 commented 3 years ago

Hi @Xitric .

Would it help to instrument the OSGi classloader itself, so that it tries to load classes from bootstrap?

If so, the link below might serve as inspiration. If nothing like it exists in this repo, you could copy the approach into your fork.

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/classloaders/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/javaclassloader/ClassLoaderInstrumentation.java
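The general idea of a classloader that tries bootstrap first for selected packages can be sketched as follows. This is an illustration only: it is not the code from the linked OpenTelemetry class, and `BootstrapFirstLoader`, `tryLoad` and the chosen prefix are hypothetical. It relies on the documented behaviour that Class.forName uses the bootstrap class loader when its loader argument is null.

```java
// Hypothetical sketch, not taken from OpenTelemetry or SpecialAgent.
final class BootstrapFirstLoader extends ClassLoader {
  private final String[] bootstrapPrefixes;

  BootstrapFirstLoader(final ClassLoader parent, final String... bootstrapPrefixes) {
    super(parent);
    this.bootstrapPrefixes = bootstrapPrefixes.clone();
  }

  @Override
  protected Class<?> loadClass(final String name, final boolean resolve) throws ClassNotFoundException {
    for (final String prefix : bootstrapPrefixes) {
      if (name.startsWith(prefix)) {
        try {
          // A null loader argument makes Class.forName use the bootstrap class loader.
          return Class.forName(name, false, null);
        } catch (final ClassNotFoundException e) {
          // Not visible on bootstrap: fall back to the normal lookup below.
          break;
        }
      }
    }

    return super.loadClass(name, resolve);
  }

  // Convenience wrapper for the demo: returns null instead of throwing.
  static Class<?> tryLoad(final ClassLoader loader, final String name) {
    try {
      return loader.loadClass(name);
    } catch (final ClassNotFoundException e) {
      return null;
    }
  }

  public static void main(final String[] args) {
    final ClassLoader loader = new BootstrapFirstLoader(BootstrapFirstLoader.class.getClassLoader(), "java.lang.");
    System.out.println(tryLoad(loader, "java.lang.String"));
    System.out.println(tryLoad(loader, "no.such.Type"));
  }
}
```

In an agent, the same check would be woven into the existing OSGi loader's loadClass through bytecode instrumentation rather than by subclassing, and the prefixes would be the agent's own packages.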

Xitric commented 3 years ago

Thanks for the tip @cptkng23.

I have actually already looked into how the OpenTelemetry javaagent and the Elastic javaagent handle OSGi classloaders. I have tried to inject similar code into the OSGi DefaultClassLoader#loadClass(), but alas even this call throws a ClassNotFoundException at runtime for all SpecialAgent classes related to rules. I have similarly tried to use other classloaders, such as delegating to the parent classloader or using the classloaders of other classes in SpecialAgent, but with similar results.

While I still have much reverse engineering and debugging to do, I am starting to form a theory on the issue. I believe that the base SpecialAgent classes that are available directly in the generated .jar file are the only classes that I can access on the classloaders. The Rule classes, which are somehow made available to the classloaders at startup, are not available to my WebSphere applications. I have tried to provide these Rule classes via other means (by placing them in a separate, fat .jar, which is loaded by WebSphere at startup) just to see if it would work, and it does. However, I have only been able to load them by using Thread.currentThread().getContextClassLoader(), and this is only a temporary solution.
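Loading a class through the thread context classloader, as in the temporary workaround just described, can be sketched like this. `ContextLoaderLookup` and `loadOrNull` are hypothetical names, and the fallback to the defining classloader when no context loader is set is an assumption added for the sketch.

```java
// Hypothetical sketch of a context-classloader lookup.
final class ContextLoaderLookup {
  private ContextLoaderLookup() {
  }

  // Tries the thread context classloader first; returns null if the class is not visible.
  static Class<?> loadOrNull(final String className) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      // Assumption: fall back to the loader that defined this helper.
      loader = ContextLoaderLookup.class.getClassLoader();
    }

    try {
      // 'false' defers static initialization until the class is first used.
      return Class.forName(className, false, loader);
    } catch (final ClassNotFoundException e) {
      return null;
    }
  }

  public static void main(final String[] args) {
    System.out.println(loadOrNull("java.lang.String"));
    System.out.println(loadOrNull("no.such.Type"));
  }
}
```

The result depends on which thread runs the lookup, because each thread can carry a different context classloader. That is one reason this approach is fragile as a long-term fix.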

During startup of SpecialAgent, I noticed that the agent "injects" its Rule classes into instances of the OSGi DefaultClassLoader. There are hundreds of entries such as this:

>>>>>>>> inject(org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader@-31b14e64)

I worry that, because of the way IBM's classloaders work in WebSphere, these classloaders may not be readily available to my applications. But this is the part that I am still investigating. Tomorrow I might try to inject some tracing logs into all classloaders to understand how they communicate at runtime, and perhaps pinpoint why the Rule classes are not visible in my WebSphere applications.

Any ideas are greatly appreciated!

Xitric commented 3 years ago

Finally tracked down the root cause of my problems - it turns out I was looking in the wrong place. I was so confident the issue had to be with SpecialAgent, I did not realize that it was a bug in the JVM itself! However, it took a lot of digging around before this bug would eventually reveal itself.

Source: https://www.ibm.com/support/pages/apar/IV76963

Unfortunately, this bug was never fixed in the JVM that we are required to use, so I have implemented a workaround in my fork. Specifically, I had to extract the isThreadInstrumentable inner class to its own class definition.