Closed mstoodle closed 2 years ago
@mstoodle Thank you for your efforts to determine what was wrong and summarizing that information here - we really appreciate it!
Would really love to see this change implemented as all of our pipelines broke that rely on OpenJ9 recently.
@kford-newrelic In case I can help to accelerate the "back end" of this change (thinking optimistically!) ... what testing should I run locally to help ensure the pull request is working well? I tried running ./gradlew clean build --parallel
but that ran into some failures on my laptop even without my changes so wasn't sure if that was the right set of integration tests to be running...
@mstoodle yes, those are the tests that you can run locally the most easily. There are another set of integration tests that get run when PRs are submitted but it's best to let GHA run those when the PR is submitted.
We've been making some fixes and improvements to our tests, since some have been a bit unreliable. From a practical perspective, you might just want to run before/after your proposed changes and if the outcome is the same from the tests, that might be good enough.
Not selected for next quarter's agent roadmap. Will consider for a future agent release.
Thanks for the update, @kford-newrelic .
For the record, the change needed is very minimal from a coding perspective (See https://github.com/newrelic/newrelic-java-agent/compare/main...mstoodle:ibmjava-changes). But I can understand if you'll want to introduce some additional testing on the free Semeru Runtimes binaries (available here: https://ibm.com/semeru-runtimes/downloads) that it would enable you to support, and that may be effort you aren't willing to commit to in the next quarter.
It is a shame, though, that all those Fortune 500 and other companies that rely on Semeru Runtimes to run their businesses can't use your product in the meantime. I guess not many of them are complaining about it, which means it could represent an untapped market segment for you... I would love to tell all of them how well New Relic "now" works out of the box with IBM Semeru Runtimes :) .
@mstoodle don't give up on us yet!
If it's possible to squeeze this into our next quarter's work, we certainly will. Stay tuned!
Hey @mstoodle. Looks like we built a bypass in for this code path. Can you run a current agent version and use the system property?
-Dnewrelic.try_ibm_attach=true
It will need to be a system property (not env or newrelic.yml config). This check happens before we initialize our config service.
It's an undocumented config. Would you be willing to run with that setting and provide us feedback?
This would help us determine if we should just drop that code path as you've shown in your PR or possibly we'll just document this configuration (or explore other relevant options for improving the IBM Semeru experience).
@tbradellis I tried your suggestion but I can't get it working. Not sure if I've set that flag correctly or not. I'm posting the stacktrace below to have it on the record, as I don't see any other stacktrace here.
2022-06-28T14:12:09,925+0000 [1 1] com.newrelic ERROR: Unable to start the New Relic Agent. Your application will continue to run but it will not be monitored.
java.lang.NoClassDefFoundError: com.newrelic.agent.bridge.datastore.RecordSql
at com.newrelic.agent.config.TransactionTracerConfigImpl.<init>(TransactionTracerConfigImpl.java:97) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.TransactionTracerConfigImpl.createTransactionTracerConfigImpl(TransactionTracerConfigImpl.java:352) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.TransactionTracerConfigImpl.createTransactionTracerConfig(TransactionTracerConfigImpl.java:345) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.AgentConfigImpl.initTransactionTracerConfig(AgentConfigImpl.java:700) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.AgentConfigImpl.<init>(AgentConfigImpl.java:336) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.AgentConfigImpl.createAgentConfig(AgentConfigImpl.java:283) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.config.ConfigServiceFactory.createConfigService(ConfigServiceFactory.java:38) ~[newrelic.jar:7.8.0]
at com.newrelic.agent.Agent.tryToInitializeServiceManager(Agent.java:199) [newrelic.jar:7.8.0]
at com.newrelic.agent.Agent.continuePremain(Agent.java:140) [newrelic.jar:7.8.0]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
at com.newrelic.bootstrap.BootstrapAgent.startAgent(BootstrapAgent.java:193) [newrelic.jar:7.8.0]
at com.newrelic.bootstrap.BootstrapAgent.premain(BootstrapAgent.java:130) [newrelic.jar:7.8.0]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source) [?:?]
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source) [?:?]
Caused by: java.lang.ClassNotFoundException: com.newrelic.agent.bridge.datastore.RecordSql
at jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source) ~[?:?]
at java.lang.ClassLoader.loadClass(Unknown Source) ~[?:?]
... 21 more
This would really help as we're stuck on old Openj9
Just to confirm, @tbradellis : by "It will need to be a system property" you mean it needs to be specified directly on the command-line, yes?
@mstoodle It just needs to be passed as an argument to the java process (vs an application argument).
From the command line it would look like java -javaagent:/path/newrelic.jar -Dnewrelic.try_ibm_attach=true myapp.jar app.jar applicationArgs ...
However, just add the property wherever other java args are being added for the deployment config:
I often tell folks just wherever they are setting heap or the javaagent argument itself (i.e. -Xms
or -Xmx
or -javaagent:/path/newrelic.jar
) , you'd set this property in the same way.
@bogdanbrindusan I'll see if I can replicate this as soon as I can and comment here. Can probably do that in the coming week.
@mstoodle and @bogdanbrindusan and @zalacmar
I was able to replicate the java.lang.NoClassDefFoundError: com.newrelic.agent.bridge.datastore.RecordSql
and spend some time troubleshooting this. In doing so I discovered that I gave the wrong argument to workaround this. Apologies!
tldr
The system property to use is:
-Dibm_iv25688_workaround=false
It should be set false
i.e. do not use the workaround. I confirmed and was able to use the Semeru 17 runtime without any issue.
@bogdanbrindusan You are right, we are pointlessly failing on some IBM runtimes. However, there are a few more places in the code that would need to be cleaned up that are related to what was put in your branch and we may need to take a slightly different approach. Also, I'll need to do some testing and add some semeru et. al. runtimes to our integration tests.
A couple notes: As you were suggesting, this code stems from an old bug that has been resolved. https://www.ibm.com/support/pages/apar/IV25688. This bug only exists in Java 6 and 7 jdks anyway, so some of this code is completely obsolete (as our current Java Agent has dropped support for java 6 and 7).
I started working up an explainer for what the various code paths are doing and why we hit the error when we go down the workaround path. Since this is an older path that deals with changes in classloaders across jdks, I'll need to give a thorough presentation of the control flow to the team for review before ripping it out. I'll post again here for full visibility, because it's pretty interesting. For now, setting -Dibm_iv25688_workaround=false
should work.
The crux of the coding issue really was that when we go down the ibmworkaround path we never perform addBridgeJarToClassPath(inst, AGENT_BRIDGE_DATASTORE_JAR_NAME);
hence NoClassDefFound for RecordSql which is archived in the agent-bridge-datastore.jar).
Thanks everyone for bringing this up!
For visibilty:
The system property to use is:
-Dibm_iv25688_workaround=false
It should be set false
i.e. do not use the workaround. I confirmed and was able to use the Semeru 17 runtime without any issue.
There's some bad logic here: https://github.com/newrelic/newrelic-java-agent/blob/da8f5509331144d64a7ba33bf41316066e5c7f42/newrelic-agent/src/main/java/com/newrelic/agent/config/IBMUtils.java#L32
That logic error ensures any IBM 1.8 and above will take the workaround path (true)...fixing this line could resolve the issue as well without removing everything -I'd prefer removing all of it though; it is no longer relevant.
Here is where the workaround path is decided:
https://github.com/newrelic/newrelic-java-agent/blob/da8f5509331144d64a7ba33bf41316066e5c7f42/newrelic-agent/src/main/java/com/newrelic/bootstrap/BootstrapAgent.java#L170
Migrated to Jira - https://issues.newrelic.com/browse/NR-44473
Considering for Q2 engineering work.
@tbradellis thanks for looking into this and put this valuable information here.
@kford-newrelic is that Q2 of 2023? I'm not familiar with how Qs are split in NR. Unless it's meant to be Q3 of 2022
Our current gradle version is blocking us from easily integrating Semeru builds into our test suite. Long story, but we had upgrading our gradle version on the road map anyway. This would get us to 7.5.1. Gradle will work well with Semeru builds starting in 7.4. Still some work to be done here because of compileScala jobs https://github.com/newrelic/newrelic-java-agent/pull/997
I was able to force some tests to run on OpenJ9. There are some tests failures. Some look to just be implementation specific and will require some separate tests to be created specifically for OpenJ9 and excluded from other runtimes. summary for agent integration test failures here: https://github.com/newrelic/newrelic-java-agent/pull/993#issuecomment-1238571291
@bogdanbrindusan We are currently in our Q2 2023 fiscal year. The hope was to get this through right away, but these additional hurdles will push this work into Q3 -which I believe starts in Oct.
Jira CommentId: 97306 Commented by bellis:
Work has been completed on this issue.
@bogdanbrindusan and @mstoodle I've removed the IBM workaround logic. We were unable to test as fully as we'd liked. Gradle seems to have trouble with downloading IBM Semeru builds. We hoped to get our full suites running. I could tell in some of my tests that there are some special cases I'll need to address in our integration tests. Not being able to download them locally with gradle made it really cumbersome to step through.
At any rate, this is going into our 7.11 release. There's a snapshot build linked below. If you are able to run this agent and provide any feedback it would be greatly appreciated, even if a simple smoke test to confirm you don't hit any issues. https://github.com/newrelic/newrelic-java-agent/suites/8940160106/artifacts/410261834
Thanks again for bringing this to our attention.
Thank you @tbradellis !! That's fantastic news! On the gradle point, would you please take a few moments to summarize the problem you experienced in an issue at the Semeru Runtimes site so we have a record of it and can start investigating?
https://github.com/ibmruntimes/Semeru-Runtimes/issues/new
Thanks again!
Is your feature request related to a problem? Please describe.
Ref discussion here: https://discuss.newrelic.com/t/feature-idea-java-new-relic-agent-doesnt-work-with-openj9-17/170671
When IBM Semeru Runtimes Java 11 recently changed its
java.vendor
string to "IBM Corporation", the NR Java agent seems to have stopped working and apparently never supported Java 17 or any other recent IBM Java release. When I asked for more details about why the vendor string mattered, I was pointed at this repo to investigate for myself.I discovered that the
AttachLifecycleObserver
is disabled for any JVM that setsjava.vendor = "IBM Corporation"
which includes all Semeru Runtimes (Open and Certified Edition) as well as IBM's older IBM SDK for Java 7, 7.1, and 8. There isn't any reasoning for it in the code, that I could find. I'm suspicious that the reason this change was made is no longer relevant.Since the NR agent worked just fine for Java 11 before we changed the
java.vendor
string and the Eclipse OpenJ9 JVM built into IBM Semeru Runtimes uses the same source code across all Java releases (8,11,17,18), I think the NR agent should just work. That probably includes the IBM SDK for Java 8 (which also uses the Eclipse OpenJ9 JVM).Feature Description
I believe all that's needed is to remove
isAgentSafe()
from theAttachLifecycleObserver
class. I have a pull request ready to go if the project would like me to submit it.Describe Alternatives
A clear and concise description of any alternative solutions or features you've considered. Are there examples you could link us to?
(migrate to Jira)
Additional context
IBM Semeru Runtimes are the same binaries as used to be built by the AdoptOpenJDK community as "OpenJDK with Eclipse OpenJ9". When AdoptOpenJDK moved to the Eclipse Foundation, they were prevented (by contract with Oracle) from continuing to produce the OpenJDK with OpenJ9 binaries. IBM stepped forward at the time and has produced the same binaries under the name "IBM Semeru Runtimes": built with open source components and the binaries are available at no cost for everyone to use. As chief architect for IBM Java and Semeru Runtimes, I would love to see New Relic agents working smoothly with Semeru Runtimes for all our Java releases.
Priority
Really Want. We have customers who have been happily using NR with Semeru Runtimes 11 who are now broken and, while we have a workaround that involves an agent that mashes back the older
java.vendor
setting, it's a super hack.