newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0
197 stars 142 forks source link

NullPointerException being thrown while trying to link an async token #2002

Open dhilpipre opened 1 month ago

dhilpipre commented 1 month ago

In certain cases where the link method is being called on the Token class, a NullPointerException is being thrown.

It appears that the problem arises when the code reaches the com.newrelic.agent.Transaction.linkTxOnThread(TokenImpl) method. In this code there is this call: oldTx.checkFinishTransaction(oldTxa, oldTxaId);

In checkFinishTransaction, it turns out that this line is called transactionTime.markTxaFinishTime(txa.getRootTracer());

The transactionTime field is null and this results in an uncaught exception to the application.

At the least, there should be a null check here.

It would also be nice to find out why the transactionTime has not been initialized when it reaches this point.

I am not able to reproduce this error as it was pointed out by a customer using the Labs Ktor instrumentation. Note that Ktor and the underlying Kotlin Coroutines are Kotlin language based if that makes a difference.

Here is a stacktrace of the error 2024-08-08 09:41:59.153 [OperatorApi] [eventLoopGroupProxy-4-6 @call-handler#670] ERROR ktor.application - Unhandled exception caught for CoroutineName(call-handler) kotlinx.coroutines.CoroutinesInternalError: Fatal exception in coroutines machinery for DispatchedContinuation[NettyDispatcher@55ab96eb, Continuation at io.sparetech.api.TypedApiKt$registerRoutes$2$1$1$1$1.invokeSuspend(TypedApi.kt:175)@468f50cd]. Please read KDoc to 'handleFatalException' method and report this incident to maintainers at kotlinx.coroutines.DispatchedTask.handleFatalException$kotlinx_coroutines_core(DispatchedTask.kt:142) at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:113) at kotlinx.coroutines.EventLoop.processUnconfinedEvent(EventLoop.common.kt:65) at kotlinx.coroutines.internal.DispatchedContinuation.resumeWith(DispatchedContinuation.kt:343) at io.ktor.util.pipeline.SuspendFunctionGun.resumeRootWith(SuspendFunctionGun.kt:146) at io.ktor.util.pipeline.SuspendFunctionGun.loop(SuspendFunctionGun.kt:120) at io.ktor.util.pipeline.SuspendFunctionGun.access$loop(SuspendFunctionGun.kt:11) at io.ktor.util.pipeline.SuspendFunctionGun$continuation$1.resumeWith(SuspendFunctionGun.kt:70) at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.ktor.server.netty.EventLoopGroupProxy$Companion.create$lambda$1$lambda$0(NettyApplicationEngine.kt:296) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: java.lang.NullPointerException: Cannot invoke "com.newrelic.agent.transaction.TransactionTimer.markTxaFinishTime(com.newrelic.agent.tracers.Tracer)" because "this.transactionTime" is null at com.newrelic.agent.Transaction.checkFinishTransaction(Transaction.java:2451) at com.newrelic.agent.Transaction.linkTxOnThread(Transaction.java:1497) at com.newrelic.agent.TokenImpl.link(TokenImpl.java:91) at com.newrelic.instrumentation.kotlin.coroutines_17.NRContinuationWrapper.resumeWith(NRContinuationWrapper.java:44) at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:266) at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:99) at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) at com.newrelic.instrumentation.kotlin.coroutines_17.NRContinuationWrapper.resumeWith(NRContinuationWrapper.java:47) at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) ... 17 common frames omitted Caused by: java.lang.NullPointerException: Cannot invoke "com.newrelic.agent.transaction.TransactionTimer.markTxaFinishTime(com.newrelic.agent.tracers.Tracer)" because "this.transactionTime" is null

workato-integration[bot] commented 1 month ago

https://new-relic.atlassian.net/browse/NR-301285

jbedell-newrelic commented 1 month ago

@dhilpipre This is very curious. The transactionTime should never be null there. Is there any chance you can get us a repro app?

dhilpipre commented 1 month ago

@jbedell-newrelic I do not and haven't been able to reproduce it myself. Let me reach out to the user to see if we can find some way to reproduce it.

dhilpipre commented 1 month ago

@jbedell-newrelic The customer did not have code that they were willing to share but they did provide some errors that were in the agent logs related to inconsistent tracer where the tracer finishing was not the same as the last tracer. That might have lead to the scenario causing this problem. I think it may come from the Kotlin Coroutine instrumentation where I am using a TracerFactory in order to track a certain type of method in Kotlin. I have also think I know how it might happen and have sent an update to the customer to test. If it gets rid of the problem then I could just close this issue.

jbedell-newrelic commented 3 weeks ago

@dhilpipre Ah. Looks like this may be a duplicate of https://github.com/newrelic/newrelic-java-agent/issues/1076 then?

dhilpipre commented 3 weeks ago

@jbedell-newrelic Yes, the two do seem to be very similar as there are a bunch of inconsistent tracer messages that show up prior to the NullPointerException

2024-07-02T11:48:08,622+0200 [482499 1] com.newrelic INFO: JRE vendor Amazon.com Inc. version 17.0.8.1 2024-07-02T11:48:08,622+0200 [482499 1] com.newrelic INFO: JVM vendor Amazon.com Inc. OpenJDK 64-Bit Server VM version 17.0.8.1+8-LTS 2024-07-02T11:48:08,622+0200 [482499 1] com.newrelic INFO: OS Linux version 6.6.32-1-MANJARO arch amd64 2024-07-02T11:48:08,622+0200 [482499 1] com.newrelic INFO: New Relic Agent v8.12.0 is initializing...

2024-07-02T11:50:01,146+0200 [482499 89] com.newrelic ERROR: Tracer Debug: Inconsistent state! tracer (actual tracer popped off stack) != lastTracer (pointer to top of stack) for com.newrelic.agent.TransactionActivity@7fffffff (com.newrelic.agent.tracers.OtherRootTracer@d75d02a != com.newrelic.agent.tracers.DefaultTracer@6888c6bb) 2024-07-02T11:50:01,154+0200 [482499 89] com.newrelic ERROR: Tracer Debug: An error occurred calling Transaction.tracerFinished() for class com.newrelic.instrumentation.kotlin.coroutines_17.NRContinuationWrapper : java.lang.NullPointerException: Cannot invoke "com.newrelic.agent.Transaction.activityFailedOrIgnored(com.newrelic.agent.TransactionActivity, int)" because "this.transaction" is null : this Tracer = com.newrelic.agent.tracers.OtherRootTracer@d75d02a 2024-07-02T11:51:01,293+0200 [482499 88] com.newrelic ERROR: Tracer Debug: Inconsistent state! tracer (actual tracer popped off stack) != lastTracer (pointer to top of stack) for com.newrelic.agent.TransactionActivity@7fffffff (com.newrelic.agent.tracers.OtherRootTracer@5481ebf0 != com.newrelic.agent.tracers.DefaultTracer@20fb9e6a) 2024-07-02T11:51:01,293+0200 [482499 60] com.newrelic ERROR: Tracer Debug: Inconsistent state! tracer (actual tracer popped off stack) != lastTracer (pointer to top of stack) for com.newrelic.agent.TransactionActivity@7fffffff (com.newrelic.agent.tracers.OtherRootTracer@3e6295a9 != com.newrelic.agent.tracers.DefaultTracer@c997983) 2024-07-02T11:51:01,294+0200 [482499 60] com.newrelic ERROR: Tracer Debug: An error occurred calling Transaction.tracerFinished() for class com.newrelic.instrumentation.kotlin.coroutines_17.NRContinuationWrapper : java.lang.NullPointerException: Cannot invoke "com.newrelic.agent.Transaction.activityFailedOrIgnored(com.newrelic.agent.TransactionActivity, int)" because "this.transaction" is null : this Tracer = com.newrelic.agent.tracers.OtherRootTracer@3e6295a9 2024-07-02T11:51:01,294+0200 [482499 88] com.newrelic ERROR: Tracer Debug: An error occurred calling Transaction.tracerFinished() for class kotlinx.coroutines.internal.LimitedDispatcher$Worker : java.lang.NullPointerException: Cannot invoke "com.newrelic.agent.Transaction.activityFailedOrIgnored(com.newrelic.agent.TransactionActivity, int)" because "this.transaction" is null : this Tracer = com.newrelic.agent.tracers.OtherRootTracer@5481ebf0 2024-07-02T11:51:01,294+0200 [482499 60] com.newrelic ERROR: Tracer Debug: Inconsistent state! tracer (actual tracer popped off stack) != lastTracer (pointer to top of stack) for com.newrelic.agent.TransactionActivity@7fffffff (com.newrelic.agent.tracers.OtherRootTracer@63214695 != com.newrelic.agent.tracers.DefaultTracer@394134b2)

dhilpipre commented 3 weeks ago

One other thing that we have found in the logs are stackgtraces like this one where there is a TransactionActivity is gone error

2024-08-14T12:24:42,536+0200 [175109 147] com.newrelic.agent.instrumentation.ClassTransformerServiceImpl FINEST: createTracer(Continuation at io.ktor.client.engine.HttpClientEngine$DefaultImpls.executeWithinCallContext(HttpClientEngine.kt:100), 315, null) java.lang.IllegalStateException: TransactionActivity is gone at com.newrelic.agent.Transaction.getTransactionActivity(Transaction.java:924) ~[newrelic.jar:8.13.0] at com.newrelic.agent.instrumentation.InstrumentationImpl.createTracer(InstrumentationImpl.java:120) [newrelic.jar:8.13.0] at io.ktor.client.engine.HttpClientEngine$executeWithinCallContext$1.invokeSuspend(HttpClientEngine.kt) [ktor-client-core-jvm-2.3.11.jar:2.3.11] at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-2.0.10.jar:2.0.10-release-540] at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) [kotlinx-coroutines-core-jvm-1.8.1.jar:?] at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584) [kotlinx-coroutines-core-jvm-1.8.1.jar:?] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811) [kotlinx-coroutines-core-jvm-1.8.1.jar:?] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715) [kotlinx-coroutines-core-jvm-1.8.1.jar:?] at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702) [kotlinx-coroutines-core-jvm-1.8.1.jar:?]

dhilpipre commented 2 days ago

One thing that I found in the log that the customer sent me was an UnsupportedOperationException. This happens when the agent is attempting to create a tracer

2024-08-29T15:43:57,594+0200 [68739 101] com.newrelic.agent.instrumentation.ClassTransformerServiceImpl FINEST: createTracer(Continuation at io.ktor.server.netty.cio.NettyHttpResponsePipeline$respondWithBigBody$2.invokeSuspend(NettyHttpResponsePipeline.kt:328), 246, null) java.lang.UnsupportedOperationException: null at java.util.AbstractList.add(AbstractList.java:153) ~[?:?] at java.util.AbstractList.add(AbstractList.java:111) ~[?:?] at com.newrelic.agent.TransactionActivity.addTracer(TransactionActivity.java:491) ~[newrelic.jar:8.14.0] at com.newrelic.agent.TransactionActivity.addTracerToStack(TransactionActivity.java:344) ~[newrelic.jar:8.14.0] at com.newrelic.agent.TransactionActivity.tracerStarted(TransactionActivity.java:312) ~[newrelic.jar:8.14.0] at com.newrelic.agent.TransactionStateImpl.tracerStarted(TransactionStateImpl.java:107) ~[newrelic.jar:8.14.0] at com.newrelic.agent.TransactionStateImpl.getTracer(TransactionStateImpl.java:39) ~[newrelic.jar:8.14.0] at com.newrelic.agent.TransactionStateImpl.getTracer(TransactionStateImpl.java:55) ~[newrelic.jar:8.14.0] at com.newrelic.agent.instrumentation.InstrumentationImpl.createTracer(InstrumentationImpl.java:126) ~[newrelic.jar:8.14.0] at io.ktor.server.netty.cio.NettyHttpResponsePipeline$respondWithBigBody$2.invokeSuspend(NettyHttpResponsePipeline.kt) ~[ktor-server-netty-jvm-2.3.11.jar:2.3.11] at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[kotlin-stdlib-2.0.20.jar:2.0.20-release-360] at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?] at com.newrelic.instrumentation.kotlin.coroutines_17.NRRunnable.run(NRRunnable.java:49) ~[newrelic.jar:?] at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.112.Final.jar:4.1.112.Final] at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.112.Final.jar:4.1.112.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[netty-common-4.1.112.Final.jar:4.1.112.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[netty-transport-4.1.112.Final.jar:4.1.112.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.112.Final.jar:4.1.112.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.112.Final.jar:4.1.112.Final] at io.ktor.server.netty.EventLoopGroupProxy$Companion.create$lambda$1$lambda$0(NettyApplicationEngine.kt:296) ~[ktor-server-netty-jvm-2.3.11.jar:2.3.11] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.112.Final.jar:4.1.112.Final] at java.lang.Thread.run(Thread.java:840) [?:?]

The only way that I can see that happening is that the TransactionActivity startAsyncTracerLimitCleanup() method, it is setting the tracers field to Collections.emptyList(). The list returned by emptyList would throw an UnsupportedOperationException.

Not sure what the flow is that causes this but it seems to be some weird combination of Kotlin Coroutines and Ktor because I have plenty of customers using the Coroutines instrumentation without this problem. BTW Ktor is built on top of Coroutines.