kamon-io / Kamon

Distributed Tracing, Metrics and Context Propagation for applications running on the JVM
https://kamon.io
Other
1.41k stars 327 forks source link

Nullpointer exception for pekko-instrumentation in version 1.1.0-M1 onward #1352

Open TjarkoG opened 1 month ago

TjarkoG commented 1 month ago

when testing pekko with the version 1.1.0-M1 which will be released soon we've ran into a nullpointer error when running with Kamon + kanela-instrumentation.

to Reproduce: build.sbt

name := "pekko-quickstart-scala"

version := "1.0"

scalaVersion := "2.13.14"

lazy val pekkoVersion = "1.1.0-M1"

run/fork := true

libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-actor-typed" % pekkoVersion,
  "org.apache.pekko" %% "pekko-http" % pekkoVersion,
  "org.apache.pekko" %% "pekko-stream" % pekkoVersion,
  "ch.qos.logback" % "logback-classic" % "1.5.6",
  "io.kamon" %% "kamon-pekko" % "2.7.3",
  "io.kamon" %% "kamon-system-metrics" % "2.7.3",
  "io.kamon" %% "kamon-prometheus" % "2.7.3"
)

Main.scala:


//#full-example
package com.example

import com.typesafe.config.ConfigFactory
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.http.scaladsl.Http
import org.apache.pekko.http.scaladsl.client.RequestBuilding.Get
import org.apache.pekko.http.scaladsl.server.Directives

object Main extends Directives {
  def main(args: Array[String]): Unit = {
    val config = ConfigFactory.load()
    implicit val untyped: ActorSystem = ActorSystem.create("Test", config)
    Http()(untyped).singleRequest(Get("https://www.google.com")).onComplete(rsp => println(rsp.map(_.status)))(untyped.dispatcher)
  }
}

when starting with -javaagent:javaagent/kanela-agent.jar (version 1.0.18)

i'm receiving the exception:

org.apache.pekko.actor.ActorInitializationException: pekko://Test/system/IO-TCP/selectors/$a: exception during creation, root cause message: [Cannot invoke "org.apache.pekko.dispatch.DispatcherPrerequisites.settings()" because the return value of "kamon.instrumentation.pekko.instrumentations.DispatcherInfo$HasDispatcherPrerequisites.dispatcherPrerequisites()" is null]
    at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:679)
    at org.apache.pekko.actor.ActorCell.invokeAll$1(ActorCell.scala:523)
    at org.apache.pekko.actor.ActorCell.systemInvoke(ActorCell.scala:545)
    at org.apache.pekko.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:297)
    at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:232)
    at kamon.instrumentation.executor.ExecutorInstrumentation$InstrumentedForkJoinPool$TimingRunnable.run(ExecutorInstrumentation.scala:800)
    at org.apache.pekko.dispatch.ForkJoinExecutorConfigurator$PekkoForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:61)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.lang.reflect.InvocationTargetException: null
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at org.apache.pekko.util.Reflect$.instantiate(Reflect.scala:82)
    at org.apache.pekko.actor.ArgsReflectConstructor.produce(IndirectActorProducer.scala:111)
    at org.apache.pekko.actor.Props.newActor(Props.scala:236)
    at org.apache.pekko.actor.ActorCell.newActor(ActorCell.scala:626)
    at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:653)
    ... 11 common frames omitted
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.pekko.dispatch.DispatcherPrerequisites.settings()" because the return value of "kamon.instrumentation.pekko.instrumentations.DispatcherInfo$HasDispatcherPrerequisites.dispatcherPrerequisites()" is null
    at kamon.instrumentation.pekko.instrumentations.InstrumentNewExecutorServiceOnPekko$.around(DispatcherInstrumentation.scala:130)
    at kamon.instrumentation.pekko.instrumentations.InstrumentNewExecutorServiceOnPekko.around(DispatcherInstrumentation.scala)
    at org.apache.pekko.dispatch.ThreadPoolConfig$ThreadPoolExecutorServiceFactory.createExecutorService(ThreadPoolBuilder.scala)
    at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.executor$lzycompute(Dispatcher.scala:54)
    at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.executor(Dispatcher.scala:54)
    at org.apache.pekko.dispatch.ExecutorServiceDelegate.execute(ThreadPoolBuilder.scala:232)
    at org.apache.pekko.dispatch.ExecutorServiceDelegate.execute$(ThreadPoolBuilder.scala:232)
    at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.execute(Dispatcher.scala:53)
    at org.apache.pekko.dispatch.Dispatcher.executeTask(Dispatcher.scala:91)
    at org.apache.pekko.dispatch.MessageDispatcher.unbatchedExecute(AbstractDispatcher.scala:177)
    at org.apache.pekko.dispatch.BatchingExecutor.execute(BatchingExecutor.scala:143)
    at org.apache.pekko.dispatch.BatchingExecutor.execute$(BatchingExecutor.scala:134)
    at org.apache.pekko.dispatch.MessageDispatcher.execute(AbstractDispatcher.scala:107)
    at org.apache.pekko.util.SerializedSuspendableExecutionContext.attach(SerializedSuspendableExecutionContext.scala:92)
    at org.apache.pekko.util.SerializedSuspendableExecutionContext.execute(SerializedSuspendableExecutionContext.scala:95)
    at org.apache.pekko.io.SelectionHandler$ChannelRegistryImpl.<init>(SelectionHandler.scala:200)
    at org.apache.pekko.io.SelectionHandler.<init>(SelectionHandler.scala:324)
    ... 21 common frames omitted

i've tried to analyze the error and got to the point that kamon/instrumentation/pekko/instrumentations/DispatcherInstrumentation.scala:164 does not seem to get triggered with pekko 1.1.0-M1 anymore resulting in dispatcherPrerequisites being null in kamon/instrumentation/pekko/instrumentations/DispatcherInstrumentation.scala:125 but i cannot understand what change in pekko 1.0.3 <-> 1.1.0-M1 made that difference and how to fix it

pjfanning commented 1 month ago

Pekko 1.1.0-M1 is released. It is a milestone release but the jars are all up on Maven Central.

pjfanning commented 1 month ago

1354 PR experiment does not seem to pick up this issue. Could be a gap in test coverage. @TjarkoG could you see if you could add a unit test that reproduces this?

pjfanning commented 1 month ago

There are some changes in the Pekko dispatch code in v1.1.0-M1. I wouldn't have thought them major.

https://github.com/apache/pekko/commits/0930982b9a5625e622c4c4c2a6f71dc9917cb336/actor/src/main/scala/org/apache/pekko/dispatch

https://github.com/apache/pekko/pull/485 is the biggest change

TjarkoG commented 1 month ago

@pjfanning sry for the late response. I've looked into it but the whole instrumentation logic is sadly out of my knowledge zone. Took me way longer to gather the information i put above than im proud to admit :(

pjfanning commented 1 month ago

The strange thing is the #1354 CI tests pass for pekko. Any interest in just building your own kamon jars using the branch underlying #1354 ?

TjarkoG commented 1 month ago

i can try if this would help with our problem but im not to optimistic that it will help. the problems with the "normal" jars are also only occurring when starting with the kanela-agent for instrumentation and when using the dispatcher "pekko.io.pinned-dispatcher" from the reference.conf "pekko.io.tcp" edit: i've build the jar locally based on #1354 and tried it with the same result

TjarkoG commented 1 month ago

@ivantopo do you have any idea how to create tests for that?

pjfanning commented 1 month ago

I created https://github.com/pjfanning/kamon-pekko-http-test based on the description here and this runs fine with no issues.

TjarkoG commented 1 month ago

Did you start it with the kanela java-agent? Without it does work but we dont get the instrumentation

pjfanning commented 1 month ago

Did you start it with the kanela java-agent? Without it does work but we don't get the instrumentation

Thanks. I hadn't added kanela-agent. The issue happens with it enabled. I updated the sample project. I haven't used Kamon since its early days. I have my own fork that I use myself and it works with Pekko 1.1 (https://github.com/pjfanning/micrometer-pekko).

pjfanning commented 1 month ago

I may have messed up the test run but when I tried Scala 3.3.3 and 2.12.19, it seemed ok. Maybe this only happens with Scala 2.13.

Philippus commented 1 week ago

I'm getting this issue now with Pekko 1.1.0.

Philippus commented 1 week ago

I think it's related to inlining. This commit still works: https://github.com/apache/pekko/commit/d829637ef28c5adcc88e2e97d13c15a5b11162cf. This one doesn't: https://github.com/apache/pekko/commit/0f1db5301ed87fea39ab331b5bce88022ccb54da. And if I disable inlining with the pekko.no.inline setting and compile pekko it works. I'm using Scala 2.13.14.

pjfanning commented 1 week ago

I think it's related to inlining. This commit still works: apache/pekko@d829637. This one doesn't: apache/pekko@0f1db53. And if I disable inlining with the pekko.no.inline setting and compile pekko it works. I'm using Scala 2.13.14.

Seems plausible. Do you see the same behaviour as me - in that Scala 3 and Scala 2.12 work fine but that the NPE pops up with Scala 2.13? The Pekko inlining does not affect the Scala 3 release of Pekko.

Philippus commented 1 week ago

I think it's related to inlining. This commit still works: apache/pekko@d829637. This one doesn't: apache/pekko@0f1db53. And if I disable inlining with the pekko.no.inline setting and compile pekko it works. I'm using Scala 2.13.14.

Seems plausible. Do you see the same behaviour as me - in that Scala 3 and Scala 2.12 work fine but that the NPE pops up with Scala 2.13? The Pekko inlining does not affect the Scala 3 release of Pekko.

2.12.19 and 3.5.0 didn't give a NPE.

Philippus commented 4 days ago

Is this a bug in Kamon, or a bug in Pekko ?

pjfanning commented 4 days ago

Pekko 1.1 uses Scala 2 compiler inlining. Kamon doesn't seem to be able to find the inlined code in Pekko.

leszekgruchala commented 4 days ago

I do have the same issue with 2.13.14

pjfanning commented 4 days ago

To be clear, the Apache Pekko 1.1 releases won't be modified to suit Kamon. We reserve the right in minor releases to make some changes. The issue here is that Kanela works a bit like AOP and expects to be able the manipulate Pekko byte code to gather metrics. It looks like the instrumentation logic in Kamon - specify for Pekko dispatchers - needs to be modified to work with Pekko 1.1.

@mdedetrich just adding to you to the issue just in case you have any thoughts on this.

If anyone has any thoughts on hooks that could be added to Pekko 1.2 or 2.0 that makes it easier for Kamon to instrument Pekko then feel free to raise issues or PRs in the Pekko repo(s).

pjfanning commented 4 days ago

I created https://github.com/kamon-io/Kamon/pull/1361 as a workaround.

pjfanning commented 4 days ago

Users hitting this issue should also be able to disable dispatcher instrumentation. See https://github.com/kamon-io/Kamon/blob/124e223de4d7d806b47c76323f5d91edf626fa81/instrumentation/kamon-pekko/src/main/resources/reference.conf and check about excluding dispatchers.