kamon-io / Kamon

Distributed Tracing, Metrics and Context Propagation for applications running on the JVM
https://kamon.io

Removing Kamon from Akka cluster with rolling update #1202

Closed jillguyonnet closed 1 year ago

jillguyonnet commented 2 years ago

Hi,

My team is looking to remove Kamon from our service, which runs on an Akka cluster. We cannot perform full cluster restarts in production, so we have to do rolling updates. We were able to remove the call to Kamon.init, but removing the Kanela agent produces errors like the following from the Akka deserializer in the app logs:

Failed to deserialize message from [unknown] with serializer id [17] and manifest [d]. java.lang.IllegalArgumentException

and new nodes are unable to join the existing cluster.
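
For readers following along, one way to narrow down an error like this is to check which serializer class a node has registered under the identifier in the log line. The following is a minimal sketch, assuming Akka 2.6 with akka-remote on the classpath; the object name `SerializerLookup`, the helper `serializerClassFor`, and the actor system name are hypothetical and not part of the original report.

```scala
import akka.actor.ActorSystem
import akka.serialization.SerializationExtension

// Hypothetical diagnostic helper, not taken from the issue.
object SerializerLookup {

  // Returns the class name of the serializer registered under `id`, if any.
  def serializerClassFor(system: ActorSystem, id: Int): Option[String] =
    SerializationExtension(system).serializerByIdentity.get(id).map(_.getClass.getName)

  def main(args: Array[String]): Unit = {
    // Assumption: the node's usual application.conf is picked up here.
    val system = ActorSystem("serializer-lookup")
    try {
      // 17 is the serializer id reported in the log line above.
      serializerClassFor(system, 17) match {
        case Some(name) => println(s"Serializer id 17 is handled by $name")
        case None       => println("No serializer is registered with id 17")
      }
    } finally {
      system.terminate()
    }
  }
}
```

Running this on a node with the agent and on one without it would show whether both resolve the identifier to the same class. It does not show whether the bytes on the wire differ, so it only rules out a registration mismatch.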

We suspect that Kamon introduced a custom serialized message wrapper that is no longer understood by the new nodes running without Kamon, which would explain these errors. If so, we could do a two-step removal: first add our own (de)serializer to every node, then remove Kamon with a rolling update. Could you please weigh in on this and provide some details to help us resolve it?

System information: openjdk:8u202-ubuntu, Scala 2.13.8, Akka 2.6.19, sbt 1.6.2, Kamon 2.1.4 (we cannot upgrade to a higher version, as that also seems to require a full cluster restart)

ivantopo commented 2 years ago

Hey @jillguyonnet,

Are you using Artery or the previous Akka remoting? I was almost sure that this issue was fixed for Artery!

jillguyonnet commented 2 years ago

Hey @ivantopo 👋 Thanks for your reply.

We are using Artery Remoting.

ivantopo commented 2 years ago

When you removed the call to Kamon.init(), did you also completely remove the Kamon jars from the classpath?

PS: you might want to stop by our Discord server to chat about this!

ivantopo commented 2 years ago

Also, do you have access to the full stacktrace logged with the warning?

jillguyonnet commented 2 years ago

> When you removed the call to Kamon.init(), did you also completely remove the Kamon jars from the classpath?

Yes, my understanding is sbt should take care of that.

We see the issue when removing javaAgents += "io.kamon" % "kanela-agent" % "1.0.6" from the settings in build.sbt, and we see the same errors when removing everything (kamon-bundle, kanela-agent, config files).
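
For context, the relevant part of a build like this might look roughly as follows. This is only a sketch: it assumes the `javaAgents` setting comes from the sbt-javaagent plugin and that kamon-bundle is declared as an ordinary library dependency. The project layout and the kamon-bundle coordinates are assumptions based on the versions listed above, not copied from the actual build.

```scala
// build.sbt (illustrative sketch, not the real build definition)
lazy val root = (project in file("."))
  // Assumption: sbt-javaagent is added in project/plugins.sbt and enabled here.
  .enablePlugins(JavaAgent)
  .settings(
    // Kamon itself, on the application classpath (coordinates assumed from "Kamon 2.1.4").
    libraryDependencies += "io.kamon" %% "kamon-bundle" % "2.1.4",
    // The Kanela instrumentation agent, attached to the JVM at startup.
    // Removing this line is the step that triggers the deserialization errors.
    javaAgents += "io.kamon" % "kanela-agent" % "1.0.6"
  )
```

The two entries are independent: the first puts the Kamon jars on the classpath, while the second only controls whether the agent instruments classes at load time. That is why removing the agent alone can change runtime behavior even with the library still present.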

> Also, do you have access to the full stacktrace logged with the warning?

I'm currently missing the full logs from the pre-existing cluster nodes, where the issue would be. I'll try to do another run to collect those.