Closed: livmackintosh closed this issue 1 year ago
I have refined the issue a little and will continue investigating to narrow down the bug report. cc: @laurit
Bisected our changes and can confirm that the upgrade from 1.19.0 to 1.25.0 alone was the culprit.
Now, I'm not sure if there are some breaking changes that we are not aware of... that's the current line of investigation to try to narrow down the problem further.
Many, if not all, of the spurious spans come from the ReadOperationHandler class within spring-boot-actuator.
Can you say a bit more about the trace itself? Is it leaking HTTP server spans, or some background job ones? What related frameworks/libraries are you using?
We're using the Spring framework as our HTTP gateway, with Micronaut microservices behind it. However, the span leaks seem to be occurring at many points in the stack, primarily within Spring: at the service level, but also further down in ReadOperationHandler.
Let me see if I can provide a minimal viable example and perhaps a list of the libs that we are using.
Dependencies within the Spring Boot gateway:
implementation(group = "org.springframework.boot", name = "spring-boot-starter-webflux")
implementation(group = "org.springframework.boot", name = "spring-boot-starter-actuator")
implementation(group = "org.springframework.boot", name = "spring-boot-starter-validation")
implementation(group = "org.jetbrains.kotlin", name = "kotlin-reflect")
implementation(group = "org.jetbrains.kotlin", name = "kotlin-stdlib-jdk8")
implementation(group = "org.jetbrains.kotlinx", name = "kotlinx-coroutines-reactor", version = coroutinesVersion)
implementation(platform("io.arrow-kt:arrow-stack:$arrowVersion"))
implementation("io.arrow-kt:arrow-core")
implementation("io.arrow-kt:arrow-fx-stm")
implementation("io.arrow-kt:arrow-fx-coroutines")
implementation(group = "com.fasterxml.jackson.module", name = "jackson-module-kotlin", version = jacksonVersion)
implementation("io.honeycomb:honeycomb-opentelemetry-sdk:$honeycombVersion")
compileOnly("io.honeycomb:honeycomb-opentelemetry-javaagent:$honeycombVersion")
implementation("io.grpc:grpc-netty-shaded:$grpcVersion")
implementation("com.google.protobuf:protobuf-java-util:$protoVersion")
implementation(group = "io.fabric8", name = "kubernetes-client", version = k8sClientVersion) // <- k8s client for using grpc endpoints watching
runtimeOnly(group = "net.logstash.logback", name = "logstash-logback-encoder", version = logstashLogbackEncoder)
testImplementation(group = "org.springframework.boot", name = "spring-boot-starter-test")
testImplementation(group = "org.hamcrest", name = "hamcrest-all", version = hamcrestVersion)
testImplementation(group = "org.mockito", name = "mockito-core", version = mockitoVersion)
testImplementation(group = "org.mockito.kotlin", name = "mockito-kotlin", version = mockitoKotlinVersion)
testImplementation(group = "org.mockito", name = "mockito-inline", version = mockitoInlineVersion)
testImplementation(group = "io.cucumber", name = "cucumber-java", version = cucumberVersion)
testImplementation(group = "io.cucumber", name = "cucumber-junit", version = cucumberVersion)
testImplementation(group = "io.cucumber", name = "cucumber-spring", version = cucumberVersion)
testRuntimeOnly(group = "org.junit.vintage", name = "junit-vintage-engine", version = junitVersion.toString())
Versions:
val micronautVersion by extra("3.4.4")
val protoVersion by extra("3.21.12")
val grpcVersion by extra("1.50.2")
val grpcKotlinVersion by extra("1.3.0")
val reactorVersion by extra("3.4.18")
val honeycombVersion by extra("1.4.0")
val arrowVersion by extra("1.1.2")
val junitVersion by extra("5.9.1")
val testContainersVersion by extra("1.17.5")
val coroutineTestVersion by extra("1.6.4")
The full dependency tree is rather large, so I will refrain from posting it for now.
Potentially a dupe of interest: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/8125
@livmackintosh If you believe that what you are seeing is a bug and want a fix for it, then you need to either provide a minimal application that reproduces your issue or describe your issue in sufficient detail to allow others to reproduce it. Thus far the only details included are that you are seeing a lot of spans in one trace and that you are using Spring, which is not enough to even guess where the issue might lie.
@laurit Completely understand. I'm providing as much information as I can given time constraints and my knowledge of our stack. I started last month at a new company, so there are lots of moving parts, and I'm not yet familiar with the OTel internals, which are rather complicated, or at least not simple. Bear with me and I will follow up with more information and potentially a PR ;)
Probably also related: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/8454
I'm going to test against 09b63d22944c86d8511658f0730cbbc7087ed0b2 as well as the parent commit. If this commit introduces the problem, I will close this issue and consider #8125 to supersede it. The reporter of that issue kindly added a repro, so it may be more helpful. I can dig further into the root cause and either make a fix or test any related PRs.
NB: We're also using reactive streams.
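To sanity-check the reactive path, a probe along the lines of the sketch below (a hypothetical controller; it assumes the agent's Reactor instrumentation propagates context into the supplier) should show whether each request really gets its own trace ID:
import io.opentelemetry.api.trace.Span
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController
import reactor.core.publisher.Mono

// Hypothetical probe: returns the trace ID the agent associates with the current request.
// Hitting it repeatedly should yield a different trace ID per request; if the same ID
// keeps coming back, new requests are being stitched into an existing trace.
@RestController
class TraceIdProbeController {
    @GetMapping("/trace-id")
    fun traceId(): Mono<String> =
        Mono.fromSupplier { Span.current().spanContext.traceId }
}
If requests share a trace ID only while the actuator endpoints are being scraped, that would point back at ReadOperationHandler.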
Closing as a duplicate of #8125.
Describe the bug
Spans are submitted with the same trace ID. The only thing that has changed has been upgrading from 1.19.0 (Honeycomb 1.4.0) to 1.25.0 (Honeycomb 1.5.0).
When I first reported this bug I had thought that enabling the add-baggage flag
-Dotel.instrumentation.logback-mdc.add-baggage=true
was causing this problem; however, I have since bisected the change, and simply upgrading to 1.25.0 of the agent causes the problem.
Steps to reproduce
-Dotel.instrumentation.logback-mdc.add-baggage=true
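For context on what that flag changes: as far as I understand, it only affects what the logback-mdc instrumentation copies into the MDC, roughly as in this hypothetical sketch (the key, value, and logger names are made up):
import io.opentelemetry.api.baggage.Baggage
import org.slf4j.LoggerFactory

private val log = LoggerFactory.getLogger("baggage-demo")

// With add-baggage=true, the agent's logback-mdc instrumentation copies the current
// baggage entries into the MDC for log statements made while the scope is open.
fun logWithBaggage() {
    Baggage.builder()
        .put("tenant.id", "example-tenant") // hypothetical baggage entry
        .build()
        .makeCurrent()
        .use { log.info("handled request") }
}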
What did you expect to see? A normal number of spans per trace, grouped under a root span
What did you see instead? Hundreds of thousands of spans and no root trace detected
What version are you using? v1.25.0
Environment
Compiler: Eclipse Temurin JDK 17
OS: Ubuntu 22.04
Runtime OS: Alpine 3.17
Additional context
We are using the Honeycomb distribution (v1.5.0), which is a thin wrapper around the OTel agent that provides easy configuration for their cloud platform.
We are using auto-instrumentation
We are using
-Dotel.instrumentation.kafka.experimental-span-attributes=true
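For completeness, a sketch of how these flags could be wired into a local bootRun (the task wiring and agent jar path below are placeholders, not our actual build script):
// Hypothetical local-run wiring; the agent jar path is a placeholder.
tasks.named<org.springframework.boot.gradle.tasks.run.BootRun>("bootRun") {
    jvmArgs(
        "-javaagent:build/agent/honeycomb-opentelemetry-javaagent.jar",
        "-Dotel.instrumentation.logback-mdc.add-baggage=true",
        "-Dotel.instrumentation.kafka.experimental-span-attributes=true"
    )
}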