Open michaeljohnalbers opened 1 year ago
Hey @michaeljohnalbers ,
Can you check whether this issue was solved by https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/8815? You could try running your app with the latest javaagent snapshot version
@mateuszrzeszutek It doesn't look like the change has helped. In fact, things appear to have gotten a little worse with the snapshot version. See the attached screenshot. The new snapshot was deployed to our production environment between 9:30 and 9:45 on the time scale. The counts shouldn't be lower than before the deployment.
I used the 1.28.0-20230628.113740-37 snapshot version
Unless you strictly control the input volume to your service, I don't think that a single graph here suggests that the problem remains or got worse. The original issue was about seeing Unexpected non-root current context found when extracting remote context!
in the logs. Did you confirm that that is still happening with the latest release?
Since this only happens in one environment and can't be easily reproduced locally, I'm curious if there is a firewall or proxy that could be modifying (mangling) headers or something.
@breedx-splk I modified our app to use version opentelemetry-javaagent-1.29.0-20230718.103427-8.jar
of the agent and then enabled the context leak detection -Dotel.instrumentation.experimental.span-suppression-strategy=none -Dotel.javaagent.logging=application -Dotel.javaagent.experimental.thread-propagation-debugger.enabled=true
. Immediately the app started logging
Unexpected non-root current context found when extracting remote context!
and
It contains this span: SdkSpan{traceId=481c471b077a7c43ebb85d5c9fe55921, spanId=0b0672e7505bbc92, parentSpanContext=ImmutableSpanContext{traceId=481c471b077a7c43ebb85d5c9fe55921, spanId=4eba6408cfd2c65c, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=s3.GetObject, kind=CLIENT, attributes=AttributesMap{data={rpc.service=s3, rpc.method=GetObject, aws.bucket.name=<redacted>, rpc.system=aws-api, thread.name=dw-77 - POST /helix/switch_plan_executions/v1, thread.id=77}, capacity=128, totalAddedValues=6}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1689714708411041041, endEpochNanos=0}
Describe the bug
This is a follow-on to https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/8415.
After using the 1.27 SNAPSHOT recommended in that ticket, I got some Jetty spans back, but nowhere near the expected number. I re-enabled the
ContextPropagationDebug
logging and saw a ton ofIn the "It contains this span:" message, the span information appears to always be for the name s3.GetObject. Our code never calls GetObject directly. We're using the S3TransferManager.
Also I'm not seeing any "a context leak was detected" messages.
Steps to reproduce
Unknown. Just running our application causes this to happen. This only happens for us on production, and creating a minimum reproduction is not really feasible. I know that doesn't help and I'm sorry for that.
What did you expect to see? No "unexpected non-root..." messages and no missing Jetty spans.
What did you see instead? See above.
What version are you using? 1.27.0 SNAPSHOT from around 2023-05-24
Environment Compiler: OpenJDK 11 (from Circle CI docker image cimg/openjdk:11.0) OS: Ubuntu 22.04 Runtime (if different from JDK above): GraalVM 22.3.1/JDK 17 OS (if different from OS compiled on): Ubunut 22.10
Additional context None