open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.96k stars 860 forks source link

"non-root current context found" for S3.GetObject #8615

Open michaeljohnalbers opened 1 year ago

michaeljohnalbers commented 1 year ago

Describe the bug

This is a follow-on to https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/8415.

After using the 1.27 SNAPSHOT recommended in that ticket, I got some Jetty spans back, but nowhere near the expected number. I re-enabled the ContextPropagationDebug logging and saw a ton of

Unexpected non-root current context found when extracting remote context!

In the "It contains this span:" message, the span information appears to always be for the name s3.GetObject. Our code never calls GetObject directly. We're using the S3TransferManager.

Also I'm not seeing any "a context leak was detected" messages.

Steps to reproduce

Unknown. Just running our application causes this to happen. This only happens for us on production, and creating a minimum reproduction is not really feasible. I know that doesn't help and I'm sorry for that.

What did you expect to see? No "unexpected non-root..." messages and no missing Jetty spans.

What did you see instead? See above.

What version are you using? 1.27.0 SNAPSHOT from around 2023-05-24

Environment Compiler: OpenJDK 11 (from Circle CI docker image cimg/openjdk:11.0) OS: Ubuntu 22.04 Runtime (if different from JDK above): GraalVM 22.3.1/JDK 17 OS (if different from OS compiled on): Ubunut 22.10

Additional context None

mateuszrzeszutek commented 1 year ago

Hey @michaeljohnalbers ,

Can you check whether this issue was solved by https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/8815? You could try running your app with the latest javaagent snapshot version

michaeljohnalbers commented 1 year ago

@mateuszrzeszutek It doesn't look like the change has helped. In fact, things appear to have gotten a little worse with the snapshot version. See the attached screenshot. image The new snapshot was deployed to our production environment between 9:30 and 9:45 on the time scale. The counts shouldn't be lower than before the deployment.

I used the 1.28.0-20230628.113740-37 snapshot version

breedx-splk commented 1 year ago

Unless you strictly control the input volume to your service, I don't think that a single graph here suggests that the problem remains or got worse. The original issue was about seeing Unexpected non-root current context found when extracting remote context! in the logs. Did you confirm that that is still happening with the latest release?

Since this only happens in one environment and can't be easily reproduced locally, I'm curious if there is a firewall or proxy that could be modifying (mangling) headers or something.

michaeljohnalbers commented 1 year ago

@breedx-splk I modified our app to use version opentelemetry-javaagent-1.29.0-20230718.103427-8.jar of the agent and then enabled the context leak detection -Dotel.instrumentation.experimental.span-suppression-strategy=none -Dotel.javaagent.logging=application -Dotel.javaagent.experimental.thread-propagation-debugger.enabled=true. Immediately the app started logging

Unexpected non-root current context found when extracting remote context!

and

It contains this span: SdkSpan{traceId=481c471b077a7c43ebb85d5c9fe55921, spanId=0b0672e7505bbc92, parentSpanContext=ImmutableSpanContext{traceId=481c471b077a7c43ebb85d5c9fe55921, spanId=4eba6408cfd2c65c, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=s3.GetObject, kind=CLIENT, attributes=AttributesMap{data={rpc.service=s3, rpc.method=GetObject, aws.bucket.name=<redacted>, rpc.system=aws-api, thread.name=dw-77 - POST /helix/switch_plan_executions/v1, thread.id=77}, capacity=128, totalAddedValues=6}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1689714708411041041, endEpochNanos=0}