Open lmolkova opened 3 years ago
re-instrument, i.e. create a new span and inject header (replace or add another value)? Then the logic that injected the header (and created a previous span) will be broken. There is no way to suppress that span.
Why does this break the other span? I would assume that the newly created span will be the child of the one that first tried the injection, and so replacing the outgoing headers seems safe to me. Am I missing something?
Good point. It will be child only if the previous span was current, but there is no requirement to make it current and there is no reliable mechanism for context propagation that would work everywhere.
Also, it does not seem beneficial: it seems that if something has added headers, re-doing it is likely redundant.
Another issue is increased costs, users pay for telemetry either to vendors or through operational costs. If we agree this case is legit, we're talking about a significant increase in cost.
FYI, we (Splunk) have customers who are demanding the multiple layers like this all be represented, so there probably needs to be configurability around how this is handled.
@jkwatson Do multiple layers inject headers? If not - there could be any number of layers, just the one that injects headers wins. Also, behavior is currently language-specific (e.g. .NET instrumentation backs-off, Java agent replaces header, not sure about others), so does it work now?
@jkwatson Do multiple layers inject headers? If not - there could be any number of layers, just the one that injects headers wins. Also, behavior is currently language-specific (e.g. .NET instrumentation backs-off, Java agent replaces header, not sure about others), so does it work now?
There's less of an issue right now with double-injecting headers (the "Setter" spec does imply that you overwrite the headers, so that's really not an issue, I don't think). The issue is really that the javaagent has a policy of enforcing only a single "CLIENT" span, which means that multiple layers are being suppressed by the javaagent instrumentation currently. However, there is now an open discussion to deal with this and optionally allow nested CLIENT spans to proceed: https://github.com/open-telemetry/opentelemetry-java-instrumentation/discussions/3318
It would be good to have this be a part of the spec, though, and not just something pushed because one vendor has customers that want it. :)
I came across this very issue today in working on tracing in Quarkus.
I had the Vert.x Tracing SPI creating a HTTP CLIENT span after a JAX-RS ClientRequestFilter had already created an HTTP CLIENT span. It meant I had one CLIENT span with a child of the SERVER span it called, and another CLIENT span completely separate.
I ended up checking for traceparent
in the outgoing request headers inside the Vert.x Tracing SPI and didn't create a new span if it was set.
Would be good to have a defined approach to how to know a duplicate span would be created if a piece of code created a new span.
More on this in #1811: backing off is not a great strategy because of potential reuse of http request instances https://github.com/open-telemetry/opentelemetry-specification/issues/1811#issuecomment-878692618.
Additional context. It not very common to reuse HTTP requests between retries, but e.g. golang http client allows that and there are likey other clients/cases where it happens.
[just as a tangent: I was surprised to find recently in https://github.com/[open-telemetry/opentelemetry-java-instrumentation/pull/2727](https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/2727) that reusing HTTP requests is supported by most of the 20+ Java http client libraries that we instrument]
I agree that we need a way to identify if we are in a nested CLIENT span, for a couple of reasons:
- to avoid overwriting
traceparent
from the outer CLIENT span- to conditionally suppress nested CLIENT spans (based on some kind of user verbosity preference)
It sounds like there are at least a couple of options to accomplish this:
- use the presence of
traceparent
to know that we are in a nested CLIENT span (this requires clearingtraceparent
at the end of each request if http request reuse is supported by the instrumented library)- use some kind of marker in the Context to know that we are in a nested CLIENT span (this requires setting the marker in the context and propagating it to any downstream libraries that may also be instrumented)
Maybe we can focus initially on
- Getting agreement that we really do need a way to identify if we are in a nested CLIENT span
- Listing options to accomplish that
We need simple way to prevent multiple levels of HTTP instrumentation (duplicates from client libraries/users AND auto-instrumentation). Initial discussion on Java instrumentation SIG 2021/7/15 (UTC) https://docs.google.com/document/d/1WK9h4p55p8ZjPkxO75-ApI9m0wfea6ENZmMoFRvXSCw/edit
What are you trying to achieve?
I'm working on Azure SDK instrumentation and we have a problem with double instrumentation on the HTTP layer. Here's the context:
So users who don't use auto-instrumentation have spans representing HTTP calls from our client libraries, users who use auto-instrumentation, would get two spans for the same HTTP call.
Generalizing this problem beyond Azure SDKs:
TL;DR: - Client libraries cannot trace common protocol-level calls if those could also be traced by auto-instrumentation. - If they do (when auto-instrumentation is on), duplicate spans will be created and context propagation will be broken. - Libraries could make their programmatic tracing configurable, but then they cannot inject service-specific data into auto-instrumented RPC-call span
Potential solutions
This approach also means that the user's manual instrumentation always wins, which seems like a good default to have in terms of supportability. This is really short-term mitigation for a subset of double-instrumentation problems.
A similar contract might exist for server-side auto-instrumentation: if there is a current span already - back-off, but I can imagine it can go sideways if requests start from a thread that has span (e.g. created in start-up) by mistake.
Existing discussions
1747
530
1653
Suppressing instrumentation implementations in OTel
Please note that context suppression is not really possible in client libraries as it requires dependency on instrumentation packages (to export suppress key). So this is not a viable workaround.
Happy to hear suggestions and drive it.