Open hibachrach opened 5 months ago
Something similar is already being done by instrumentation libraries.
For example, the WithPublicEndpoint
option in Go:
https://github.com/open-telemetry/opentelemetry-go-contrib/blob/a91e60beace7c1d96302fe359050d705c4723e72/instrumentation/net/http/otelhttp/handler.go#L140-L146
Ruby has this proposal open for the Rack instrumentation: https://github.com/open-telemetry/opentelemetry-ruby-contrib/pull/837
It seems your pseudo-code proposal would be the same as:
link = OpenTelemetry::Trace::Link.new(OpenTelemetry::Trace.current_span.context)
span = tracer.start_root_span(
"FileConversion",
with_parent: OpenTelemetry::Context.empty,
links: [link]
)
It's a bit more verbose, but nothing a simple helper method can't resolve. Does it really need a spec change?
Hmm, perhaps I'm misunderstanding, but I don't think your code snippet isn't equivalent to what I'm describing. Your code allows one to sever the parent-child relationship between Spans that is present by default, but it doesn't allow for the opposite (converting what would be a Link by default into a parent-child relationship).
cc @open-telemetry/semconv-messaging-approvers
@hibachrach If I understand correctly, you enqueue messages (or background jobs), and at enqueue time you want to specify whether the span for dequeuing the message (or running the background job) is created as a child or a link of the current span.
This effectively requires two parts:
tracestate
header.This is definitely a valid use case, however, I'm wondering whether the need is widespread enough to merit a specification change and implementations in all languages. This issue is a great place to collect use cases and gauge interest.
Currently and without this capability, this needs to be solved on instrumentation side. In your case, the (sidekiq) instrumentation library could provide a configuration option to specify for which jobs you want to have parent/child relationships and for which you want to have links. I know of some instrumentation libraries (for example AWS Lambda for .NET) where such an option exists on a global level and allows switching between parent/child and link relationships.
Both static and context-based configuration for instrumentation are problematic
Context.current.setPropagationStyleHintTo('Child')
which would be affected whether application author wants it or notThis sounds like a visualization concern rather than instrumentation/data collectopm issue.
@hibachrach could you please elaborate on
From a user's perspective, only the solid-line nodes in the above graph are important. The dashed lines are merely an implementation detail. Similarly, only the Spans related to the solid-line nodes are, to me as the engineer monitoring this system, significant. If the analytics-related message is not complete for another hour, that does not matter to me very much.
Could you please help me understand why is it important to use links (and not parent-child) for some of the background jobs:
@pyohannes
Currently and without this capability, this needs to be solved on instrumentation side. In your case, the (sidekiq) instrumentation library could provide a configuration option to specify for which jobs you want to have parent/child relationships and for which you want to have links
Yes, this was my first thought as well--see https://github.com/open-telemetry/opentelemetry-ruby-contrib/issues/991
@lmolkova
Both static and context-based configuration for instrumentation are problematic
- Static configuration is not enough since some jobs are parented, others are linked.
- Context-based - there could be other spans nested under
Context.current.setPropagationStyleHintTo('Child')
which would be affected whether application author wants it or not
So context based solution would not reset it for the whole subtree, but merely for the immediate child. At the discretion of the instrumentation library author for that child node, it could continue propagating. Otherwise, it would reset.
This sounds like a visualization concern rather than instrumentation/data collectopm issue.
[...]
Could you please help me understand why is it important to use links (and not parent-child) for some of the background jobs:
- is it a visual noise problem?
- is it related to billing?
- does it skew some additional analysis like e2e latency?
- anything else?
I would say it's all of those items you listed:
So context based solution would not reset it for the whole subtree, but merely for the immediate child. At the discretion of the instrumentation library author for that child node, it could continue propagating. Otherwise, it would reset.
What if you want the setting to keep propagating, but the instrumentation author decided otherwise?
Yes, this was my first thought as well--see https://github.com/open-telemetry/opentelemetry-ruby-contrib/issues/991
This issue is very relevant, and maybe it does entitle a specification change. But does it require an API change? A specification change could be something that asks (official) instrumentation authors that their libraries may provide an option to propagate the parent span as a link rather than the default parent/child relationship.
cc @arielvalentin
thanks for the details @hibachrach
I do agree that all of these problems are good ones to solve, but I wonder if the proposed solution actually solves them in general case.
First, application need to customize parent/links relationships for specific spans in the system. It would improve the situation, in some case, but
I'd suggest to tackle each of the problems individually and in a generic way (ask backends to improve links support/visualization, improve sampling with links, come up with ways to identify how to measure e2e latency, etc).
One more thing:
In many cases having links is not a choice, but a necessity - e.g. when pulling message from a queue, you don't know the context of the message(s) you're about to pull and can't use parent-child relationship. For the systems that don't have this problem, e.g. in-memory queues, parent-child may be a better default.
@hibachrach is this still something you are interested in? please take a look at @lmolkova's comment
I'm interested in the general problem but not married to any particular solution--if it would help to close this issue to focus on different manners of addressing it, feel free to do so.
Thanks for your checking in 😄
First time contributor here--please let me know if there's anything I need to correct procedurally!
What are you trying to achieve?
I'm hoping to extend the API description for Context as it pertains to Traces (or perhaps change the API description for Traces?)
Here's the problem I'm trying to solve:
Consider a user flow in which a file conversion service accepts an HTTP request. The user's intent is to upload a file to convert it from one file type to another (e.g.
.pdf
->.png
). For infrastructure performance reasons, this work is enqueued in a messaging system/background job processing system (e.g. Sidekiq in the Ruby ecosystem). As part of handling the same request, a similar message/background job is sent to the same system. This flow is depicted below:From a user's perspective, only the solid-line nodes in the above graph are important. The dashed lines are merely an implementation detail. Similarly, only the Spans related to the solid-line nodes are, to me as the engineer monitoring this system, significant. If the analytics-related message is not complete for another hour, that does not matter to me very much.
This maps neatly to the notion of child Spans & Links: for a step in a flow where it is all really part of one high-level request, one would hope to have that represented as a child Span. It is truly just part of the overall Trace (or, at least, what I would expect a Trace to capture). On the other hand, the analytics message is relatively detached. While a Link could point me to its source, it is not something that is part of the user journey.
What did you expect to see?
Ideally, this is something that you'd be able to hint at via Context. Here's what the above would look like via pseudocode:
This way, the Otel instrumentation for
BackgroundJobSystem
would know how to relate the next Span it creates to the current Span. This hint would reset within further nested spans (e.g. any tasks thatFileConversionTask
may enqueue would, by default, be connected via Links.Some open questions that I don't have the answer to just yet:
PRODUCER
andCONSUMER
Spans and you'd want it to apply to both. So the instrumentation library needs to be given some flexibility in interpretation. Also, making it a requirement would necessarily be a breaking change I would think?Additional context.
I was directed here from https://github.com/open-telemetry/opentelemetry-ruby-contrib/issues/991 as a contributor thought this may make sense more generally.