Should zero-duration spans be considered good/reasonable practice?

lmolkova commented 2 months ago

Zero-duration spans are used by messaging semantic conventions when publishing messages in batches https://github.com/open-telemetry/semantic-conventions/issues/1187

Messages need to have unique context to be traceable individually and the only way to achieve it today is by creating a span for each message and injecting corresponding context into the message.

There could be other cases like local/fire-and-forget/no-ack cases where context-propagation is necessary:

in-memory queues where each work item should have a unique context
gRPC streaming where individual messages need to have trace-context propagated
general-purpose eventing (e.g. Akka)

Based on discussion with @tedsuo, zero-duration spans are controversial.

I'd like to discuss alternatives and/or settle on the zero-duration spans as legit solution.

What do we need to capture for a message/event/work-item:

timestamp
unique context (span-id)
parent context
name (for visualization/identification purposes)
attributes (in some cases)

Alternatives to zero-duration-span:

Event
- ❌ does not have unique context
- span events are (hopefully) going away, logs are not part of tracing signal
Pure span-context + link (from publish, as discussed in https://github.com/open-telemetry/semantic-conventions/issues/1187 )
- ❌ does not have means to capture parent context (e.g. incoming request)
- needs publish span, which is not there for in-memory/streaming operations
- minor: links would need to have name and timestamp
Something new:
- part of tracing (exported as (or along with) a span)
- has all properties outlined above
- does not have duration/status

lmolkova commented 2 months ago

Effectively, the only alternative to zero-duration spans that I can come up with is a new 'thing' (Option 3) that's part of tracing and is very similar to span, but does not have a duration or status.

Note: The boundary between fully-local and network operation is still blurry - e.g. sending gRPC stream message or making UDP call still involves network, just no ack. The duration could be nano or micro-seconds and operation may still fail if connection is lost.

So my preference would be to stick to spans as a general-purpose solution for this problem.

jack-berg commented 2 months ago

I think calling the spans representing "Msg 1" and "Msg 2" zero-duration is underselling what's happening with them. These spans represent queueing a message to a local buffer for publishing. This action is probably approaching zero-duration, but is not actually zero. Therefore, I'd say that any disagreement on the basis of zero-duration is invalid because these actions do in fact have a duration.

The alternatives add additional complexity to the data model and do not appear to meaningfully improve the modeling of these types of scenarios.

I'm in favor of using spans to solve this problem.

pyohannes commented 2 months ago

This action is probably approaching zero-duration, but is not actually zero. Therefore, I'd say that any disagreement on the basis of zero-duration is invalid because these actions do in fact have a duration.

@jack-berg is right that the spans won't actually have a zero duration, but in many cases a very short duration. Maybe the discussion should be re-framed and not be about zero duration, but about a useful duration?

In many cases in question, the (nano-seconds) duration of such spans will not be of interest, and the only reason the spans are created is to obtain a distinct context. The span is more or less a side-effect. The question should be whether we're fine with that or not.

lmolkova commented 2 months ago

Adding more point raised in the spec discussion on 7/2:

we want something that can have children - this indicates it should be a span
we want it to be visualized similarly to spans
there were no objections proceeding with such spans in messaging semconv from the spec community

/cc @tedsuo in case you want to add any additional feedback on this

lmolkova commented 1 month ago

let's continue the discussion then.

SIGs regularly propose ideas that violate core design principles of OpenTelemetry

Since there was a consensus during the call and within messaging SIG, I feel we have different understanding of the principles.

Regardless of the above:

messaging SIG decided to give flexibility to create per-message spans for batch publish.
the alternative (create events with context) would have the same overhead.

tedsuo commented 1 month ago

I would like to understand if the following pseudo code models the situation:

foreach (messages as message){
  startSpan()
  injectContext(message)
  process(message)
  endSpan()
}

If that's the case, then this appears to be a very normal situation, except for the fact that the span durations are presumably very short, and we have little interest in them. Normally we'd avoid such tiny spans, but they are necessary in this case for building the graph correctly.

"Zero duration spans" are usually brought up as a way to model something that isn't really a span. But if the proposal is to create ordinary spans that are just very tiny, that seems reasonable. I would suggest that we avoid referring to them as "zero duration" to avoid confusion.

If we're talking about a different situation, do you mind providing some simple pseudo code to describe an example?

I do still have concerns about the overhead of generating so many spans. I agree that all other approaches proposed would have similar overhead... is this a problem? I haven't tested it, I really don't know the answer. If the overhead is unacceptable, then it seems like we have a real modeling challenge in front of us. If the overhead is fine, maybe this is just ordinary instrumentation, with nothing special about it? That would be a relief. :)

jack-berg commented 1 month ago

Wanted to reiterate a comment I made in the 7/16/24 spec SIG:

The defining characteristics of the trace / span data model are: 1. spans have a start & end time (i.e. duration). 2. spans are arranged into a hierarchy where each hierarchy is a trace, and these hierarchies can be linked together into a larger graph through links.

When considering whether we should model something with a span, we should reject it if the operation being modeled has no useful duration AND does not contribute anything meaningful to the trace graph. Put the other way, if an operation either has a meaningful duration OR adds meaningful value to the trace graph, then its valuable to model as a span.

tedsuo commented 1 month ago

Thank Jack, it's a good definition and I completely agree.

In the 7/16/24 meeting, we reviewed the model for batch processing spans. These spans do represent meaningful operations in a trace graph, according to Jack's definition above. So there is nothing incorrect about using spans. In fact, this is a very normal use of spans, except for the fact that so many need to be created in order to correctly model the trace.

So really, the only concern is efficiency. And concerns about this were raised. Generating timestamps, random IDs, and other data structures have overhead. Making large numbers of individual API calls have overhead.

My request would be for the Messaging SIG to determine how bad this overhead is, and whether some kind of bulk API for span creation and injection could possibly lower this overhead in a meaningful way. If measuring the duration of these spans creates significant overhead, I would be in favor of marking them as zero duration as part of bulk span creation.

lmolkova commented 1 month ago

From the messaging SIG, we want to provide flexibility and allow to trace individual messages independently. Otherwise tracing may become not quite useful.

We can ask implementations to make such behavior configurable, but we want to allow creating spans per message.

As an owner of messaging SDK that does batching in multiple ways, I have to provide per-message tracing for my users. We've had it implemented for several years. The learnings:

there are bunch of use-cases where per-message tracing is important. If OTel prohibits it or makes it opt-in even for batches, I'd not follow the spec.
it's confusing to some users and we want them to disable per-message tracing if they need to
perf impact depends A LOT on other things - message rate, batch size, message sizes, etc - there is no single number I can give you that would make sense without detailed explanation. In short: it's negligeable comparing to the impact of enabling tracing overall.

tedsuo commented 1 month ago

@jmacd made a point that we could use sampling. There is essentially the same problem in tracing any high volume system – a high volume of concurrent HTTP requests requires sampling to address overhead. So maybe, once again, batch operations are a normal tracing situation with a normal solution.

lmolkova commented 1 month ago

The sampling applies to message spans in the same way as to any other spans, so yes, it will sample most of them out.

Sampling could in theory be used to suppress message span creation, but now that we have a better suppression mechanism (tracer.IsEnabled() - https://github.com/open-telemetry/opentelemetry-specification/pull/4063) we could let users suppress per-message traces in this way by giving per-message tracer a different name.

So maybe, once again, batch operations are a normal tracing situation with a normal solution.

Please let me know if per-message tracing with above explanation on sampling/suppression/other means to disable it could be considered as a "normal solution" or you're looking for something different.

tedsuo commented 1 month ago

Creating a tracer per message feels extreme? Also, is there a way to create "disabled" tracers? I thought this was only for the case where the SDK is not installed.

span.IsRecording() could be used to suppress the injection operation when the span has not been sampled.

lmolkova commented 1 month ago

We can create different tracers for different types of telemetry:

for all publishing calls
another for all messages

With https://github.com/open-telemetry/opentelemetry-specification/issues/3867 and https://github.com/open-telemetry/opentelemetry-specification/pull/4063 users can decide to turn tracer off.

It's configured like this

OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addTracerConfiguratorCondition(nameEquals("per-message-tracer"), TracerConfig.disabled()) // disable tracer by name
            .build())
.....

See https://github.com/open-telemetry/opentelemetry-java/pull/6375 for the details.

There is a great summary in #3867 description on why suppression could be a better choice than sampling in general case.

tedsuo commented 1 month ago

Ok I see what you mean. A "per-message" tracer to disable the entire scope, not a new tracer for every message 😅. That makes a lot of sense!

lmolkova commented 1 month ago

I've created https://github.com/open-telemetry/semantic-conventions/issues/1273 (blocker for messaging stability) to encourage instrumentations to allow disabling per-message tracing for batch publish.

I'm going to close this issue, please reopen if necessary.

open-telemetry / opentelemetry-specification

Should zero-duration spans be considered good/reasonable practice? #4123