Open lmolkova opened 3 weeks ago
I think we have two options:

1. `CONSUMER` span kind

   We'd need to add more wiggle room to the already vague span kind definition to make this more legitimate.
   By using `CONSUMER` we create ambiguity: the receive span does not describe an external request, its latency does not represent processing duration, and its errors don't represent processing errors. But any tool that makes generic assumptions based on the span kind alone will think that it describes message consumption.

2. `CLIENT` kind

   Possible drawbacks:
   - No `CONSUMER` or `SERVER` spans, i.e. service maps will not detect any incoming calls to the service. This could happen in other cases (when there is no server instrumentation), so tracing systems should be prepared for it.
   - No `CONSUMER` span matching `PRODUCER` spans. That also does not seem like a trace visualization/analysis problem.

   We can try to address any possible drawbacks with additional semantics:
   - Receive spans have the `messaging.operation.type = receive` attribute, so messaging-aware visualizations/queries should be able to special-case it.

My proposal is to do Option 2.
Applications that only report `receive` spans have poor observability: they need to instrument message consumption anyway.
We're trying to cover it up by reporting a `CONSUMER` span, but it does not solve the bigger problem.
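
The special-casing that Option 2 relies on could look roughly like the sketch below. It is only an illustration, not code from any tracing backend: spans are modeled as plain dicts, and the names `is_receive` and `classify` are hypothetical. Only the span kinds and the `messaging.operation.type = receive` attribute come from this discussion.

```python
# Hypothetical span model: a dict with a "kind" string and an "attributes" mapping.
INCOMING_KINDS = {"SERVER", "CONSUMER"}


def is_receive(span):
    """True for a CLIENT span that pulls messages from a topic/queue."""
    attrs = span.get("attributes", {})
    return span["kind"] == "CLIENT" and attrs.get("messaging.operation.type") == "receive"


def classify(span):
    """Classify a span for a messaging-aware service map (illustrative only)."""
    if span["kind"] in INCOMING_KINDS:
        return "incoming"
    if is_receive(span):
        # Special case: a pull from the broker, not a regular outgoing dependency call.
        return "message-receive"
    if span["kind"] in {"CLIENT", "PRODUCER"}:
        return "outgoing"
    return "internal"


spans = [
    {"kind": "CLIENT", "attributes": {"messaging.operation.type": "receive"}},
    {"kind": "CLIENT", "attributes": {"http.request.method": "GET"}},
    {"kind": "CONSUMER", "attributes": {"messaging.operation.type": "process"}},
]
print([classify(s) for s in spans])
```

A tool that looks at the span kind alone would put the first span in the same bucket as the second one. Checking the attribute is what lets a messaging-aware view keep them apart.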
We discussed this in the meeting on 30-08-2024 and reached the consensus to use `CLIENT` for the receive span and keep `CONSUMER` for when process spans are created.
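
Read as a rule for instrumentation authors, this consensus amounts to a small lookup. The sketch below is an assumption about how one might encode it: the `SpanKind` enum is a local stand-in for the OpenTelemetry span kinds, `span_kind_for` is a hypothetical helper, and the keys are assumed to be `messaging.operation.type` values. Only the two cases agreed on in the meeting are listed.

```python
from enum import Enum


class SpanKind(Enum):
    """Local stand-in for the span kinds discussed here."""
    CLIENT = "CLIENT"
    CONSUMER = "CONSUMER"


# Only the two cases covered by the consensus are listed.
KIND_BY_OPERATION_TYPE = {
    "receive": SpanKind.CLIENT,    # pulling messages from a topic/queue
    "process": SpanKind.CONSUMER,  # handling a message initiated by a producer
}


def span_kind_for(operation_type):
    """Return the agreed span kind, or fail loudly for anything not covered."""
    try:
        return KIND_BY_OPERATION_TYPE[operation_type]
    except KeyError:
        raise ValueError(f"span kind not covered by this sketch: {operation_type!r}") from None


print(span_kind_for("receive").value)
print(span_kind_for("process").value)
```

Failing on unknown operation types is deliberate: the sketch does not guess at kinds that were not part of the decision.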
Given changes in https://github.com/open-telemetry/opentelemetry-specification/pull/4178, this makes sense.
With those changes, we don't see the consumer span as the end point of an asynchronous communication channel (from the point of view of application code), but as "processing of an operation initiated by a producer".
This brings some limitations, but reduces ambiguity.
Receive spans describe pulling messages from a topic/queue.
E.g. AWS SQS example looks like
Kafka example
This operation fits into a vague `CLIENT` span definition: it's a logical client call to the remote service. It's initiated by the application itself, ends once the corresponding method returns the received messages, and does not account for any message handling or processing time.

But we currently specify that `receive` spans should be `CONSUMER`: https://github.com/open-telemetry/semantic-conventions/blob/3c16c802e8ae8849ae0cf31eac02c3cabf64e4dd/docs/messaging/messaging-spans.md?plain=1#L213

Why is it `CONSUMER`?

- The `receive` operation is the only messaging span that instrumentation libraries can guarantee to be created on the consumer side when messages are pulled.
- If a higher-level framework is used to process messages (such as Spring or Apache Camel), it may create processing `SERVER` spans; otherwise they may be created by user applications.
- The `CONSUMER` kind on the `receive` spans: see https://github.com/open-telemetry/oteps/blob/main/text/trace/0220-messaging-semantic-conventions-span-structure.md#span-kind for the context.
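
To make the timing argument concrete (the receive call ends when the method returns and excludes message handling), here is a minimal sketch that uses only the standard library. Everything in it is hypothetical: `span` is a toy stand-in for a tracer, `fake_receive` and `handle` stand in for a messaging client and application code, and the span names are placeholders rather than names from the conventions.

```python
import time
from contextlib import contextmanager

recorded = []  # finished spans, in the order they ended


@contextmanager
def span(name, kind):
    """Toy stand-in for a tracer: records name, kind, and duration."""
    start = time.monotonic()
    try:
        yield
    finally:
        recorded.append(
            {"name": name, "kind": kind, "duration": time.monotonic() - start}
        )


def fake_receive():
    """Pretend to pull a batch of messages from a queue."""
    return ["m1", "m2"]


def handle(message):
    """Pretend to do application-level work on one message."""
    time.sleep(0.01)


# The receive span covers only the pull call and ends when it returns.
with span("receive", "CLIENT"):
    messages = fake_receive()

# Handling is timed separately, one span per message.
for message in messages:
    with span("process", "CONSUMER"):
        handle(message)

for s in recorded:
    print(s["name"], s["kind"])
```

The receive span's duration here reflects only the pull itself, which is why its latency and errors say nothing about how message processing went.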