Span structure for messaging scenarios

pyohannes commented 1 year ago

This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry/opentelemetry-specification#192, which became too comprehensive.

Fixes https://github.com/open-telemetry/opentelemetry-specification/issues/1085.

carlosalberto commented 1 year ago

Some minor, late pieces of feedback:

For cases of Create being created separatedly from Publish: Is Create that expensive that it may need it's own Span? Is this operation covering something other than creating a local process object?
Example of auto Settlement span?
For the single-parent-child relationship (receiving a single span): this recommendation, are we ok with this recommendation even when Publish is a batch operation? That is, many messages are produced but on the other end we only consume them if they are delivered individually. No strong opinion yet but was curious about this specific case.

joaopgrassi commented 1 year ago

@carlosalberto about your questions above:

For the single-parent-child relationship (receiving a single span): this recommendation, are we ok with this recommendation even when Publish is a batch operation? That is, many messages are produced but on the other end we only consume them if they are delivered individually. No strong opinion yet but was curious about this specific case.

We already have a diagram covering that case, and it's a valid one. I started a discussion about it on Slack, but for visibility we intend to add a small clarification to the text so it covers your suggestion/question

Before:

After

Do you think that addresses that point?

carlosalberto commented 1 year ago

@joaopgrassi Thanks, yes, it does. I only would like, in this case, a small note mentioning that this (optional) parenting results in a trace including all the messages published in the batch plus children (which may not be so obvious). So it's not distracting, maybe put it as a foot page note or so?

(Non blocking by the way - I just want to raise the attention on what may not be obvious and the effects of enabling the direct parenting, going forward).

joaopgrassi commented 1 year ago

@carlosalberto

a small note mentioning that this (optional) parenting results in a trace including all the messages published in the batch plus children (which may not be so obvious)

hum, if each message has its own create context, then it won't be the same trace. I guess your scenario would only happen if the publish span is used as the creation context for all the messages. In that case yes, then the consumer will all be under the same trace.

OR did you mean that when each message has its own create context, by looking at the trace you might also see other messages, even though you didn't process them?

Not sure how we should communicate this now 🤔

Oberon00 commented 1 year ago

A create context will usually not be a root context, hence all messages published in a batch will usually belong to the same trace (NB: This is different from bach-receive). This could even be added as a requirement

carlosalberto commented 1 year ago

A create context will usually not be a root context, hence all messages published in a batch will usually belong to the same trace

This one @joaopgrassi ;)

lmolkova commented 1 year ago

Following up on WG call for prototypes, I'm documenting parts of this OTEP implemented on Azure messaging SDKs (including conflicts or extra features):

Producers
- [✔️ Aligned with the OTEP] Azure SDKs support batching on send:
  - create span per message (if there is no context on the message yet)
  - create publish span
  - add links from publish span to the creation context
- [ ⚫ Not covered] Do not support mode when Publish and messages in a batch are represented with a single span
Consumers
- push-based (delivery)
  - [✔️] create consumer spans with links to messages
  - [✔️] add parent-child relationships for a single message delivery
  - [❗ Conflict] We call it process and not delivery in all cases (full processing handler or just a delivery callback), will need to update
- [✔️] poll-based: follow the receive operation semantics
- settlement
  - [✔️] we trace settlement that happens on the client side
  - [⚫] we have multiple kinds of settlement (abandonment, deferral, dead-lettering, completion)
Misc:
- [⚫] we have all sorts of messaging-related operations: scheduling, peeking (no settlement or locking), receiving deferred messages, getting a lease on a message which we trace trying to stay consistent with similar semantics

Oberon00 commented 1 year ago

@lmolkova Is there / will there be public documentation on this Azure support for OpenTelemetry and how to use it with non-AppInsights backends?

lmolkova commented 1 year ago

@lmolkova Is there / will there be public documentation on this Azure support for OpenTelemetry and how to use it with non-AppInsights backends?

sure, all the docs are here:

.NET
Java
- it's part of otel agent
- we also have some metrics for AMQP libs, here's how to enable, and design docs
JS/TS
Python
C++, no messaging instrumentation yet
Go is coming

We try not to add many attributes until semcov stabilizes at least a bit, but will definitely do more. Any feedback or issues are very welcome.

open-telemetry / oteps

Span structure for messaging scenarios #220