Messaging: per-message tracing when sending batches

lmolkova commented 2 years ago

In Messaging Instrumentation WG, we're looking for the proper way to trace multiple messages sent within a single batch. We do not have a concept for this in tracing spec and want to hear opinions on the options we came up with.

E.g. a user sends a batch of messages like producer.send([msg1, msg2]), then this batch is reshuffled on the broker and then each message is sent to consumer(s) as a part of another batch.

In this case, users should still be able to trace individual messages through the system. To achieve it, we need a unique context per message that's propagated from producer to consumer.

Options:

Span per message: send span has links to each message span.
- Pros: fits into a current mental model
- Cons:
  - span duration is artificial: it's either 0 (message creation) or tied to the send span duration. It could potentially measure when each message is sent (but many systems get an ack per batch, not per message)
  - extra span collection with corresponding perf hit and storage costs increase
  - [EDIT]: Can't change this choice later in non-breaking manner
Spanless context. Only create a new span context per message. send span has links to each message span context.
- Pros:
  - no extra costs for span creation. all necessary information will go through links
  - [EDIT]: can report span (or event with context semantics) later in the future versions of the spec if spanless context will be proven difficult to use. This won't be breaking.
- Cons:
  - new mental model. There is a context with no span ever created
  - maybe more complicated indexing for links on the backends. I.e. there will be a link from the producer and a link from the consumer to the spanless context
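
A minimal sketch of the two options, using a toy in-memory model rather than the OpenTelemetry API. The names `SpanContext`, `Span`, `new_context` and `send_batch` are hypothetical and exist only for illustration; messages are plain dicts.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    links: list = field(default_factory=list)  # SpanContext objects this span links to


def new_context() -> SpanContext:
    # Random IDs with the W3C Trace Context sizes (16-byte trace ID, 8-byte span ID).
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def send_batch(messages, span_per_message):
    """Return the spans that would be exported for one batch send."""
    send_span = Span("send", new_context())
    exported = []
    for message in messages:
        ctx = new_context()
        message["context"] = ctx         # unique context propagated with the message
        send_span.links.append(ctx)      # the send span links to every message context
        if span_per_message:
            exported.append(Span("create", ctx))  # option 1: one extra span per message
    exported.append(send_span)
    return exported


option1 = send_batch([{"body": "msg1"}, {"body": "msg2"}], span_per_message=True)
option2 = send_batch([{"body": "msg1"}, {"body": "msg2"}], span_per_message=False)
print(len(option1), len(option2))  # 3 1
```

In both cases the send span carries the same two links; the only difference in this model is whether a span is ever exported for each linked context.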

More context: https://docs.google.com/document/d/1OrHsepd6GjzXKll1ggZyx1jBQd0d_t8NZXT1ZOem7D0/edit#heading=h.hfmrnf56kiuf

lmolkova commented 2 years ago

/cc @joaopgrassi @blumamir @pyohannes @dpauls

Oberon00 commented 2 years ago

Have you thought about how this would be represented in the protocol? Would you then send a list of links along with the list of spans? The links could also be interpreted as "data-less spans" that have no timestamps (maybe a creation time?), and no attributes beside trace ID, span ID, parent span ID.

CC @discostu105: I think at Dynatrace we have been using a concept very similar to these spanless contexts ("links") but IIUC we are in the process of getting rid of them.

lmolkova commented 2 years ago

Have you thought about how this would be represented in the protocol?

The same way as today (in both cases) - using span.links that already have all the properties we need (linked context and attributes). Or did I misunderstand your question?

Oberon00 commented 2 years ago

So you need attributes? Then it's not a pure "context".

joaopgrassi commented 2 years ago

@lmolkova please correct me if I got it wrong, but I think the idea is that we have a "context" (no attributes/no real span) that's inside the message. Then on Send/Publish, we add a link to each message, where the link points to the "context" inside the message. The link then would hold the message-specific attributes, such as message.id or message.destination since we can't add those to the Send/Publish span.

MSNev commented 2 years ago

FYI from spec meeting this morning re an Events API support on Logs https://github.com/open-telemetry/opentelemetry-specification/pull/2676

Oberon00 commented 2 years ago

One idea that I wanted to bring up is to use zero-duration-spans after all and use a special span kind like "DOWNLINK". That way you could hide them / collapse them into the parent in an UI. Of course, this would need special support by the backend, but so would anything else we discuss here.

lmolkova commented 2 years ago

@Oberon00 agree that if we follow 0-span duration path, backends need some heuristics that tell it's a span created for message. I believe we can do it with PRODUCER kind (and publish span, in this case, is just CLIENT).

Still, such a span should have 0-duration, no status, no attributes, no links or events and it raises the question if it should be a span or something else.

With a pure context, we keep the door open to adjust to real-world feedback. We can add an event or a span later in messaging v1.X in a non-breaking manner.

Picking event with context semantics or span will be a final decision.

Oberon00 commented 2 years ago

I thought attributes were actually needed on these?

lmolkova commented 2 years ago

Attributes should be on links to message context, not on spans.

Reasons:

messages can be forwarded between brokers preserving the context (i.e. span is created on the first service, but not on the next hop)
users can put context on messages and we don't want to require users to set attributes and follow messaging semantics, we want to offload it to auto-instrumentations as much as we can.

Oberon00 commented 2 years ago

Still, such a span should have 0-duration, no status, no attributes, no links or events

Attributes should be on links to message context, not on spans.

I don't understand. If we used zero-duration spans, of course it would make sense to put the attributes on the spans, and have the publish span as parent, and not using zero-duration spans and links at the sender side at the same time.

Of course the message that is sent to the broker would only contain a pure context, completely independent of how this is implemented in OTel.

lmolkova commented 2 years ago

I don't understand. If we used zero-duration spans, of course it would make sense to put the attributes on the spans, and have the publish span as parent, and not using zero-duration spans and links at the sender side at the same time.

Imagine I create a message on service A and publish it to Kafka topic. Service B receives it and forwards it to service C via another Kafka cluster/topic. It's quite a common scenario and there are many tools that do it. You can only create message span on the service A, but where would you put message-specific attributes on service B? They have changed - it's a new topic and cluster

Or imagine I'm a user and keep source context in which blob/DB record was created in record metadata. I want to use this context as my message context and stamp it on the message manually. Auto-instrumentation that publishes this message cannot create a span and override message context. Where would you put attributes if there is no span? Asking user to create this span is not a great experience.

The answer to both cases - put them on links.
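
A rough sketch of the forwarding case, assuming a toy model (not the OpenTelemetry API): the message context stays the same on every hop, while each publish span carries its own per-hop attributes on the link. The attribute names `message.id` and `message.destination` are the illustrative ones used earlier in this thread; the destination values and the `publish` helper are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass
class Link:
    context: SpanContext
    attributes: dict


@dataclass
class Span:
    name: str
    service: str
    links: list = field(default_factory=list)


def publish(service, destination, messages):
    """Create a publish span that links to each message's existing context."""
    span = Span("publish", service)
    for message in messages:
        span.links.append(
            Link(
                message["context"],
                {"message.id": message["id"], "message.destination": destination},
            )
        )
    return span


# Example IDs only; the context is created once, on service A.
msg = {
    "id": "m-1",
    "context": SpanContext("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331"),
}
span_a = publish("A", "cluster-1/orders", [msg])
span_b = publish("B", "cluster-2/orders-replica", [msg])  # forwarded, context untouched
```

Both spans link to the same context, but the destination recorded on each link reflects the hop that published it.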

Oberon00 commented 2 years ago

You can only create message span on the service A, but where would you put message-specific attributes on service B? They have changed - it's a new topic and cluster

I might have completely misunderstood the design proposed here. I have a hunch what you might mean now, but I'm still not sure. So let me ask this: Why can't service B not create a messaging span? Since service B is not an intermediary, but a service, it ought to create a span with the incoming message as parent (or with a link to it) and modify the span context on the message with the span context of the create/publish span of the new message publication. Am I wrong here? If service B cannot modify the context on the message, it will be impossible to tell from the trace structure if anything you link to the context on the message happened in service A or service B, and in which causal/happens-before relationship.

lmolkova commented 2 years ago

Why can't service B not create a messaging span?

Service B can create a span, but then it has to modify message context as well. Now let's assume ServiceB is a broker or, in a more popular case, an extra app layer that does geo-replication. While it could create a processing span, then create a new span for the message and modify the context, it'll be inefficient and verbose for the case of simple forwarding/routing/sharding.

Moreover, assuming ServiceB is a broker, its telemetry could belong to the cloud provider it's managed by. Creating such spans would break causality.

So the rule of thumb we came up with: if messaging library/system got a message with context (forwarded from somewhere else or set by user) - it must not create a span for message (or a new context). This context should be de-facto immutable.
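
A minimal sketch of this rule, assuming messages are plain dicts with a `headers` map and that the context travels in a W3C `traceparent` header. The helper name `inject_message_context` and the message layout are hypothetical.

```python
import secrets


def inject_message_context(message: dict) -> bool:
    """Attach a fresh context unless one is already present.

    Returns True if a new context was created, False if the existing one was kept.
    """
    headers = message["headers"]
    if "traceparent" in headers:
        # Context was forwarded or set by the user: treat it as immutable.
        return False
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    headers["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return True


forwarded = {
    "headers": {"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"},
    "body": "x",
}
fresh = {"headers": {}, "body": "y"}
print(inject_message_context(forwarded), inject_message_context(fresh))  # False True
```

The publish span would still be created in both cases and linked to whatever context ends up on the message.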

Now causality without message span is achieved through links. If message context is created:

by publish call instrumentation: 1) we have a link to it 2) we know it's actually a sibling of a publish span
by the user - it's all in the user's hands whether to create a message span or get context from somewhere else and take care of causality

In either case, we still have publish span on every hop that is linked to this context. You can follow along and see message received on ServiceB and republished there via links.

Oberon00 commented 2 years ago

Moreover, assuming ServiceB is a broker, its telemetry could belong to the cloud provider it's managed by. Creating such spans would break causality.

This multi-tenant/multi-vendor problem can & should be solved with per-tenant/per-vendor tracestate entries. I think we should keep that discussion separate. https://github.com/open-telemetry/opentelemetry-specification/issues/366#issuecomment-580235961

So the rule of thumb we came up with: if messaging library/system got a message with context (forwarded from somewhere else or set by user) - it must not create a span for message (or a new context). This context should be de-facto immutable.

To clarify: Of course the library/system would create a (publish) span, but it should not inject that span's context into the message. Is that what you mean?

I think this is a general new propagation design that you propose here, and I don't see how this is specific to messages. You could apply the same strategy to HTTP requests, which may also pass multiple hops (e.g. consider AWS Lambda which you usually invoke via a service called API gateway proxy, or Google Cloud Functions, which are behind a load balancer that actually participates in the W3C trace, trashing your span IDs, see https://github.com/open-telemetry/opentelemetry-specification/pull/1852#discussion_r721993932)

lmolkova commented 2 years ago

This multi-tenant/multi-vendor problem can & should be solved with per-tenant/per-vendor tracestate entries.

Sure, but let's make sure we keep the routing/replication/forwarding/sharing discussion on. Service-meshes would be first to hit the problem here.

Of course the library/system would create a (publish) span, but it should not inject that span's context into the message. Is that what you mean?

Correct, in batch send, publish span context cannot be put on messages - if it does, they would not be individually traceable. I mean that message creation belongs to the application and the application can inject the context that it wants.

Auto-instrumentation should allow applications to associate a custom context with the message. If we allow this, the next immediate conclusion would be that auto-instrumentations MUST NOT override this context, therefore MUST NOT create a message span when the context is present on the message already.

I don't see how this is specific to messages.

It's specific to messages since:

you can have batch send/receive
messages spend potentially significant time in queues
message processing represents application logic, while sending these messages through multiple hops is mostly irrelevant

The key difference here is that for HTTP the request content is tightly coupled to the transport call and a new call requires a new message; for messaging that's not the case.

Assuming everything would have a span, forwarding A->B->C scenario would look like this:

A: message span s1
A: message span s2
A: send batch (with links to s1, s2)
B: receive batch (with links to s1, s2)
B: message span child-of-s1
B: message span child-of-s2
B: send batch (with links to child-of-s1, child-of-s2)
C: receive batch (with links to child-of-s1, child-of-s2)
...

(would you like that for every service mesh instrumentation?)

Without context modification:

A: message context s1
A: message context s2
A: send batch (with links to s1, s2)
B: receive batch (with links to s1, s2)
B: send batch (with links to s1, s2)
C: receive batch (with links to s1, s2)
...

Both of these options carry the same information, but the first one is much more verbose. So what's the benefit?
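
To put rough numbers on the verbosity, here is a small sketch that counts exported spans for the two listings above. It assumes a single producer, a chain of forwarders and a single consumer, and counts only what the listings show; the function name `exported_spans` is hypothetical.

```python
def exported_spans(batch_size: int, forwarders: int, respan: bool) -> int:
    """Count spans for producer -> forwarders -> consumer.

    respan=True models the first listing (a message span per message on every
    sending hop); respan=False models the second (contexts only, no message spans).
    """
    per_message = batch_size if respan else 0
    producer = per_message + 1            # message spans (if any) + send batch
    per_forwarder = 1 + per_message + 1   # receive batch + message spans + send batch
    consumer = 1                          # receive batch
    return producer + forwarders * per_forwarder + consumer


print(exported_spans(2, 1, respan=True))      # 8
print(exported_spans(2, 1, respan=False))     # 4
print(exported_spans(1000, 1, respan=True))   # 2004
print(exported_spans(1000, 1, respan=False))  # 4
```

With message spans the count grows with batch size on every sending hop; without them it depends only on the number of hops.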

Oberon00 commented 2 years ago

message processing represents application logic, while sending these messages through multiple hops is mostly irrelevant

The same could be said about HTTP: The ultimate handler of the HTTP request contains application logic while any (reverse) proxies in-between are less relevant.

Both of these options carry the same information, but the first one is much more verbose. So what's the benefit?

They do not. In the first scenario, you have the relationship A -> B -> C, and in the second one you only have A -> B and A -> C, i.e. a direct connection from A to both B and C, and only an indirect and undirected connection between B and C over the common parent A. You have no idea whether B forwarded to C, C forwarded to B, or A sent to B and C simultaneously (though the latter would be the most direct interpretation of the trace structure). That's what I meant by the loss of causal/happens-before relationships.
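
One way to illustrate the difference is to read every link or parent reference as an edge from the service that created the referenced context to the service that emitted the item. This is a simplified sketch under that assumption; `service_edges` and the tuple layout are hypothetical.

```python
def service_edges(items, origin):
    """items: (service, contexts the item links to or is a child of).
    origin: maps each context to the service that created it.
    Returns the set of (creator, referrer) pairs across different services.
    """
    return {
        (origin[ctx], service)
        for service, contexts in items
        for ctx in contexts
        if origin[ctx] != service
    }


# First scenario: B creates child spans and re-injects their contexts.
with_spans = [
    ("A", {"s1", "s2"}),                    # A: send batch
    ("B", {"s1", "s2"}),                    # B: receive batch
    ("B", {"s1"}),                          # B: message span child-of-s1
    ("B", {"s2"}),                          # B: message span child-of-s2
    ("B", {"child-of-s1", "child-of-s2"}),  # B: send batch
    ("C", {"child-of-s1", "child-of-s2"}),  # C: receive batch
]
origin_with_spans = {"s1": "A", "s2": "A", "child-of-s1": "B", "child-of-s2": "B"}

# Second scenario: the context created on A is never modified.
without_modification = [
    ("A", {"s1", "s2"}),  # A: send batch
    ("B", {"s1", "s2"}),  # B: receive batch
    ("B", {"s1", "s2"}),  # B: send batch
    ("C", {"s1", "s2"}),  # C: receive batch
]
origin_unmodified = {"s1": "A", "s2": "A"}

print(sorted(service_edges(with_spans, origin_with_spans)))            # [('A', 'B'), ('B', 'C')]
print(sorted(service_edges(without_modification, origin_unmodified)))  # [('A', 'B'), ('A', 'C')]
```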

I want to bring up that we seem to discuss two mostly orthogonal topics in this issue:

span-less contexts / spans with multiple contexts / downlinks
A context propagation style where subsequent participants of a request handling chain become siblings of each other (and direct children of the initial client) instead of each participant becoming a child of the previous.

lmolkova commented 2 years ago

The same could be said about HTTP: The ultimate handler of the HTTP request contains application logic while any (reverse) proxies in-between are less relevant.

Perfect observation. So brokers and forwarders are like HTTP proxies and load balancers. They probably don't emit any traces, and when they do, they probably should not change HTTP headers, otherwise traces become too verbose.

You have no idea whether B forwarded to C, C forwarded to B, or A sent to B and C simultaneously (though the latter would be the most direct interpretation of the trace structure)

It's a fair point. At the same time, the moment you introduce batching, you lose causality because links don't provide it.

A: message s1
A: message s2
A: message sN, ...
A: publish links to s1, s2
A: publish links to sN, ...
B: receive (new trace) links to s1, sN

By looking at this the only way to tell that A called B is by timestamps. The only way to achieve causality is to force users to create child spans (per message) on consumers. But auto-instrumentations can't guarantee it. And some scenarios (e.g. I aggregate data from batch) don't separate messages at all.

I want to bring up that we seem to discuss two mostly orthogonal topics in this issue:

span-less contexts / spans with multiple contexts / downlink

A context propagation style where subsequent participants of a request handling chain become siblings of each other (and direct children of the initial client) instead of each participant becoming a child of the previous.

Agreed, but they are related to some extent.

To your second point - there are no siblings - they are all independent traces related via links. And please, don't discard the hard requirement: if a user provided a context in the message, auto-instrumentation or broker cannot override it. Please notice that it means infra pieces that carry this instance of message over to the consumer cannot create message spans.

Oberon00 commented 2 years ago

OK, so you are saying, in your first scenario, B not only does not modify the context, it also does not emit any telemetry items at all? If that's the case, I misunderstood that.

To your second point - there are no siblings - they are all independent traces related via links.

But if there is a trace, there has to be a span. So now I'm a bit confused what is actually meant here.

lmolkova commented 2 years ago

But if there is a trace, there has to be a span.

I don't think this is a precise statement.

There is a span, but it's a transport span that sends (a batch) to broker or receives a batch from broker. When we receive a batch, we can't always create span per message in auto-inst, it's app responsibility to create it if they want. We can only guarantee a receive span that links to each context in a batch. Assuming you carry immutable messages over through multiple hops, there is no point in creating spans for each message, you just create links to them.

I.e. messages belong to application, application properties on the message are immutable for brokers and infra, they must not be modified. I.e. message trace context cannot be modified and no span must be created to re-trace this instance of message. New spans are created to trace the transport of this message and they have links to the context on the messages.

spanglerco commented 1 year ago

To share another use case related to this discussion, we have a service that produces Kafka messages in a transaction as a large batch to a single topic. But that batch could be thousands of messages, so adding a link per message to a single span is not feasible, as links are normally limited to 128 if I understand correctly. Similarly, the array tag approach would result in a very large tag value. I wonder if the conventions could also provide semantics for a single span representing a batch of produced messages at a cost of losing granularity in the trace. Or maybe that's already addressed somewhere and I missed it.
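
A quick arithmetic sketch of the problem, assuming a per-span link limit of 128 (the figure mentioned above; SDKs typically let you configure it). The helper name is hypothetical.

```python
DEFAULT_LINK_LIMIT = 128  # assumed per-span link limit; treat as configurable


def links_kept_and_dropped(batch_size: int, limit: int = DEFAULT_LINK_LIMIT):
    """Return (links kept, links dropped) for one span linking to every message."""
    kept = min(batch_size, limit)
    return kept, batch_size - kept


print(links_kept_and_dropped(5000))  # (128, 4872)
print(links_kept_and_dropped(100))   # (100, 0)
```

Under that assumption, a 5000-message transaction would lose the link for all but 128 of its messages.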

pyohannes commented 1 year ago

@lmolkova Do you see this resolved with the integration of https://github.com/open-telemetry/semantic-conventions/pull/284?

We introduced a requirement for attributes and links and we went with option 1 from your initial proposal (one span per message. where possible).

lmolkova commented 1 year ago

@lmolkova Do you see this resolved with the integration of open-telemetry/semantic-conventions#284?

We introduced a requirement for attributes and links and we went with option 1 from your initial proposal (one span per message. where possible).

yes, closing this one.

PS: I still like spanless contexts more 🙃

lmolkova commented 4 months ago

Reopening based on the feedback from @tedsuo to discuss zero-duration spans. Will bring it up on messaging SIG 6/27

lmolkova commented 4 months ago

capturing some feedback points:

zero-duration spans smell - they probably should be events
creation of a message (or injection of a unique context) is not a work to be reported as a span
link to context is valid (even if there is no span)
links that point to nothing should have a timestamp and a name

lmolkova commented 4 months ago

Discussed at messaging SIG:

spanless context is still controversial: there is a worry that it'd break backends
we're losing more than timestamp and name - we're losing causality - the link does not record the parent of the context. e.g.
- Incoming HTTP request had traceId1 spanId1
- Message created in scope of it had traceId1, spanId2
- if we recorded a span, we'd record that parent of message is spanId1
- without the span we have no means to record that spanId2 is a child of spanId1
are there other similar cases where we need the new context but not a new span?

We should look for more options:

Inject parent context into messages.
- Cons: can't distinguish messages that were created in the same context.
Legalize (explain) zero-duration spans. Span is more than duration/status. It's new context, causality, name, timestamp, semantics.
Evolve event/link combination.
- Pros: event (with parent context) that has a secondary context (new message) describes a single point in time when the message was created.
- Cons: this is effectively a new signal (links expressed as events)

Will bring it up on spec meeting.

trask commented 4 months ago

Evolve event/link combination. Pros: event (with parent context) that has a secondary context (new message) describes a single point in time when the message was created.

I'm probably missing something here, how do you find the parent of the "new message" context?

lmolkova commented 4 months ago

I'm probably missing something here, how do you find the parent of the "new message" context?

I think event is an unfortunate term - we don't want to build it on top of span events, but it's not a log (no payload).

This thing is a link detached from the span. It has

parent context
its own unique span id
name
timestamp
attributes

I.e. from the data structure it's a lightweight span without status, duration, links, or events.
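
As a data structure, this could look roughly like the sketch below. It is a toy model, not an existing OpenTelemetry type; the class name `MessageCreated` and the field names are hypothetical, and the IDs reuse the placeholders from the SIG notes above.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass(frozen=True)
class MessageCreated:
    """A link detached from any span: no status, duration, links, or events."""

    parent: SpanContext   # context that was active when the message was created
    span_id: str          # unique ID injected into the message
    name: str
    timestamp_ns: int
    attributes: dict = field(default_factory=dict)

    @property
    def context(self) -> SpanContext:
        # The message context shares the parent's trace and has its own span ID.
        return SpanContext(self.parent.trace_id, self.span_id)


incoming_request = SpanContext("traceId1", "spanId1")
record = MessageCreated(
    parent=incoming_request,
    span_id="spanId2",
    name="create",
    timestamp_ns=time.time_ns(),
    attributes={"message.id": "m-1"},
)
print(record.context)  # SpanContext(trace_id='traceId1', span_id='spanId2')
```

Because the record keeps `parent`, a backend could recover that `spanId2` is a child of `spanId1`, which a plain link to the message context does not capture.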

joaopgrassi commented 2 months ago

@lmolkova given we have the conventions now mentioning the create context and IIRC "zero duration" spans are not a big deal, do we have anything left to do in this issue? It seems to me all is "resolved" now? Or am I missing something?

lmolkova commented 1 month ago

yeah, I think we can close it - we have https://github.com/open-telemetry/semantic-conventions/issues/1273 to track remaining work (making per-message tracing disableable). Thanks!

open-telemetry / semantic-conventions

Messaging: per-message tracing when sending batches #1187