open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Clarify semantic of SpanKind regarding parent/child relationships #3172

Closed pyohannes closed 1 week ago

pyohannes commented 1 year ago

What are you trying to achieve?

The specification of SpanKind describes "logical" parent/child relationships between spans:

The first property described by SpanKind reflects whether the Span is a "logical" remote child or parent. By "logical", we mean that the span is logically a remote child or parent, from the point of view of the library that is being instrumented. Spans with a remote parent are interesting because they are sources of external load. Spans with a remote child are interesting because they reflect a non-local system dependency.

It also talks about consumer spans in particular:

CONSUMER Indicates that the span describes a child of an asynchronous PRODUCER request.

The concept of "logical" parent/child relationships created confusion during discussions in the messaging workgroup, in particular regarding relationships between producer and consumer spans, which are often modelled as links (e.g. in the existing messaging examples for batch receiving or batch processing).

The specification should make clear that a "logical" parent/child relationship also applies to linked spans, and ideally use unambiguous terms for describing relationships between spans that can be either parent/child relationships or links.
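To make the ambiguity concrete, here is a minimal sketch of the two shapes a producer/consumer relationship can take. It uses a toy span model built only on the standard library, not the OpenTelemetry API; the `Span` dataclass, its fields, and `related_producers` are hypothetical names chosen for illustration. A consumer that processes a single message can have the producer as its parent, while a consumer that processes a batch can only point at its producers through links.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass(frozen=True)
class Span:
    # Toy model (assumption): real spans carry full span contexts, not bare ids.
    name: str
    kind: SpanKind
    span_id: int
    parent_id: Optional[int] = None
    link_ids: Tuple[int, ...] = ()


def related_producers(consumer: Span, spans_by_id: Dict[int, Span]) -> List[Span]:
    """Return PRODUCER spans reachable from a span via its parent OR its links."""
    candidate_ids = list(consumer.link_ids)
    if consumer.parent_id is not None:
        candidate_ids.insert(0, consumer.parent_id)
    return [
        spans_by_id[i]
        for i in candidate_ids
        if i in spans_by_id and spans_by_id[i].kind is SpanKind.PRODUCER
    ]


send_a = Span("send A", SpanKind.PRODUCER, span_id=1)
send_b = Span("send B", SpanKind.PRODUCER, span_id=2)
# Single-message processing: the producer is the parent.
single = Span("process A", SpanKind.CONSUMER, span_id=3, parent_id=1)
# Batch processing: no single parent fits, so producers are linked.
batch = Span("process batch", SpanKind.CONSUMER, span_id=4, link_ids=(1, 2))
spans = {s.span_id: s for s in (send_a, send_b, single, batch)}

print([s.name for s in related_producers(single, spans)])
print([s.name for s in related_producers(batch, spans)])
```

In both cases the consumer span is "logically" downstream of a producer, which is the reading this issue asks the specification to state explicitly.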

blumamir commented 1 year ago

Related issue: https://github.com/open-telemetry/opentelemetry-specification/issues/526

blumamir commented 1 year ago

To me, the kind is very useful for understanding whether a span is incoming or outgoing relative to the current process. This helps create boundaries for display and provides context to end users, similar to what https://github.com/open-telemetry/oteps/pull/182 is trying to achieve. Defining the kind through parent/child/link relationships means that messaging spans, network spans, framework spans, etc. sometimes need to be marked as "internal" because they have no remote children, parents, or links, but for end users this is not very helpful.

I think that the operation itself carries a "logical" direction, and it makes sense to me for that direction to be reflected in the kind. So if we open a TCP socket, it might not inject or extract remote context, but it still plays the role of a CLIENT in the conversation. Similarly in messaging, we might not always have the remote context available to use as the parent or a link (redis pubsub, socket.io), but I still think it makes sense to mark spans that describe async messages entering the application as "CONSUMER".

I wonder if the wording can describe something other than parent-child-link, relying on the operation semantics rather than the trace structure.
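As a rough illustration of the difference, the sketch below contrasts the two ways of choosing a kind. It is a standard-library toy, not the OpenTelemetry API: the operation names, `KIND_BY_OPERATION`, and both functions are hypothetical, and `kind_from_structure` is only one possible reading of the current "remote parent/child" wording.

```python
from enum import Enum


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


# Hypothetical operation names mapped to the role the operation plays.
KIND_BY_OPERATION = {
    "tcp.connect": SpanKind.CLIENT,
    "http.handle_request": SpanKind.SERVER,
    "message.publish": SpanKind.PRODUCER,
    "message.deliver": SpanKind.CONSUMER,
}


def kind_from_structure(
    has_remote_parent: bool, has_remote_child: bool, is_async: bool
) -> SpanKind:
    """Derive the kind from the trace structure (one reading of the spec text)."""
    if has_remote_parent:
        return SpanKind.CONSUMER if is_async else SpanKind.SERVER
    if has_remote_child:
        return SpanKind.PRODUCER if is_async else SpanKind.CLIENT
    # No remote context was injected or extracted.
    return SpanKind.INTERNAL


def kind_from_operation(operation: str) -> SpanKind:
    """Derive the kind from what the operation does, ignoring trace structure."""
    return KIND_BY_OPERATION.get(operation, SpanKind.INTERNAL)


# A TCP connect that propagates no context:
print(kind_from_structure(False, False, False).name)
print(kind_from_operation("tcp.connect").name)
```

Under the structural rule the TCP connect span falls back to INTERNAL, while the operation-based rule keeps it as CLIENT even though no context crosses the wire.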

One edge case I can think of: when polling messages from a remote server, the "receive" span describes both a "client" request to a server to fetch messages and a "consumer" operation, as it receives a batch of remote async messages to consume. In this case, the kind is not well defined.