tracing: make it possible to use remote IDs as parent span ID

mladedav commented 2 months ago

Motivation

Currrently, only locally tracked spans can be used as a parent span. However, in distributed systems, the parent is actually often a span originating in another service and this relationship is desirable to be tracked.

For example, a server handling a request may have received a header specifying an ID of a remote span of the client making this call and some information from this should be saved and be available for subscribers to use.

The main example of this would be OpenTelemetry, which propagates globally unique trace and span IDs by which different spans from different systems are grouped into a single trace.

Currently, tracing-opentelemetry exposes a method that allows a span to to add the information about the remote parent after the span is already created. This is suboptimal because the span may be entered before this information is provided and it can be even changed later on. It would be instead better to force the user to provide the information during span creation time and not afterwards.

Solution

Parent ID was tracked as Option<span::Id> which was changed to a custom enum with three variants, one equivalent to the former Option::None for root spans, one equivalent to Option::Some(id) for locally tracked spans which can be looked up in the same Collect that tracks the child span (if it implements LookupSpan) and a new variant for remote spans.

This variant wraps a trait object that can be then downcasted by Subscribe implementations. This is because tracing should not force users to use any representation for their remote spans.

One possibly major change this creates is that Span::child_of now takes an argument of T: Into<ParentId>, where it used to take T: Into<Option<Id>>. All implementations that were provided before were added, but this may break some users which may have implemented Into<Option<Id>> for their own types.

TODO

These things should be done before merging but can wait until I get a greenlight on the current implementation.

no_std support -- Since the remote ID variant uses Box, this only works with alloc. I will add conditional compilation as needed. tests -- all current tests work but of course they don't test for remote contexts at all. docs -- new public API is documented but I will have to look through all current documentation for mentions of Option<Id> and possibly add some explanation in the root of the crate and other places.

hawkw commented 2 months ago

My preference, in general, is not to add things related to external distributed tracing systems to the core tracing data model. This feels like it complicates things unnecessarily when just logging in-process spans --- subscribers will now always need to handle the case that a parent span ID might not refer to a local span, even if distributed tracing is not being used.

And, I think it's preferable for a span with a remote parent to also be able to have a separate in-process parent. For instance, suppose we have a server accept loop with a span, and inside of that span, we create a child span for each request, potentially setting a remote parent ID on those requests based on headers. We would like that span to be able to be a child of the server's accept loop, as well as recording that it's a child of the remote span in the distributed tracing system.

As an alternative, what do you think of an approach where a remote parent span ID is recorded as a field when the span is created? This avoids the issue of attaching a remote parent to a span later than when it's constructed, but doesn't add a special case for remote span IDs in the core tracing data model. I'd be open to adding new field types to better support this --- it might be good to support &[u8] byte arrays as fields, so that distributed trace IDs can be recorded without formatting them as strings. I'd happily merge a PR adding byte slices as a primitive field type, which I think would benefit this use case and many others.

mladedav commented 2 months ago

Sure, that also works. I tried to keep an invariant that each span has just one parent either local or remote, but not multiple ones.

By fields you mean something like info_span!("name", ?foo, %bar, remote.id = "...") where the Subscribe implementations would just look for the expected field names, right? I'd rather avoid magic fields if I can help it but yeah.

I guess the end goal would be to use valuable for this. But until then I guess I'll look if I can make the byte slice work (although for the specific usecase of OpenTelemetry I guess &str would be enough when the context is propagated through HTTP headers and when one already has the decoded IDs, using a pair of u128 and u64 would be better).

hawkw commented 2 months ago

By fields you mean something like info_span!("name", ?foo, %bar, remote.id = "...") where the Subscribe implementations would just look for the expected field names, right? I'd rather avoid magic fields if I can help it but yeah.

In general, we've tried to adhere to the philosophy that data specific to a particular external observability system, such as OpenTelemetry trace and span IDs, or data related to metric collection from tracing spans and events, should be recorded as "magic fields", rather than represented directly in the core tracing structures. The goal is that information that's only used by a particular subscriber implementation shouldn't be part of tracing-core and should instead be communicated by a field or set of fields that the subscriber implementation documents.

I get that it feels a bit...stringly-typed, to use magic field names, but it has the advantage that it helps us keep the core libraries as general-purpose and minimal as possible, which I think is valuable.

mladedav commented 2 months ago

Sure, I get it. Thanks for the explanation, I'll keep that in mind going forward.

If it ever bother me enough, I can always create a macro wrapping tracing macros that forces the types to be checked at compile time and makes it impossible for the strings to be mistyped.

tokio-rs / tracing