tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.32k stars, 696 forks

Distributed tracing #89

Closed kazimuth closed 2 years ago

kazimuth commented 5 years ago

I saw a little bit of discussion of this in tokio-rs/tokio#561, but there aren't currently any open issues for it, so I figured I'd start one.

Distributed tracing is like tokio-trace but extended for distributed systems: instead of tracing only within a single process, you can trace code execution across process boundaries, machines, data centers, continents... There are a number of systems for this, e.g. Jaeger, Zipkin, and a bunch of others. Tokio-trace is perfectly set up to support distributed tracing; only the actual glue code needs to be written.

  1. Trace propagation -- in a distributed tracing system, incoming and outgoing requests to a process are annotated with some form of ID, so that they can be collated later. In the past, tracing systems defined their own bespoke propagation formats, but going forward it looks like people are standardizing on the W3C Trace Context Recommendation.

  2. Trace export -- after traces are recorded, they need to be sent to a central location for viewing. This can be done push-style (having your application connect somewhere and send data) or pull-style (exposing data on some port that can later be scraped). There are open APIs for doing this -- OpenTracing and OpenCensus, which are currently merging into OpenTelemetry.
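To make the propagation step concrete, here is a minimal sketch of parsing the `traceparent` header defined by W3C Trace Context. It assumes only version `00` of the format (`00-<trace-id>-<parent-id>-<flags>`); the `TraceParent` struct and `parse_traceparent` function are hypothetical names for illustration, not part of tokio-trace or any existing crate.

```rust
/// Fields of a W3C `traceparent` header (version 00 only).
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub sampled: bool,
}

/// The spec requires lowercase hex digits.
fn is_lower_hex(s: &str) -> bool {
    s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`.
/// Returns `None` for malformed input or all-zero IDs.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let (version, trace_id, parent_id, flags) =
        (parts.next()?, parts.next()?, parts.next()?, parts.next()?);
    // Version 00 defines exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if trace_id.len() != 32 || parent_id.len() != 16 || flags.len() != 2 {
        return None;
    }
    if !(is_lower_hex(trace_id) && is_lower_hex(parent_id) && is_lower_hex(flags)) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
    let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    // All-zero IDs are invalid per the spec.
    if trace_id == 0 || parent_id == 0 {
        return None;
    }
    // Bit 0 of the flags byte is the "sampled" flag.
    Some(TraceParent { trace_id, parent_id, sampled: (flags & 0x01) == 1 })
}
```

A middleware would call something like `parse_traceparent` on each incoming request and pass the result to whatever records spans.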

I think the simplest path forward on this is to build:

Ideally end users should be able to do:

fn main() {
    opencensus::export();
    actix_web::App::new()
       .middleware(TraceContextMiddleware::new())
       /* ... */
       .finish();
}

and have things just work ✨

hawkw commented 5 years ago

Thanks for opening this issue! Supporting integration with distributed tracing systems is definitely a goal for tokio-trace, and we've tried to design the core primitives to make integrating with distributed tracing easy. However, nobody has actually written any such integrations yet.

I've done some thinking about how one might want to go about writing a distributed tracing integration for tokio-trace in the past. I think we would start by writing a subscriber that consumes tokio-trace spans and events, and translates them into a format suitable for exporting to the distributed tracing system. Then, we would write middleware/glue for various web frameworks and libraries, as you suggested, that would parse incoming trace contexts and associate them with tokio-trace spans.

A potential way to associate the trace contexts with spans is to use the Subscriber downcasting API. I discussed how this could be done in the PR that added support for downcasting subscribers: https://github.com/tokio-rs/tokio/pull/974.

Alternatively, constructing the subscriber could return a subscriber and a handle type that allows sending trace context IDs to the subscriber. The trace context middlewares could then be constructed using that handle. This might be more efficient than getting the current subscriber, but it would require users to thread through that handle from where the subscriber is created to wherever the middlewares are constructed.
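The handle-based alternative might look roughly like the sketch below. It uses only the standard library; `DistributedSubscriber`, `ContextHandle`, and the `SpanId`/`TraceId` aliases are hypothetical stand-ins, not tokio-trace types, and a real subscriber would also implement the subscriber trait.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical span identifier; stands in for tokio-trace's span ID type.
type SpanId = u64;
/// Hypothetical 128-bit distributed trace ID.
type TraceId = u128;

/// Table shared between the subscriber and its handles.
type Contexts = Arc<Mutex<HashMap<SpanId, TraceId>>>;

/// Stand-in for a subscriber that exports spans to a distributed tracing system.
pub struct DistributedSubscriber {
    contexts: Contexts,
}

/// Cloneable handle that middleware uses to send trace contexts to the subscriber.
#[derive(Clone)]
pub struct ContextHandle {
    contexts: Contexts,
}

impl DistributedSubscriber {
    /// Construction returns both halves, sharing one context table.
    pub fn new() -> (Self, ContextHandle) {
        let contexts = Contexts::default();
        (
            Self { contexts: Arc::clone(&contexts) },
            ContextHandle { contexts },
        )
    }

    /// Looked up at export time to tag a span with its remote trace ID.
    pub fn trace_id_for(&self, span: SpanId) -> Option<TraceId> {
        self.contexts.lock().unwrap().get(&span).copied()
    }
}

impl ContextHandle {
    /// Called by middleware after parsing an incoming request's trace context.
    pub fn set_trace_id(&self, span: SpanId, trace_id: TraceId) {
        self.contexts.lock().unwrap().insert(span, trace_id);
    }
}
```

The cost described above shows up in the signature of `new`: the caller has to carry the returned `ContextHandle` to wherever the middlewares are built.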

@kazimuth are you interested in writing a tokio-trace/OpenTelemetry integration? If so, I'd be happy to provide guidance. Regardless, thanks for opening this issue, as it'll provide a place for folks interested in seeing this to discuss how it ought to be implemented.

kazimuth commented 5 years ago

I'm about to start grad school so I don't want to commit to any intense projects right now. Thanks for the offer of help, though :) I may poke around on this at some point.

I think ideally the trace context ID should be ambiently available, rather than having to be explicitly threaded through control flow. Explicitly threading anything is a pain, lol -- the system should demand as little as possible from users if we want people to try it out. Downcasting subscribers would definitely work for an initial implementation.

Longer-term, it might make sense to add something like "baggage" to the tokio-trace API - values attached to a span tree that the subscriber is required to keep around. That complicates subscriber impls, but it makes it super easy to propagate metadata without having to worry about the underlying subscriber implementation.

hawkw commented 5 years ago

> I think ideally the trace context ID should be ambiently available, rather than having to be explicitly threaded through control flow.

I totally agree. My expectation is that when a request with a trace context ID is received, the middleware informs the subscriber which stores the context associated with the current span. We could then have a free function to get it by downcasting the subscriber to the expected type and asking it for the trace ID.
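A rough sketch of that flow, using `std::any::Any` to show the downcasting idea. The `Subscriber` trait here is a deliberately simplified stand-in rather than the real tokio-trace trait, and `TraceContextSubscriber`, `PlainSubscriber`, and `trace_id_of` are hypothetical names.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for the subscriber trait; not the real tokio-trace trait.
pub trait Subscriber: Any {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical subscriber that remembers a trace ID per span ID.
#[derive(Default)]
pub struct TraceContextSubscriber {
    trace_ids: Mutex<HashMap<u64, u128>>,
}

impl TraceContextSubscriber {
    /// Called by middleware when a request carrying a trace context arrives.
    pub fn record_trace_id(&self, span: u64, trace_id: u128) {
        self.trace_ids.lock().unwrap().insert(span, trace_id);
    }
}

impl Subscriber for TraceContextSubscriber {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// A subscriber that knows nothing about distributed tracing.
pub struct PlainSubscriber;

impl Subscriber for PlainSubscriber {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Free function: downcast the installed subscriber to the expected type and
/// ask it for the trace ID of `span`. Returns `None` if the subscriber is some
/// other type or has no context stored for that span.
pub fn trace_id_of(current: &dyn Subscriber, span: u64) -> Option<u128> {
    let sub = current.as_any().downcast_ref::<TraceContextSubscriber>()?;
    sub.trace_ids.lock().unwrap().get(&span).copied()
}
```

Application code only ever calls the free function, so the trace ID stays ambiently available; the price is that the lookup silently returns `None` when a different subscriber is installed.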

I'm not opposed to the idea you brought up around arbitrary metadata, though it seems like a lot of complexity in a system where we already have a notion of "fields". Seems worthy of more thought.

kazimuth commented 5 years ago

Yeah it's sorta a pain. Sometimes I wish Rust had something like Go's context.Context for misc stuff like this.

Could also do something like making the distributed subscriber wrap another subscriber and just handle the distributed stuff... But then you wouldn't be able to downcast it. Unless you stored the other subscriber as a dyn Subscriber, I guess.

anton-dutov commented 4 years ago

Have you looked at the rustracing / rustracing_jaeger crates for impl ideas?

inanna-malick commented 4 years ago

I've been playing around with a tracing honeycomb subscriber and one of the things I've had to spend some time thinking about is how trace ids are generated/discovered.

My initial impl just generates a trace id for all top-level spans, but it might make more sense to make this explicit instead of implicit. For example, in a GRPC service I'm building, I think I want to either generate per-request tracing ids or pick up external tracing ids at the request handler level.
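The explicit variant could be sketched as below: at the request handler, pick up an external trace ID if the caller sent a valid one, otherwise generate a fresh one. All names are hypothetical, the incoming ID is assumed to be a 32-hex-digit string, and the std-only ID generator is an assumption made to keep the example dependency-free (a real service would more likely use a proper RNG).

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Where a request's trace ID came from.
#[derive(Debug, PartialEq, Eq)]
pub enum TraceIdSource {
    Propagated,
    Generated,
}

/// Produces a non-zero pseudo-random 128-bit ID using only std.
/// Assumption: good enough for a sketch, not for production use.
fn generate_trace_id() -> u128 {
    loop {
        // Each `RandomState` is seeded with fresh random keys.
        let hi = RandomState::new().build_hasher().finish() as u128;
        let lo = RandomState::new().build_hasher().finish() as u128;
        let id = (hi << 64) | lo;
        if id != 0 {
            return id;
        }
    }
}

/// Picks up the caller's trace ID when present and valid, otherwise starts a
/// new trace. `incoming` is a hypothetical ID taken from request metadata.
pub fn trace_id_for_request(incoming: Option<&str>) -> (u128, TraceIdSource) {
    let propagated = incoming
        .filter(|s| s.len() == 32 && s.bytes().all(|b| b.is_ascii_hexdigit()))
        .and_then(|s| u128::from_str_radix(s, 16).ok())
        .filter(|id| *id != 0);
    match propagated {
        Some(id) => (id, TraceIdSource::Propagated),
        None => (generate_trace_id(), TraceIdSource::Generated),
    }
}
```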

thedodd commented 4 years ago

OT/Jaeger support would definitely be a pretty big win!

bbigras commented 4 years ago

Any progress on this?

hawkw commented 4 years ago

@bbigras depending on what distributed tracing system you're using, there are some (work in progress) implementations available: tracing-opentelemetry for OpenTelemetry, and the honeycomb-tracing crate @pkinsky mentions in https://github.com/tokio-rs/tracing/issues/89#issuecomment-544639597 for Honeycomb users.

inanna-malick commented 4 years ago

@bbigras I've updated the honeycomb-tracing crate to support arbitrary backends (not just honeycomb) in a branch. All tests are currently passing, and I'll be publishing this new version sometime in the next week.

thedodd commented 4 years ago

@pkinsky nice! Does it support OpenTelemetry / Jaeger and such? I'm not a big fan of all the ceremony required in getting the tracing-opentelemetry crate set up, especially because it is using the rustracing_jaeger crate under the hood, and setting that up is even more painful. The setup for your honeycomb crate looks pretty terse. I like.

inanna-malick commented 4 years ago

@thedodd

In theory it should support pretty much any tracing backend. All you need to do is implement this trait with whatever backing logic you prefer:

pub trait Telemetry {
    type Visitor: Default + tracing::field::Visit;

    fn report_span<'a>(&self, span: Span<'a, Self::Visitor>);
    fn report_event<'a>(&self, event: Event<'a, Self::Visitor>);
}

(also, tldr, I'm trans, I'm using this library release as an opportunity to do some much-needed identity refactoring)

velvia commented 2 years ago

Hi folks, I found this old issue and I'm trying to get tracing-opentelemetry and Jaeger to work together. I am able to get spans into Jaeger; however, I cannot find a way to tie the spans together with trace propagation. I'm trying to use code like the following:

     let context = global::get_text_map_propagator(|propagator| propagator.extract(&carrier));
     span.set_parent(context);

Does it matter what is in the hashmap in carrier?

Thanks.
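On the carrier question, a hedged note: assuming the installed propagator follows W3C Trace Context (the format mentioned at the top of this thread), the entry it reads is the one keyed `traceparent`, so an empty or unrelated map gives it nothing to build a parent from. The std-only sketch below shows what such a carrier would contain; `inject` and `extract_raw` are hypothetical helpers, not functions from the opentelemetry crate.

```rust
use std::collections::HashMap;

/// Header name defined by the W3C Trace Context specification.
const TRACEPARENT: &str = "traceparent";

/// Writes a version-00 `traceparent` entry into an outgoing carrier.
/// Hypothetical helper for illustration.
pub fn inject(
    carrier: &mut HashMap<String, String>,
    trace_id: u128,
    span_id: u64,
    sampled: bool,
) {
    let flags: u8 = if sampled { 1 } else { 0 };
    carrier.insert(
        TRACEPARENT.to_string(),
        // 32 hex digits of trace ID, 16 of span ID, 2 of flags, zero-padded.
        format!("00-{:032x}-{:016x}-{:02x}", trace_id, span_id, flags),
    );
}

/// Reads the raw `traceparent` value back out on the receiving side.
pub fn extract_raw(carrier: &HashMap<String, String>) -> Option<&str> {
    carrier.get(TRACEPARENT).map(String::as_str)
}
```

The sending service would populate the map this way (or let its propagator do so) before the request leaves, and the receiving service would hand the same map to `extract`.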

davidbarsky commented 2 years ago

@velvia: I've opened a discussion https://github.com/tokio-rs/tracing/discussions/1991 to address your question.


Since this functionality already exists through libraries such as tracing-opentelemetry and tracing-honeycomb, I'll close this issue.