tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.53k stars 725 forks source link

[opentelemetry] opentelemetry spans opened within tracing spans do not have their parent ID set to the tracing span's ID #1458

Open lily-mara opened 3 years ago

lily-mara commented 3 years ago

Bug Report

Version

[dependencies]
opentelemetry = { version = "0.14", features = ["trace"] }
tracing-opentelemetry = "0.13"
tracing-subscriber = "0.2"
tracing = "0.1"
async-trait = "0.1"
├── opentelemetry v0.14.0
├── tracing v0.1.26
│   ├── tracing-attributes v0.1.15 (proc-macro)
│   └── tracing-core v0.1.18
├── tracing-opentelemetry v0.13.0
│   ├── opentelemetry v0.14.0 (*)
│   ├── tracing v0.1.26 (*)
│   ├── tracing-core v0.1.18 (*)
│   ├── tracing-log v0.1.2
│   │   └── tracing-core v0.1.18 (*)
│   └── tracing-subscriber v0.2.19
│       ├── tracing v0.1.26 (*)
│       ├── tracing-core v0.1.18 (*)
│       ├── tracing-log v0.1.2 (*)
│       └── tracing-serde v0.1.2
│           └── tracing-core v0.1.18 (*)
└── tracing-subscriber v0.2.19 (*)

Platform

Linux DESKTOP-8SQJE0D 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Crates

tracing-opentelemetry

Description

I tried this code: ``` use async_trait::async_trait; use opentelemetry::global; use opentelemetry::global::BoxedTracer; use opentelemetry::sdk::export::trace::{ExportResult, SpanData, SpanExporter}; use opentelemetry::trace::{Tracer, TracerProvider}; use tracing_subscriber::prelude::*; fn main() -> std::io::Result<()> { let exporter = Exporter; let provider = opentelemetry::sdk::trace::TracerProvider::builder() .with_simple_exporter(exporter) .build(); let tracer = provider.get_tracer("tracing-opentelemetry", None); let _ = global::set_tracer_provider(provider); let telemetry = tracing_opentelemetry::layer().with_tracer(tracer); let subscriber = tracing_subscriber::Registry::default().with(telemetry); tracing::subscriber::set_global_default(subscriber).unwrap(); let tracer = global::tracer("my_service"); tracer.in_span("my_span", |_cx| { std::thread::sleep(std::time::Duration::from_millis(20)); traced_fn(&tracer); std::thread::sleep(std::time::Duration::from_millis(20)); }); global::shutdown_tracer_provider(); Ok(()) } #[derive(Debug)] pub struct Exporter; #[async_trait] impl SpanExporter for Exporter { /// Export spans to stdout async fn export(&mut self, batch: Vec) -> ExportResult { for span in batch { println!( "name: {}\tid: {}\tparent: {}", span.name, span.span_context.span_id().to_hex(), span.parent_span_id.to_hex(), ); } Ok(()) } } fn traced_fn(tracer: &BoxedTracer) { let _entered = tracing::info_span!("traced_fn").entered(); std::thread::sleep(std::time::Duration::from_millis(20)); otel_fn(tracer); std::thread::sleep(std::time::Duration::from_millis(20)); } fn otel_fn(tracer: &BoxedTracer) { tracer.in_span("otel_fn", |_cx| { std::thread::sleep(std::time::Duration::from_millis(20)); }); } ``` I expected to see this happen: Running the code should show the spans all inherit from each other, something like this: ``` $ cargo run name: otel_fn id: 324a6415799e7557 parent: 381bfb0e5fc73ce0 name: traced_fn id: 381bfb0e5fc73ce0 parent: dbd0b12cbde8bda3 name: my_span id: dbd0b12cbde8bda3 parent: 0000000000000000 ``` The leaf `otel_fn` span should have `traced_fn` span set as its parent. Instead, this happened: The `otel_fn` span has the root `my_span` span set as its parent: ``` $ cargo run name: otel_fn id: 324a6415799e7557 parent: dbd0b12cbde8bda3 name: traced_fn id: 381bfb0e5fc73ce0 parent: dbd0b12cbde8bda3 name: my_span id: dbd0b12cbde8bda3 parent: 0000000000000000 ``` In applications that treat `opentelemetry` as a serialization layer and use `tracing` for all actual instrumentation, this is not a very big deal. It becomes a big deal when you want to use any crates which treat `opentelemetry` as a library. I discovered this bug when I was trialing the [actix-web-opentelemtry](https://docs.rs/actix-web-opentelemetry/0.10.0/actix_web_opentelemetry/) library in combination with tracing.
morigs commented 1 year ago

This is critical imho... I did a shallow research on how this works. It looks like this can be fixed by explicitly setting otel context when starting a span, and then reset context to the previous value on span exit. I've checked and this working, however, there is a problem I wasn't able to overcome.

Code for tracing-opentelemetry::OpenTelemetryLayer:

    fn on_enter(&self, id: &span::Id, ctx: Context<'_, S>) {
         // ...
        let span = ctx.span(id).expect("Span not found, this is a bug");
        let mut extensions = span.extensions_mut();
         // ...

        let ctx_guard = extensions
            .get_mut::<otel::SpanBuilder>()
            .map(|builder| self.tracer.sampled_context(builder).attach());

        extensions.insert(ctx_guard);
    }

    fn on_exit(&self, id: &span::Id, ctx: Context<'_, S>) {
        // ...
        let span = ctx.span(id).expect("Span not found, this is a bug");
        let mut extensions = span.extensions_mut();
        // ...

        extensions.remove::<Option<opentelemetry::ContextGuard>>();
    }

In order to be able to keep context active and reset later, we need to store ContextGuard as long as span active. I tried to use extensions for this. However, extensions methods require Send + Sync and ContextGuard explicitly forbids Send because it relies on thread locals... I can trick compiler by using a wrapped type with unsafe impl, but I don't know why exactly ExtensionsMut requires these, so can be unsafe indeed :)

Perhaps with a help of tracing-opentelemetry contributor we can fix this relatively easily