open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.71k stars 384 forks source link

[Bug]: TracerProvider can be dropped unknowingly #1625

Open SZenglein opened 3 months ago

SZenglein commented 3 months ago

What happened?

Consider this example code:
(I am using tracing_opentelemetry, but it should be reproducible without it)

let telemetry =  {
    // Create a new OpenTelemetry trace pipeline that prints to stdout
    let provider = TracerProvider::builder()
        .with_simple_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http:localhost:4317")
                .with_timeout(Duration::from_secs(3))
                .build_span_exporter()?,
        )
        .with_config(opentelemetry_sdk::trace::Config::default().with_resource(
            Resource::new(vec![KeyValue::new(
                opentelemetry_semantic_conventions::resource::SERVICE_NAME,
                "service-api",
            )]),
        ))
        .build();

    let tracer = provider.tracer("service-api");

    // Create and return a tracing layer with the configured tracer
    tracing_opentelemetry::layer().with_tracer(tracer)
};

Since the provider is not used afterwards, it is very easy to accidentally drop it when executing inside a function or conditional block.
There is no compiler warning or any documentation note regarding this behaviour or that it must be kept alive manually. This results in very hard to debug problem, because code that compiles fine without a warning can stop working just by moving a value into a scope.

I suspect this is because TracerProvider creates the Tracer by downgrading the Arc to a Weak.

IMHO this should at the very least be documented on the TracerProvider struct, maybe a warning can be added to the Drop Implementation, too.

API Version

0.15

SDK Version

0.22.1

What Exporters are you seeing the problem on?

No response

Relevant log output

No response

TommyCpp commented 3 months ago

Yeah it's a known behavior. Users need to manually hold reference to provider or pass the provider to global tracer provider.

Agree on we should update documentation and possible looking into compiler warnings

lalitb commented 3 months ago

Will it make sense for TraceProvider to have similar behavior as LoggerProvider (#1455) - Let Tracer own TracerProvider (with shared referenece to TracerProviderInner), which would ensure that it won't be dropped till all instances of Tracers are dropped.

TommyCpp commented 3 months ago

Ah you are right. Let me give it a try

TommyCpp commented 3 months ago

Like pointed out in #1455 after I did some POC I realized the drawback of curent implementation is we will no longer shutdown tracer/logger processor when dropping TracerProvider.

Considering the following unit test that validates we will not send out spans after tracer provider shuts down

    #[test]
    fn test_span_exported_data() {
        let provider = crate::trace::TracerProvider::builder()
            .with_simple_exporter(NoopSpanExporter::new())
            .build();
        let tracer = provider.tracer("test");

        let mut span = tracer.start("test_span");
        span.add_event("test_event", vec![]);
        span.set_status(Status::error(""));

        let exported_data = span.exported_data();
        assert!(exported_data.is_some());

        drop(provider);
        let dropped_span = tracer.start("span_with_dropped_provider");
        // return none if the provider has already been dropped
        // but this won't happen if the tracer still holds a reference to tracer inner
        assert!(dropped_span.exported_data().is_none());
    }

I think there are two questions we need to answer:

  1. Do we need to shutdown SpanProcssors when TracerProvider drops?
  2. How do we make sure the span exporting stops once the SpanProcessor shuts down?
    • LoggerProvider has try_shutdown which requries users to manualy track down all cloned LoggerProvider, given every Logger has a reference to LoggerProvider this means users is responsible for clean up Logger before LoggerProvider
    • Is it possible for us to keep a Weak reference to every Tracer created from a givenTracerProviderInner. Upon shutting down, the TracerProviderInner notify all Tracer and ask them to remove their reference to TracerProviderInner.
cijothomas commented 1 month ago

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry/src/trace/mod.rs#L20-L31 - The recently added doc updates should cover this?