open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0

SimpleExporter is not useable with OTLP/gRPC #2188

Open cijothomas opened 1 week ago

cijothomas commented 1 week ago

SimpleExporter is not usable with OTLP/gRPC (the Tonic client is used for gRPC). Minimal repro:

use tracing::info;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;
use tracing_subscriber::util::SubscriberInitExt;
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::WithExportConfig;

fn main() {
    let logger_provider = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .install_simple()
        .unwrap();

    // Create a new OpenTelemetryTracingBridge using the above LoggerProvider.
    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    tracing_subscriber::registry()
        .with(layer)
        .init();

    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    let _ = logger_provider.shutdown();
}

Panics with

there is no reactor running, must be called from the context of a Tokio 1.x runtime

If main is made #[tokio::main] instead, the export hangs forever.
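For reference, a minimal sketch of the #[tokio::main] variant that hangs (reconstructed from the description above, assuming the same imports as the repro; not copied from the report):

#[tokio::main]
async fn main() {
    // Same pipeline as above, but inside a Tokio runtime: the panic goes away,
    // and the simple exporter instead deadlocks during export (see discussion below).
    let logger_provider = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .install_simple()
        .unwrap();

    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    tracing_subscriber::registry().with(layer).init();

    // This export hangs forever with the simple processor.
    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    let _ = logger_provider.shutdown();
}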

The exporters need to be modified so that they do not require an async runtime and can make normal blocking calls (assuming the underlying clients offer APIs for that).

RichardChukwu commented 4 days ago

Is this open to be worked on, @cijothomas?

lalitb commented 4 days ago

@RichardChukwu thanks for volunteering to fix the issue. Do you have an approach in mind that we can discuss here? The tonic crate used for gRPC seems to require the Tokio runtime in all scenarios, while SimpleExporter is supposed to work with just the futures crate.
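For illustration only (this is not the SDK's code), the mismatch looks roughly like this: driving a tonic connection with the futures executor, as a simple/blocking processor would, panics when no Tokio runtime is active.

// Illustrative sketch: block on a tonic future using the futures executor.
// Outside a Tokio runtime this panics with
// "there is no reactor running, must be called from the context of a Tokio 1.x runtime".
fn export_blocking_with_futures() {
    futures::executor::block_on(async {
        let _channel = tonic::transport::Channel::from_static("http://localhost:4317")
            .connect()
            .await;
    });
}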

P.S. @cijothomas is traveling, so there could be a delay in his response; assigning the issue to you for now.

mzabaluev commented 4 days ago

The problem is that libraries in the exporter's network stack enter tracing spans of their own (normally at debug or trace level). These go to the global subscriber, and with the simple exporter that leads to a worker thread calling into the OpenTelemetry tracer while the main thread still holds a lock in the exporter. Try to prevent the re-entrancy in tracing by filtering the OpenTelemetry layer, e.g.:

use tracing::Level;
use tracing_subscriber::filter;
use tracing_subscriber::Layer;

let layer = OpenTelemetryTracingBridge::new(&logger_provider)
    .with_filter(
        // Only forward events for the application's own target.
        filter::Targets::new().with_target("my-target", Level::INFO),
    );

mzabaluev commented 4 days ago

Panics with

there is no reactor running, must be called from the context of a Tokio 1.x runtime

Yes, the opentelemetry-otlp tonic exporter requires a Tokio runtime context regardless of the choice of simple vs. batch processing. And you likely want the batch processor with a networked exporter anyway.
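For completeness, a minimal sketch of the batch variant (assuming the same pipeline API as the repro; the runtime argument matches the commented-out line in the example further down):

use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::WithExportConfig;
use tracing::info;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

#[tokio::main]
async fn main() {
    let logger_provider = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        // Batch processing exports from a background task driven by the Tokio
        // runtime, so no blocking export happens on the emitting thread.
        .install_batch(opentelemetry_sdk::runtime::Tokio)
        .unwrap();

    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    tracing_subscriber::registry().with(layer).init();

    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    let _ = logger_provider.shutdown();
}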

RichardChukwu commented 4 days ago

@mzabaluev would you say the Tokio runtime needs to be initialized properly to handle async tasks then?

mzabaluev commented 4 days ago

would you say the Tokio runtime needs to be initialized properly to handle async tasks then?

Yes, putting the initialization into the Tokio runtime context will solve the panic.

lalitb commented 4 days ago

@cijothomas I see you mentioned that the exporter hangs with tokio::main. Was it because of an infinite-loop scenario similar to #1745? Otherwise, the code below works fine for me:

use std::error::Error;

use opentelemetry::logs::LogError;
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::WithExportConfig;
use tracing::info;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::EnvFilter;

// RESOURCE is assumed to be a lazily initialized opentelemetry_sdk::Resource
// defined elsewhere in the example.

fn init_logs() -> Result<opentelemetry_sdk::logs::LoggerProvider, LogError> {
    let tonic_exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://localhost:4317");
    let res = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_resource(RESOURCE.clone())
        .with_exporter(tonic_exporter)
        .install_simple();
        //.install_batch(opentelemetry_sdk::runtime::Tokio);

    if let Err(ref err) = res {
        println!("Error initializing logs: {:?}", err);
    }
    res
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    let logger_provider = init_logs().unwrap();
    // Create a new OpenTelemetryTracingBridge using the above LoggerProvider.
    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    // Silence the HTTP/gRPC client crates so their internal events do not feed
    // back into the exporter.
    let filter = EnvFilter::new("info")
        .add_directive("hyper=error".parse().unwrap())
        .add_directive("tonic=error".parse().unwrap())
        .add_directive("reqwest=error".parse().unwrap());

    tracing_subscriber::registry()
        .with(filter)
        .with(layer)
        .init();

    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    logger_provider.shutdown()?;
    Ok(())
}

lalitb commented 4 days ago

Need to modify exporters in such a way that they don't require async runtimes, and can do normal blocking calls (assuming there are APIs to do that).

tonic has a hard dependency on the Tokio runtime and does not appear to support blocking gRPC calls. As @TommyCpp mentioned in the last community meeting, we previously supported grpcio, which does not rely on any runtime (it uses futures), but it was removed in #1534 because the library is poorly maintained: its last release was over a year ago.
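For context, one possible workaround direction (purely illustrative, not an existing API in this crate; all names below are hypothetical) is for the exporter to own a small dedicated Tokio runtime and block on it for each call, since tonic has no blocking client of its own:

use tokio::runtime::{Builder, Runtime};

// Hypothetical wrapper: a current-thread runtime owned by the exporter, used only
// to drive its own tonic futures to completion synchronously.
struct BlockingGrpcExporter {
    rt: Runtime,
}

impl BlockingGrpcExporter {
    fn new() -> std::io::Result<Self> {
        let rt = Builder::new_current_thread().enable_all().build()?;
        Ok(Self { rt })
    }

    // Blocks the calling thread until the async export future completes.
    // Note: calling block_on from within an async context would itself panic,
    // which is part of why this is non-trivial.
    fn export_blocking<F, T>(&self, fut: F) -> T
    where
        F: std::future::Future<Output = T>,
    {
        self.rt.block_on(fut)
    }
}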

cijothomas commented 4 days ago

The problem is that libraries in the exporter's network stack enter tracing spans of their own (normally at debug or trace level). These go to the global subscriber, and with the simple exporter that leads to a worker thread calling into the OpenTelemetry tracer while the main thread still holds a lock in the exporter. Try to prevent the re-entrancy in tracing by filtering the OpenTelemetry layer, e.g.:

use tracing_subscriber::filter;

let layer = OpenTelemetryTracingBridge::new(&logger_provider)
    .with_filter(
        filter::Targets::new().with_target("my-target", Level::INFO)
    );

You are right! Added https://github.com/open-telemetry/opentelemetry-rust/pull/2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary workaround; this has to be solved with a SuppressionFlag stored in Context.
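As a rough illustration of that idea (no such flag exists in the crate today; the type and helpers below are hypothetical):

use opentelemetry::Context;

// Hypothetical marker type: the exporter would attach it to the current Context
// for the duration of an export, and the bridge/processor would drop any
// telemetry generated while it is set.
#[derive(Clone, Copy, Debug)]
struct SuppressionFlag;

fn export_with_suppression(export: impl FnOnce()) {
    let cx = Context::current_with_value(SuppressionFlag);
    let _guard = cx.attach(); // flag active until the guard is dropped
    export();
}

fn is_suppressed() -> bool {
    // Checked by the bridge before forwarding a log record to the processor.
    Context::current().get::<SuppressionFlag>().is_some()
}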

cijothomas commented 4 days ago

@cijothomas I see you mentioned that the exporter hangs with tokio::main. Was it because of an infinite-loop scenario similar to https://github.com/open-telemetry/opentelemetry-rust/issues/1745?

Yes, exactly. It's a deadlock caused by re-entrancy in SimpleLogProcessor.

mzabaluev commented 3 days ago

You are right! Added #2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary workaround; this has to be solved with a SuppressionFlag stored in Context.

I thought about that, but I doubt the tracing spans created in the exporter stack will always get the right Context. The thread where this happens may be a worker thread that doesn't have an inherited Context, and the OpenTelemetry layer has no visibility into the causal relationship between the export and any tracing it causes.
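To make the concern concrete: Context in opentelemetry-rust is stored per thread, so a value attached on the exporting thread is not automatically visible on a worker thread. A small sketch, reusing the hypothetical SuppressionFlag from above:

use opentelemetry::Context;

#[derive(Clone, Copy, Debug)]
struct SuppressionFlag;

fn main() {
    let cx = Context::current_with_value(SuppressionFlag);
    let _guard = cx.attach();
    // Visible on the thread that attached it...
    assert!(Context::current().get::<SuppressionFlag>().is_some());

    std::thread::spawn(|| {
        // ...but a worker thread starts with an empty current Context,
        // so the flag is not inherited here.
        assert!(Context::current().get::<SuppressionFlag>().is_none());
    })
    .join()
    .unwrap();
}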

cijothomas commented 3 days ago

You are right! Added #2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary workaround; this has to be solved with a SuppressionFlag stored in Context.

I thought about that, but I doubt the tracing spans created in the exporter stack will always get the right Context. The OpenTelemetry layer may not have visibility on these due to whatever filtering the application has set up, and the thread where this happens may be a worker thread that doesn't have an inherited Context.

Spot on! That is exactly why solving the "Context" problem is important for OTel. It is non-trivial; @lalitb has done a lot of research into this and came to the same conclusion you shared.

This is something we will most likely need to solve after the initial stable release of logs and metrics.