open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.8k stars 417 forks source link

[Bug]: OTLP exporter fails to send data to backend requiring TLS #2008

Open alanwest opened 1 month ago

alanwest commented 1 month ago

What happened?

The following code worked with opentelemetry/opentelemetry_sdk 0.23.0 and opentelemetry-otlp 0.16.0

I have the following environment variables set

Cargo.toml

[package]
name = "otlp_test"
version = "0.1.0"
edition = "2021"

[dependencies]
opentelemetry = "0.23.0"
opentelemetry-otlp = { version =  "0.16.0", features = ["tls", "tls-roots"] }
opentelemetry_sdk = { version = "0.23.0", features = ["rt-tokio-current-thread"] }
tokio = { version = "1.39.2", features = ["rt-multi-thread"] }

main.rs

use opentelemetry::{global::{self, ObjectSafeSpan}, trace::{Tracer, TracerProvider}, KeyValue};
use opentelemetry_sdk::{propagation::TraceContextPropagator, runtime, trace, Resource};

#[tokio::main]
async fn main() {
    global::set_text_map_propagator(TraceContextPropagator::new());

    let resource = Resource::new(vec![KeyValue::new(
        "service.name",
        "MyService"
    )]);

    let exporter = opentelemetry_otlp::new_exporter().tonic();

    let _tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .with_trace_config(trace::Config::default().with_resource(resource))
        .install_batch(runtime::TokioCurrentThread)
        .expect("failed to initialize trace pipeline");

    let tracer = global::tracer_provider().tracer("MyTracer");

    let mut span = tracer.start("MySpan");
    span.end();

    global::shutdown_tracer_provider();
}

Then I upgraded to opentelemetry 0.24.0/opentelemetry_sdk 0.24.1/opentelemetry-otlp 0.17.0

Cargo.toml

[package]
name = "otlp_test"
version = "0.1.0"
edition = "2021"

[dependencies]
opentelemetry = "0.24.0"
opentelemetry-otlp = { version =  "0.17.0", features = ["tls", "tls-roots"] }
opentelemetry_sdk = { version = "0.24.1", features = ["rt-tokio-current-thread"] }
tokio = { version = "1.39.2", features = ["rt-multi-thread"] }

main.rs

use opentelemetry::{global::{self, ObjectSafeSpan}, trace::{Tracer, TracerProvider}, KeyValue};
use opentelemetry_sdk::{propagation::TraceContextPropagator, runtime, trace, Resource};

#[tokio::main]
async fn main() {
    global::set_text_map_propagator(TraceContextPropagator::new());

    let resource = Resource::new(vec![KeyValue::new(
        "service.name",
        "MyService"
    )]);

    let exporter = opentelemetry_otlp::new_exporter().tonic();

    let tracer_provider = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .with_trace_config(trace::Config::default().with_resource(resource))
        .install_batch(runtime::TokioCurrentThread)
        .expect("failed to initialize trace pipeline");

    global::set_tracer_provider(tracer_provider);

    let tracer = global::tracer_provider().tracer("MyTracer");

    let mut span = tracer.start("MySpan");
    span.end();

    global::shutdown_tracer_provider();
}

I receive the following error

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The service is currently unavailable): , detailed error message: Connecting to HTTPS without TLS enabled

I also tried

let exporter = opentelemetry_otlp::new_exporter().tonic().with_tls_config(tonic::transport::ClientTlsConfig::new());

But this produces a different error as it does not seem to load the system's root certs

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The service is currently unavailable): , detailed error message: invalid peer certificate: UnknownIssuer

API Version

0.24.0

SDK Version

0.24.1

What Exporter(s) are you seeing the problem on?

OTLP

Relevant log output

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The service is currently unavailable): , detailed error message: Connecting to HTTPS without TLS enabled
lalitb commented 4 weeks ago

@alanwest I don't see any TLS config setting in the code above - are you setting any CA certificate(chain) in your example? Otherwise, the code is pretty similar to the basic-otlp example here - https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-otlp/examples/basic-otlp, and should work with new otel v24.

alanwest commented 4 weeks ago

I don't see any TLS config setting in the code above - are you setting any CA certificate(chain) in your example?

@lalitb Should I need to? I did not need to with v23. Is this a new requirement with v24?

My expectation is that I should not need to provide any special TLS config in the default case. That is, the system's certificates should be loaded. I assume this was what was happening prior to v24 and is also the behavior of the other language SDKs.

lalitb commented 4 weeks ago

@alanwest From OpenTelemetry perspective, there is no change in how TLS config is handled. The only relevant change done in OpenTelemetry v0.24 is an upgrade to Tonic v0.12.0. There are a few changes regarding TLS in this release if it helps identify the issue:

https://github.com/hyperium/tonic/releases/tag/v0.12.0

--clip

tls: Add ability to add multiple ca certificates (https://github.com/hyperium/tonic/issues/1724) (3457f92) tls: Use rustls_pki_types::CertificateDer to describe DER encoded certificate (https://github.com/hyperium/tonic/issues/1707) (96a8cbc) tls: Remove tls roots implicit configuration (https://github.com/hyperium/tonic/issues/1731) (de73617)

--

alanwest commented 4 weeks ago

Thanks for the sleuth work!

The issue is specifically related to https://github.com/hyperium/tonic/pull/1731. They acknowledge this as a breaking change in a follow up https://github.com/hyperium/tonic/pull/1781. So in effect upgrading to this version also created a breaking change for opentelemetry-rust.

I was really close to a solution in my original post. This is the tracer provider config that works (I missed calling the new with_native_roots method introduced in tonic 1.12).

let tracer_provider = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic()
            .with_tls_config(tonic::transport::ClientTlsConfig::new().with_native_roots()))
        .with_trace_config(Config::default().with_resource(resource))
        .install_batch(runtime::Tokio)
        .expect("failed to initialize the trace pipeline");

Personally, I'd prefer the TonicExporterBuilder default to including native roots to achieve the same behavior prior to opentelemetry-rust upgrading to tonic 1.12. I think this would be more inline with what other languages are doing. That is, if https is used as the scheme for the endpoint, then configure a vanilla TLS config and include the native roots.

Though, at a bare minimum, I think documenting this new required usage in the examples would be helpful.