open-telemetry / opentelemetry-cpp

The OpenTelemetry C++ Client
https://opentelemetry.io/
Apache License 2.0

Attempt to create new Tracer changes name and version of existing Tracers #1307

Closed: sirzooro closed this issue 2 years ago

sirzooro commented 2 years ago

Describe your environment
CentOS 8, gcc 10.3.1. opentelemetry-cpp built from the main branch on March 25th, configured using this command:
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=ON -DWITH_OTLP=ON -DWITH_OTLP_GRPC=OFF -DWITH_ZPAGES=ON -DCMAKE_INSTALL_PREFIX=/usr ..
App code is compiled as C++11.

Steps to reproduce
The code creates a TracerProvider using code from the samples, creates two different Tracers, and uses them to create some spans:

auto tracer_provider = opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider>
    (new opentelemetry::sdk::trace::TracerProvider(std::move(batch_processor), resource, std::move(always_on_sampler)));
opentelemetry::trace::Provider::SetTracerProvider(tracer_provider);

auto tracer1 = opentelemetry::trace::Provider::GetTracerProvider()->GetTracer("Lib1", "1.0");
auto tracer2 = opentelemetry::trace::Provider::GetTracerProvider()->GetTracer("Lib2", "1.1");

auto span1 = tracer1->StartSpan("Span1");
auto span2 = tracer2->StartSpan("Span2");
span1->End();
span2->End();

What is the expected behavior?
Each span has otel.library.name and otel.library.version set to the values provided when its parent Tracer was created.

What is the actual behavior?
All spans have otel.library.name and otel.library.version set to the same values. It seems that the pair with the lexicographically smallest name wins: first I created two Tracers with names "Ne..." and "NS...", and the "Ne..." one won. Then I added a third one, "C...", and that one was selected.

sirzooro commented 2 years ago

I ran a few more tests and found that this happens only when OtlpHttpExporter and BatchSpanProcessor are used together. When I replaced either one (using OStreamSpanExporter instead of OtlpHttpExporter, or SimpleSpanProcessor instead of BatchSpanProcessor), the problem disappeared. The code below reproduces the bug. I compiled it using the following command:
g++ -o test test.cc -O3 -Wall -lopentelemetry_resources -lopentelemetry_trace -lopentelemetry_exporter_otlp_http

#include <memory>   // std::unique_ptr
#include <utility>  // std::move

#include <opentelemetry/sdk/trace/simple_processor.h>
#include <opentelemetry/sdk/trace/tracer_provider.h>
#include <opentelemetry/trace/provider.h>
#include <opentelemetry/sdk/trace/batch_span_processor.h>
#include <opentelemetry/exporters/otlp/otlp_http_exporter.h>

int main()
{
    auto exporter = std::unique_ptr<opentelemetry::sdk::trace::SpanExporter>(
        new opentelemetry::exporter::otlp::OtlpHttpExporter(opentelemetry::exporter::otlp::OtlpHttpExporterOptions{}));
    auto processor = std::unique_ptr<opentelemetry::sdk::trace::SpanProcessor>(
        new opentelemetry::sdk::trace::BatchSpanProcessor(std::move(exporter), opentelemetry::sdk::trace::BatchSpanProcessorOptions{}));
    auto provider = opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider>(new opentelemetry::sdk::trace::TracerProvider(
        std::move(processor), opentelemetry::sdk::resource::Resource::Create({})));
    opentelemetry::trace::Provider::SetTracerProvider(provider);

    auto tracer1 = opentelemetry::trace::Provider::GetTracerProvider()->GetTracer("Lib1", "1.0");
    auto tracer2 = opentelemetry::trace::Provider::GetTracerProvider()->GetTracer("Lib2", "2.0");

    auto span1 = tracer1->StartSpan("Span1");
    auto scope1 = opentelemetry::trace::Tracer::WithActiveSpan(span1);
    {
        auto span2 = tracer2->StartSpan("Span2");
        span2->End();
    }
    span1->End();

    return 0;
}

lalitb commented 2 years ago

@sirzooro Thanks for your analysis. The investigation becomes relatively easy with such details in the issue :)

There seems to be a problem in the way tracer information (name and version) is populated in the OTLP exporter code (otlp_recordable_utils.cc, linked below). The exporter assumes that all the spans in a batch (coming from BatchSpanProcessor) belong to the same tracer, and so groups all of them under the first tracer in the list. This assumption is wrong, and the grouping needs to be fixed.

https://github.com/open-telemetry/opentelemetry-cpp/blob/2034c9bcfa9cf6a995891efed36264088a333d37/exporters/otlp/src/otlp_recordable_utils.cc#L264-L281

The simple processor sends one span at a time to the exporter, so no spans are grouped together and the problem doesn't surface there.
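
To make the grouping problem concrete, here is a minimal, self-contained sketch. It does not use the real SDK or exporter types: `SpanRecord`, `GroupUnderFirstTracer`, and `GroupByTracer` are hypothetical names invented for illustration, and the actual fix in opentelemetry-cpp may be structured differently. The first function mimics the faulty assumption described above; the second keys each span by its own tracer name and version.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an exported span; NOT the SDK's Recordable type.
struct SpanRecord
{
  std::string span_name;
  std::string library_name;     // name passed to GetTracer()
  std::string library_version;  // version passed to GetTracer()
};

// (library name, library version) identifies the tracer that created a span.
using LibraryKey   = std::pair<std::string, std::string>;
using GroupedSpans = std::map<LibraryKey, std::vector<std::string>>;

// Mimics the faulty assumption: every span in the batch is filed under the
// tracer of the first span in the batch.
GroupedSpans GroupUnderFirstTracer(const std::vector<SpanRecord> &batch)
{
  GroupedSpans groups;
  if (batch.empty())
  {
    return groups;
  }
  const LibraryKey key(batch.front().library_name, batch.front().library_version);
  for (const auto &span : batch)
  {
    groups[key].push_back(span.span_name);
  }
  return groups;
}

// One possible correction: each span is filed under its own tracer.
GroupedSpans GroupByTracer(const std::vector<SpanRecord> &batch)
{
  GroupedSpans groups;
  for (const auto &span : batch)
  {
    const LibraryKey key(span.library_name, span.library_version);
    groups[key].push_back(span.span_name);
  }
  return groups;
}
```

For a batch holding Span2 (Lib2, 2.0) followed by Span1 (Lib1, 1.0), `GroupUnderFirstTracer` yields a single group keyed by Lib2 that contains both spans, while `GroupByTracer` yields two groups. For a batch of one span, both functions return the same result, which matches the observation that the problem does not appear with the simple processor.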