open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0
3.22k stars 764 forks source link

OpenTelemetry.Shims.OpenTracing Invalid 'SpanContext' when not sampled #4087

Closed konczykl closed 1 year ago

konczykl commented 1 year ago

Bug Report

List of [all OpenTelemetry NuGet packages]:

<PackageReference Include="OpenTelemetry" Version="1.4.0-rc.1" />
<PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.4.0-rc.1" />
<PackageReference Include="OpenTelemetry.Exporter.Jaeger" Version="1.4.0-rc.1" />
<PackageReference Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.4.0-rc.1" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.4.0-rc.1" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.0.0-rc9.10" />
<PackageReference Include="OpenTelemetry.Instrumentation.Hangfire" Version="1.0.0-beta.4" />
<PackageReference Include="OpenTelemetry.Instrumentation.SqlClient" Version="1.0.0-rc9.10" />
<PackageReference Include="OpenTelemetry.Shims.OpenTracing" Version="1.0.0-rc9.11" />

Runtime version:

Symptom

When using shim package

What is the expected behavior? Return some noop instance.

What is the actual behavior? ArgumentException thrown from SpanShim ctor.

Reproduce

Reproduced error: https://github.com/konczykl/OpenTracingShimError

Additional Context

Problem exists if we are not sampling spans - underlying activity is not created resulting in default SpanContext.

using (var parent = tracer.BuildSpan("parent").StartActive())
using (var child = tracer.BuildSpan("child").StartActive())
{

}

Parent span's activity is set as PropagationData and gets created but child's activity is null.

 public SpanShim(TelemetrySpan span)
{
    Guard.ThrowIfNull(span);

    if (!span.Context.IsValid) // here Context is default -> not valid
    {
        throw new ArgumentException($"Invalid '{nameof(SpanContext)}'", nameof(span.Context));
    }

    this.Span = span;
    this.spanContextShim = new SpanContextShim(this.Span.Context);
}
TimothyMothra commented 1 year ago

I spent some time today digging into this.

First, I put together a smaller repro sample as a unit test for easier debugging:

using System.Diagnostics;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Context.Propagation;
using OpenTelemetry.Shims.OpenTracing;
using OpenTelemetry.Trace;
using OpenTracing;
using Xunit;

[Fact]
public void Investigate4087_AlwaysOff()
{
    string serviceName = "Investigate4087";

    var exportedItems = new List<Activity>();

    var services = new ServiceCollection();

    services.AddOpenTelemetry()
        .WithTracing(b => b
            .AddSource(serviceName)
            .AddInMemoryExporter(exportedItems)
            .SetSampler(new AlwaysOffSampler())); // this fails
            //.SetSampler(new AlwaysOnSampler())); // this works

    services.AddSingleton<ITracer>(sp =>
    {
        var tracerProvider = sp.GetRequiredService<TracerProvider>();
        var tracer = new TracerShim(tracerProvider.GetTracer(serviceName), Propagators.DefaultTextMapPropagator);
        return tracer;
    });

    IServiceProvider serviceProvider = services.BuildServiceProvider();
    var tracer = serviceProvider.GetRequiredService<ITracer>();

    using (var parent = tracer.BuildSpan("parent").StartActive())
    using (var child = tracer.BuildSpan("child").StartActive())
    {
    }
}        

Part of the cause is in StartSpanHelper(), this method tries to start a new activity and return a TelemetrySpan. This method is failing to create an activity and instead returns the TelemetrySpan.NoopInstance.

https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry.Api/Trace/Tracer.cs#L176-L180

The TelemetrySpan.NoopInstance uses all default values including:

Here, the Span Context is Invalid because of the default values and fails here:

https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry.Api/Trace/SpanContext.cs#L104 https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry.Shims.OpenTracing/SpanShim.cs#L44-L51

TimothyMothra commented 1 year ago

While digging into the relationship between Samplers and StartActivity I found this: https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry/Trace/TracerProviderSdk.cs#L234-L245

In summary, when using the AlwaysOffSampler, the PropogateOrIgnoreData() is effectively determining to not create a new Activity. This eventually results in the ArgumentException detailed above.

-- This sounds related to #3290. I'll follow up.

pjanotti commented 1 year ago

@TimothyMothra @open-telemetry/dotnet-maintainers While some other aspects are mentioned on this issue it seems that we can close it as fixed via #4668

TimothyMothra commented 1 year ago

@pjanotti Confirmed!

@open-telemetry/dotnet-maintainers I think this Issue could be assigned to the current milestone and closed. :)