Closed konczykl closed 1 year ago
I spent some time today digging into this.
First, I put together a smaller repro sample as a unit test for easier debugging:
using System.Diagnostics;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Context.Propagation;
using OpenTelemetry.Shims.OpenTracing;
using OpenTelemetry.Trace;
using OpenTracing;
using Xunit;

[Fact]
public void Investigate4087_AlwaysOff()
{
    string serviceName = "Investigate4087";

    var exportedItems = new List<Activity>();

    var services = new ServiceCollection();
    services.AddOpenTelemetry()
        .WithTracing(b => b
            .AddSource(serviceName)
            .AddInMemoryExporter(exportedItems)
            .SetSampler(new AlwaysOffSampler())); // this fails
            //.SetSampler(new AlwaysOnSampler())); // this works

    services.AddSingleton<ITracer>(sp =>
    {
        var tracerProvider = sp.GetRequiredService<TracerProvider>();
        var tracer = new TracerShim(tracerProvider.GetTracer(serviceName), Propagators.DefaultTextMapPropagator);
        return tracer;
    });

    IServiceProvider serviceProvider = services.BuildServiceProvider();
    var tracer = serviceProvider.GetRequiredService<ITracer>();

    using (var parent = tracer.BuildSpan("parent").StartActive())
    using (var child = tracer.BuildSpan("child").StartActive())
    {
    }
}
Part of the cause is in StartSpanHelper(). This method tries to start a new activity and return a TelemetrySpan. In this scenario it fails to create an activity and returns TelemetrySpan.NoopInstance instead.
TelemetrySpan.NoopInstance uses all default values, including its SpanContext.
That default SpanContext is invalid, so the validity check in the SpanShim constructor fails here:
https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry.Api/Trace/SpanContext.cs#L104
https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry.Shims.OpenTracing/SpanShim.cs#L44-L51
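To illustrate why default values trip that check, here is a minimal sketch. It assumes the OpenTelemetry.Api package is referenced and exercises SpanContext directly; it does not reproduce the shim's own code, and the variable names are only for illustration.

```csharp
using System;
using System.Diagnostics;
using OpenTelemetry.Trace;

// A default SpanContext has an all-zero trace id and span id,
// which is what a span built purely from default values carries.
SpanContext defaultContext = default;
Console.WriteLine(defaultContext.IsValid); // False

// A context with non-zero ids is considered valid, even when it is not sampled.
var populatedContext = new SpanContext(
    ActivityTraceId.CreateRandom(),
    ActivitySpanId.CreateRandom(),
    ActivityTraceFlags.None);
Console.WriteLine(populatedContext.IsValid); // True
```

Any code that requires a valid SpanContext will therefore reject a span that was never backed by a real Activity.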
While digging into the relationship between Samplers and StartActivity I found this: https://github.com/open-telemetry/opentelemetry-dotnet/blob/d0829fff2f229b67c884526cb4943b99f6c75f29/src/OpenTelemetry/Trace/TracerProviderSdk.cs#L234-L245
In summary, when using the AlwaysOffSampler, PropagateOrIgnoreData() effectively decides not to create a new Activity for the child span. This eventually results in the ArgumentException described in the bug report.
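The effect can be approximated without the SDK, using only System.Diagnostics. The sketch below is an assumption-laden stand-in, not the SDK's code: the source name "Investigate4087.Sketch" is made up, and the Sample callback only imitates the decision described above (propagate for a root or remote parent, otherwise ignore).

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Investigate4087.Sketch"); // hypothetical name

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,

    // Imitates an "always off" decision: keep the trace id alive for root
    // spans and remote parents, and create nothing for local children.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        options.Parent.TraceId == default || options.Parent.IsRemote
            ? ActivitySamplingResult.PropagationData
            : ActivitySamplingResult.None,
};
ActivitySource.AddActivityListener(listener);

// The root span has no parent, so an Activity is created (not recorded).
using var parent = source.StartActivity("parent");

// The child sees a local parent, the sampler answers None, and
// StartActivity returns null instead of an Activity.
using var child = source.StartActivity("child");

Console.WriteLine(parent is null); // False
Console.WriteLine(child is null);  // True
```

A null child Activity is the situation in which a wrapper has nothing to take a span context from and falls back to default values.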
-- This sounds related to #3290. I'll follow up.
@TimothyMothra @open-telemetry/dotnet-maintainers While some other aspects are mentioned on this issue, it seems that we can close it as fixed via #4668.
@pjanotti Confirmed!
@open-telemetry/dotnet-maintainers I think this Issue could be assigned to the current milestone and closed. :)
Bug Report
List of [all OpenTelemetry NuGet packages]:
Runtime version:
Symptom
When using shim package
What is the expected behavior? Return some noop instance.
What is the actual behavior? ArgumentException thrown from SpanShim ctor.
Reproduce
Reproduced error: https://github.com/konczykl/OpenTracingShimError
Additional Context
The problem occurs when spans are not sampled: the underlying activity is not created, which results in a default SpanContext.
The parent span's activity is sampled as PropagationData and gets created, but the child's activity is null.