open-telemetry / opentelemetry-dotnet

The OpenTelemetry .NET Client
https://opentelemetry.io
Apache License 2.0
3.27k stars 769 forks source link

OpenTracingShim generate span with invalid spanContext for custom instrumentaion #2787

Closed yehaotian closed 1 year ago

yehaotian commented 2 years ago

Bug Report

List of NuGet packages and version that you are using:

Runtime version: netcoreapp3.1

Symptom

We are using our own instrumentations/diagnostics to build spans for specific activities. Static TracerShim:

var openTelemetry = Sdk.CreateTracerProviderBuilder()
                .AddSource("Test.Tracing")
                .SetResourceBuilder(ResourceBuilder.CreateDefault()
                    .AddService(serviceName: "Test_ServiceName", serviceVersion: "Test_Version"))
                .AddOtlpExporter(opt =>
                {
                    opt.Endpoint = new Uri(INTERNAL_URI);
                    opt.Headers = "TOKEN=INTERNAL_TOKEN";
                })
                .Build();

            return new TracerShim(openTelemetry.GetTracer("Test.Tracing"), Propagators.DefaultTextMapPropagator);
        }

Custom diagnostics:

case "Microsoft.AspNetCore.Mvc.BeforeAction":
                    {
                        var httpContext = (HttpContext)_BeforeActionHttpContextFetcher.Fetch(untypedArg);
                        var request = httpContext.Request;
                        ISpanContext extractedSpanContext =  Tracer.Extract(BuiltinFormats.HttpHeaders,
                                new RequestHeadersExtractAdapter(request.Headers));

                        var operationName = _Options.Hosting.HttpOperationNameResolver(httpContext);
                        var actionScope = Tracer.BuildSpan(operationName)
                            .AsChildOf(extractedSpanContext)
                            .WithTag(TracingTags.HttpMethod, request.Method)
                            .WithTag(TracingTags.SpanKind, TracingTags.SpanKindServer)
                            .WithTag(TracingTags.Scheme, request.Scheme)
                            .WithTag(TracingTags.Path, request.Path.Value)
                            .WithTag(TracingTags.Host, request.Host.Value) 
                            .StartActive();

                        httpContext.Items[ActionScopeItemsKey] = actionScope;
                    }
                    break;

What is the expected behavior? The generated span will have valid spanContext in SpanContextShim. And related spans can be created successfully.

What is the actual behavior? The span contains empty fields for context which will cause Passed span's context is not valid ArgumentException when build other child spans for other activities. And no span get created successfully.

Printed span info:

"rootSpan": {
  "context": {
   "spanContext": {
    "traceId": {},
    "spanId": {},
    "traceFlags": 0,
    "isRemote": false,
    "isValid": true,
    "traceState": []
   },
   "traceId": "0828ca1eb896547a61504c4ffcfbd397",
   "spanId": "84647169dea59879"
  }
...
 }

Reproduce

Since we are leveraging same design as OpenTracing.Contrib.NetCore, we can use the project directly with opentelemetry-dotnet/examples/AspNetCore for reproducing the issue. <PackageReference Include="OpenTracing.Contrib.NetCore" Version="0.8.0" />

In Startup.cs:

services.AddOpenTelemetryTracing((builder) => builder
                    .AddSource("Test.Tracing")
                    .SetResourceBuilder(ResourceBuilder.CreateDefault()
                        .AddService(serviceName: "test-name", serviceVersion: "test-version"))
                    .AddOtlpExporter(opt =>
                    {
                        opt.Endpoint = new Uri(INTERNAL_URI);
                        opt.Headers = "TOKEN=INTERNAL_TOKEN";
                    })
            );
            services.AddSingleton<ITracer>(provider =>
            {
                var traceProvider = provider.GetRequiredService<TracerProvider>();
                var tracer = traceProvider.GetTracer("Test.Tracing");
                var tracerShim = new TracerShim(tracer, Propagators.DefaultTextMapPropagator);
                GlobalTracer.Register(tracerShim);
                return tracerShim;
            });
            services.AddOpenTracing();

Warning when running the program:

warn: OpenTracing.Contrib.NetCore.AspNetCore.AspNetCoreDiagnostics[0]
      Event-Exception: Microsoft.AspNetCore.Mvc.BeforeActionResult
System.ArgumentException: Passed span's context is not valid (Parameter 'Context')
   at OpenTelemetry.Shims.OpenTracing.SpanShim..ctor(TelemetrySpan span)
   at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.Start()
   at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.StartActive(Boolean finishSpanOnDispose)
   at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.StartActive()
   at OpenTracing.Contrib.NetCore.AspNetCore.AspNetCoreDiagnostics.HandleEvent(String eventName, Object untypedArg)
   at OpenTracing.Contrib.NetCore.Internal.DiagnosticEventObserver.System.IObserver<System.Collections.Generic.KeyValuePair<System.String,System.Object>>.OnNext(KeyValuePair`2 value)

Additional Context

MMartyn commented 1 year ago

It isn't clear to me, is this issue still an open issue or was it resolved with this

pjanotti commented 1 year ago

Hi @MMartyn - when I submitted the PR that you linked I didn't have a chance to directly verify if this specific issue was resolved by it. So I was hesitant to close the issue. My understanding at the time was that this one should have been fixed by it.

@yehaotian could you let's know if the issue is resolved for you?

yehaotian commented 1 year ago

Thanks for the response, will try find some time to update the environment and do the test

yehaotian commented 1 year ago

Hi @pjanotti and @MMartyn, I updated all OpenTelemetry.* versions and still get the error. Steps:

  1. Follow https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/examples/Console/TestOpenTracingShim.cs to initialize shim tracer
  2. Apply the changes and use shim tracer to generate spans in https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/examples/AspNetCore/Controllers/WeatherForecastController.cs

Getting error:

fail: Microsoft.AspNetCore.Server.Kestrel[13]
      Connection id "0HMRNCH75LCI6", Request id "0HMRNCH75LCI6:00000002": An unhandled exception was thrown by the application.
      System.ArgumentException: Invalid 'SpanContext' (Parameter 'Context')
         at OpenTelemetry.Shims.OpenTracing.SpanShim..ctor(TelemetrySpan span)
         at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.Start()
         at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.StartActive(Boolean finishSpanOnDispose)
         at getting_started.WeatherForecastController.Get() in /Users/ytian/RiderProjects/openTelemetry-sandbox/OpenTelemetrySandbox/OtelTracingSandbox/WeatherForecastController.cs:line 73
         at lambda_method2(Closure , Object , Object[] )
         at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.SyncObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeActionMethodAsync()
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeNextActionFilterAsync()
      --- End of stack trace from previous location ---
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
      --- End of stack trace from previous location ---
         at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
         at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
         at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
         at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
         at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
LogRecord.Timestamp:               2023-06-27T23:34:09.4026690Z
LogRecord.TraceId:                 1a1eb2abeb10dacabf485d0eba72c236
LogRecord.SpanId:                  47d1faa5a84392d4
LogRecord.TraceFlags:              None
LogRecord.CategoryName:            Microsoft.AspNetCore.Server.Kestrel
LogRecord.LogLevel:                Error
LogRecord.FormattedMessage:        Connection id "0HMRNCH75LCI6", Request id "0HMRNCH75LCI6:00000002": An unhandled exception was thrown by the application.
LogRecord.Body:                    Connection id "{ConnectionId}", Request id "{TraceIdentifier}": An unhandled exception was thrown by the application.
LogRecord.Attributes (Key:Value):
    ConnectionId: 0HMRNCH75LCI6
    TraceIdentifier: 0HMRNCH75LCI6:00000002
    OriginalFormat (a.k.a Body): Connection id "{ConnectionId}", Request id "{TraceIdentifier}": An unhandled exception was thrown by the application.
LogRecord.EventId:                 13
LogRecord.EventName:               ApplicationError
LogRecord.Exception:               System.ArgumentException: Invalid 'SpanContext' (Parameter 'Context')
   at OpenTelemetry.Shims.OpenTracing.SpanShim..ctor(TelemetrySpan span)
   at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.Start()
   at OpenTelemetry.Shims.OpenTracing.SpanBuilderShim.StartActive(Boolean finishSpanOnDispose)
   at getting_started.WeatherForecastController.Get() in /Users/ytian/RiderProjects/openTelemetry-sandbox/OpenTelemetrySandbox/OtelTracingSandbox/WeatherForecastController.cs:line 73
   at lambda_method2(Closure , Object , Object[] )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.SyncObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeNextActionFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
   at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
   at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
MMartyn commented 1 year ago

I also get the same error and was just trying to confirm if it was just me.

pjanotti commented 1 year ago

Thanks @MMartyn and @yehaotian for confirming. The behavior is triggered if there are no listeners for the activity source associated to the registered OpenTracing tracer. Here is an example of the workaround:

// Add OTel
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddConsoleExporter()
        .AddSource("OpenTracing")); // <- Adding the source of the OpenTracing shim

// Add OpenTracing
ITracer otTracer = new TracerShim(
    TracerProvider.Default.GetTracer("OpenTracing"),
    Propagators.DefaultTextMapPropagator);
OpenTracing.Util.GlobalTracer.Register(otTracer);

Anyway, it shouldn't hit an exception if there are no listeners for the OT shim. So this is still a bug.

yehaotian commented 1 year ago

I still get error when DO NOT have .AddAspNetCoreInstrumentation(), what could be the root cause? Also does "OpenTracing" have to be the source name? Thought this could be customized

pjanotti commented 1 year ago

Also does "OpenTracing" have to be the source name?

No, you should be able to customize it.

pjanotti commented 1 year ago

I still get error when DO NOT have .AddAspNetCoreInstrumentation(), what could be the root cause?

The problem is that OpenTracing shim is broken if there is no parent activity. It is related to other problem in the sense that they hit the same exception, but, likely need two different fixes: one for "no listeners" and another to correctly handle root span.

I will take a look at the code to see if I can quickly fix both issues.

yehaotian commented 1 year ago

That makes sense! Thank you for working on this!

MMartyn commented 1 year ago

Just want to report back that the workaround to add the source worked for me. Thanks.

yehaotian commented 1 year ago

Hi @pjanotti , just want to double check if there is ETA/timeline on the issue fix, we would like to migrate from OpenTracing to OpenTelemetry in this Q and this is a blocker. Thanks!

pjanotti commented 1 year ago

Hi @yehaotian, I was on a short vacation. I will debug this issue this week or the next.

pjanotti commented 1 year ago

In the comments above there were two paths in which the issue was hit:

  1. Missing the corresponding AddSource for the tracer used by the shim. In this case, the TelemetrySpan.NoopInstance is used when trying to create the OTel span from an OTel tracer that has no listeners.
  2. When the AddSource is present, but, AddAspNetCoreInstrumentation was omitted we are getting a root span from AspNetCore that is not recorded, this causes the TelemetrySpan.NoopInstance to be used and once again hit the exception.

It seems that the shim should not consider an invalid span context an error condition.

A related issue is that ScopeManager.Activate also has trouble with the TelemetrySpan.NoopInstance since the no-op is a singleton and internally the shim uses a ConditionalWeakTable.

I started coding some integration tests for the shim, the proposed fix itself is trivial.

sasha-khadasevich commented 1 year ago

Hi When it will be included to release?

Kielek commented 1 year ago

@sasha-khadasevich, fix from #4668 should be included in next 1.6.0-something release.

Other parts related to

are not fixed yet there is no ETA.

yehaotian commented 1 year ago

@Kielek @pjanotti Thanks for the efforts, let us know when 1.6.0-* get released