open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
473 stars 282 forks source link

EF Core and SqlClient instrumentation enabled at the same time produce wrong spans #1764

Open virtes opened 1 year ago

virtes commented 1 year ago

Bug Report

List of all OpenTelemetry NuGet packages and version that you are using (e.g. OpenTelemetry 1.0.2):

Runtime version (e.g. net462, net48, netcoreapp3.1, net6.0 etc. You can find this information from the *.csproj file):

Symptom

Using AddSqlClientInstrumentation and AddEntityFrameworkCoreInstrumentation at the same time leads to miss parent span.

What is the expected behavior?

Exports two spans:

  1. EF Core span
  2. SQL Client span, with EF Core span as parent

What is the actual behavior?

  1. EF Core span is missed in the output
  2. SQL Client span, with EF Core span as parent

Output from console exporter - I have only SQL Client span, which has ParentSpanId of EF Core span.

Activity.TraceId:            916f8741531547a078d9966039e4f905
Activity.SpanId:             96a036d8982f1155
Activity.TraceFlags:         Recorded
Activity.ParentSpanId:       8580d5d7086bf8c8
Activity.ActivitySourceName: OpenTelemetry.Instrumentation.SqlClient
Activity.DisplayName:        lsnext_domain
Activity.Kind:               Client
Activity.StartTime:          2023-10-31T13:33:26.4747510Z
Activity.Duration:           00:00:13.8057920
Activity.Tags:
    db.system: mssql
    db.name: lsnext_domain
    peer.service: srv2.local,1433
    db.statement_type: Text
Resource associated with Activity:
    service.name: some-service-name
    service.instance.id: d597a313-eeed-4e9e-a32c-79bb4a1538e8
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.7.0-alpha.1

We can see both expected spans using Enrich methods (pay attention to Id and ParentSpanId), so I guess that both spans are exist and looks like the first one just doesnt stop:

image image

Reproduce

using Microsoft.EntityFrameworkCore;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .ConfigureResource(r => r.AddService("some-service-name"))
    .AddSqlClientInstrumentation()
    .AddEntityFrameworkCoreInstrumentation()
    .AddConsoleExporter()
    .Build();

var dbContextOptions = new DbContextOptionsBuilder<SomeDbContext>()
    .UseSqlServer("localhost").Options;
{
    await using var dbContext = new SomeDbContext(dbContextOptions);
    await dbContext.Database.ExecuteSqlAsync($"select 1");
}

await Task.Delay(2000);
Console.ReadKey();

public class SomeDbContext : DbContext
{
    public SomeDbContext(DbContextOptions<SomeDbContext> options) : base(options)
    {
    }
}
<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net7.0</TargetFramework>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>disable</Nullable>
    </PropertyGroup>

    <ItemGroup>
      <PackageReference Include="Microsoft.EntityFrameworkCore" Version="7.0.13" />
      <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="7.0.13" />
      <PackageReference Include="OpenTelemetry" Version="1.7.0-alpha.1" />
      <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.7.0-alpha.1" />
      <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.7.0-alpha.1" />
      <PackageReference Include="OpenTelemetry.Instrumentation.EntityFrameworkCore" Version="1.0.0-beta.8" />
      <PackageReference Include="OpenTelemetry.Instrumentation.SqlClient" Version="1.6.0-beta.2" />
    </ItemGroup>

</Project>

Additional Context

cijothomas commented 1 year ago

Moving to the contrib repo as EF instrumentation is hosted there?

@virtes Do you have more idea on the underlying issue and possible fixes? There isn't a marked owner for EF instrumentation, so if you/others can help investigate and provide fix, that'd be super helpful!

virtes commented 1 year ago

@cijothomas I'm not sure, that the problem is caused by EF instrumentation.

As I understood, the underlying problem is that EntityFrameworkDiagnosticListener and SqlClientDiagnosticListener are called in parallel so stopping the SqlClient activity doesnt reflect actual current activity in EF listener and this causes the EF instrumentation to use obsolete activity. It turns out that the child SqlClient span is used here, but the EF activity is expected here, so the activity doesnt stop and could not be exported

If I misunderstood anything, please correct me

mbakalov commented 1 year ago

I will take a look at this one! cc @cijothomas

markphillips100 commented 9 months ago

I've just experienced this issue. Any progress been made?

My package versions are:

    <PackageVersion Include="OpenTelemetry.Exporter.Console" Version="[1.7.0, 2.0.0)" />
    <PackageVersion Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="[1.7.0, 2.0.0)" />
    <PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="[1.7.0, 2.0.0)" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.AspNetCore" Version="[1.7.0, 2.0.0)" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.EntityFrameworkCore" Version="1.0.0-beta.9" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Http" Version="[1.7.0, 2.0.0)" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.SqlClient" Version="1.6.0-beta.3" />
zivaninstana commented 2 weeks ago

I did a little investigation. Problem is here: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/2ca95ada73d6e3332bd4f48303b087ec438324f9/src/OpenTelemetry.Instrumentation.EntityFrameworkCore/Implementation/EntityFrameworkDiagnosticListener.cs#L269

Better say problem is sequence of events. Microsoft.EntityFrameworkCore.Database.Command.CommandExecuted happens before than Microsoft.Data.SqlClient.WriteCommandAfter Because of that SQLClient span is still active and because of the condition I mentioned above we are skipping to stop EF activity on Microsoft.EntityFrameworkCore.Database.Command.CommandExecuted event and that event of course never repeat again.

How EF span is not closed cannot be reported but yes it is still parent of SQL span.

Do we have any practice how to solve such problems? Problem here can be wrong order of events from sources. We have two different libraries so we cannot say who fire event in good moment. It also can be wrong order of processing on OTEL side, which I think is less likely.