open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
445 stars 271 forks source link

SQL client instrumentation does not deliver any traces - how to debug? #1797

Closed Sam13 closed 3 months ago

Sam13 commented 2 years ago

Question

I have instrumented an ASP.NET core microservice with OpenTelemetry (ASP.NET core, HTTP client, SQL client instrumentations). Everything works well except that SQL client instrumentation does not deliver any tracing data (other instrumentations do). This only happens if I deploy the service into a K8s cluster, I cannot reproduce it locally with the same Docker image.

For basic troubleshooting I instrumented my microservice as described here: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry#troubleshooting - log level: LogAlways

The log I get is the following:

2022-02-23T10:12:27.3353767Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-ec584e9e6f41730b9452d364b907606b-e2bd88d346cf95d2-00}
2022-02-23T10:12:27.3377486Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-ec584e9e6f41730b9452d364b907606b-e2bd88d346cf95d2-00}
2022-02-23T10:12:29.3529690Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-21f6f0a185f8c44b0a2301c7e344df6f-5168202cad3e9281-01}
2022-02-23T10:12:29.3535602Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-21f6f0a185f8c44b0a2301c7e344df6f-5168202cad3e9281-01}
2022-02-23T10:12:30.0946298Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-e411659c77ed556ba3480c6979b54421-646d410e965fba25-01}
2022-02-23T10:12:30.0998655Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-e411659c77ed556ba3480c6979b54421-646d410e965fba25-01}
2022-02-23T10:12:31.9948133Z:Activity started. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-f268b69b425e25878ab98e7d6073533d-94d53018ba882ee3-00}
2022-02-23T10:12:31.9966433Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-f268b69b425e25878ab98e7d6073533d-94d53018ba882ee3-00}
2022-02-23T10:12:34.5267498Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-2987e14086678c4333c3f82c36f84dfc-58447bde008e70a3-00}
2022-02-23T10:12:34.5291432Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-2987e14086678c4333c3f82c36f84dfc-58447bde008e70a3-00}
2022-02-23T10:12:41.8389466Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-3604e47143729a3df554c808b966b701-534e694557f1ab70-00}
2022-02-23T10:12:41.8408480Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-3604e47143729a3df554c808b966b701-534e694557f1ab70-00}
2022-02-23T10:12:47.3283338Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-562095ace72700de0ee40c2da24f1a47-c99145569366be94-00}
2022-02-23T10:12:47.3286127Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-562095ace72700de0ee40c2da24f1a47-c99145569366be94-00}
2022-02-23T10:12:49.3543815Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-90b07eed582fa4e92a6da2149a239024-5146955aa7f43468-01}
2022-02-23T10:12:49.3547137Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-90b07eed582fa4e92a6da2149a239024-5146955aa7f43468-01}
2022-02-23T10:12:51.9976091Z:Activity started. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-f06007baf72bef1fa5ea25679373f83a-2324ae4b1da7a6d5-00}
2022-02-23T10:12:52.0005320Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-f06007baf72bef1fa5ea25679373f83a-2324ae4b1da7a6d5-00}
2022-02-23T10:12:54.5289178Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-e3173fc53707f059f0df7efcd541142b-7b9ae735f9cd2ac6-00}
2022-02-23T10:12:54.5311142Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-e3173fc53707f059f0df7efcd541142b-7b9ae735f9cd2ac6-00}

When I do the same in a local environment I get the following:

2022-02-23T12:05:39.3576381Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-93b4ab3b63b834b219879e754771e83a-4c531c10f92fb08e-01}
2022-02-23T12:05:39.5055319Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-0cb5ad2ffb0c7176e9b127c1edbf14a1-35187061c2b63237-00}
2022-02-23T12:05:39.6557002Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-0cb5ad2ffb0c7176e9b127c1edbf14a1-35187061c2b63237-01}
2022-02-23T12:05:43.8117782Z:Activity started. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-3000aa160422587a25da8e4ee532b298-f05e6beb06d1a4e7-00}
2022-02-23T12:05:43.8929861Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-3000aa160422587a25da8e4ee532b298-f05e6beb06d1a4e7-00}
2022-02-23T12:05:50.1431527Z:Activity started. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-520b840e8a29c2bbfeb2d7aba078a8f8-02e485ab652cd3b9-00}
2022-02-23T12:05:50.4294962Z:Activity started. OperationName = '{0}', Id = '{1}'.{OpenTelemetry.SqlClient.Execute}{00-520b840e8a29c2bbfeb2d7aba078a8f8-52a448969b8b6241-01}
2022-02-23T12:05:50.4704577Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{OpenTelemetry.SqlClient.Execute}{00-520b840e8a29c2bbfeb2d7aba078a8f8-52a448969b8b6241-01}
2022-02-23T12:05:50.5653461Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{Microsoft.AspNetCore.Hosting.HttpRequestIn}{00-520b840e8a29c2bbfeb2d7aba078a8f8-02e485ab652cd3b9-01}
2022-02-23T12:05:53.9180330Z:Activity started. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-39cfeebdeab214fea318127da2d66121-43870a66b1839c4d-00}
2022-02-23T12:05:53.9204102Z:Activity stopped. OperationName = '{0}', Id = '{1}'.{System.Net.Http.HttpRequestOut}{00-39cfeebdeab214fea318127da2d66121-43870a66b1839c4d-00}

In short: Logs about OpenTelemetry.SqlClient.Execute are missing when running non locally but SQL query is executed (data is reported back).

In addition to that I printed out the diagnostic listener event names. They match the names defined in https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry.Instrumentation.SqlClient. This means those events are fired...

Question: How can I get more information about what might be failing in SQL client instrumentation?

Describe your environment.

cijothomas commented 2 years ago

Could you share the name and version of the SQLClient package? there were some versions which broke telemetry, so its possible that in your prod version is different?

Sam13 commented 2 years ago

Hello @cijothomas

Thanks for your reply. I'm using Microsoft.Data.SqlClient version 3.0.1.

Meanwhile I found some more details by creating a custom build of SQL instrumentation with some logging. I turns out that the activity is never stopped. The handler always returns because activity source cannot be matched: https://github.com/open-telemetry/opentelemetry-dotnet/blob/f8a1f3caa43f15e4ff1bb747cea972ebd029f9dd/src/OpenTelemetry.Instrumentation.SqlClient/Implementation/SqlClientDiagnosticListener.cs#L145

I dumped the activity source name: Sometimes it's from another instrumentation (e.g. ASP.NET core) sometimes it's empty...

Sam13 commented 2 years ago

An upgrade to Microsoft.Data.SqlClient version 4.1.0 did not resolve the issue. Still no SQL tracing

martinjt commented 3 months ago

This definitely does work now. If this is still an issue, please reopen the issue.