open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
477 stars 283 forks source link

Changing Microsoft.Data.SqlClient diagnostic format #1732

Open Wraith2 opened 1 year ago

Wraith2 commented 1 year ago

I contribute to Microsoft.Data.SqlClient and one of the recent issues we've had opened is to do with making the diagnostic information trim safe https://github.com/dotnet/SqlClient/issues/2184

Changing from using anonymous types to strong types is mostly just writing the types and getting it built which isn't too hard. I'm basing my changes on ones done for aspnet https://github.com/dotnet/aspnetcore/pull/11730 so they have public get-only properties and directly implemented enumerators, they are reflection free.

I am wondering how upstream callers like OTel will need to change (if at all) to consume the new classes and whether there is something I may be able to build into the new classes to assist or enable that. I'm also wondering if the order of properties or pairs is important and if changing them may cause problems. Are there things I can do to make life easier for consumers? Are there things to avoid?

cijothomas commented 1 year ago

Reached out to @mbakalov to see if he can do an initial triage of the issue!

mbakalov commented 1 year ago

Always happy to have a dependency declare something public contract. 😄

If we can get rid of reflection in our SqlClientDiagnosticListener than that would be great!

We can come up with and discuss the minimal set of fields we'd like strongly typed/stable to fullfill OTel's semantic conventions for logging database things.

Couple questions in my mind:

  1. We would take a dependency on Microsoft.Data.SqlClient then? It seems ok conceptually (we're providing instrumentation for it!) but need to figure out how to correctly support apps on older versions of it.

  2. There is also a netfx version of SqlClient that doesn't use DiagnosticSource. Could probably be out of scope for now?

  3. The discussion in the mentioned issue about SqlClient potentially emitting Activities directly. Imo this is the best long-term path - Activities are now "native" in dotnet, for libraries to emit them natively (with standard naming conventions) feels like the next logical step and aligns with where the industry seems to be heading?

Wraith2 commented 1 year ago
  • We would take a dependency on Microsoft.Data.SqlClient then? It seems ok conceptually (we're providing instrumentation for it!) but need to figure out how to correctly support apps on older versions of it.

That's up to you. Whatever works best for you as a consumer is fine. The primary reason for making this change from the SqlClient library side is to formalize the surface area so we're moving towards trim safety. The secondary reason is that we've had issues opened to directly support metrics and tracing, those should also have formal surface area in my opinion.

  • There is also a netfx version of SqlClient that doesn't use DiagnosticSource. Could probably be out of scope for now?

This decision is down to how popular opentelemetry is on netfx. I wouldn't imagine it's that big of a concern for legacy projects but I could be wrong. I'd call it out of scope. That means you are likely to need the reflection based listener that is currently used for old netcore versions and for all netfx versions.

Detecting if the loaded version is supplying strongly typed events and switching in the strongly typed listener/tracer seems likely to be the best way to deal with that but we'll need to agree a contract of some kind so you can easily detect the new library versions sensibly.

  • The discussion in the mentioned issue about SqlClient potentially emitting Activities directly. Imo this is the best long-term path - Activities are now "native" in dotnet, for libraries to emit them natively (with standard naming conventions) feels like the next logical step and aligns with where the industry seems to be heading?

Works for me. As i mentioned i've been using ben's approach from mvc as so i'm adding classes such as:

    public sealed class SqlClientConnectionOpenError : SqlClientDiagnostic
    {
        public Guid OperationId { get; }
        public string Operation { get; }
        public long Timestamp { get; }

        public Guid ConnectionId { get; }
        public SqlConnection Connection { get; }
        public string ClientVersion { get; }
        public Exception Exception { get; }
       ...

full file which explicity support IReadOnlyCollection<KeyValuePair<string,object>> where currently you just get an anonymous object.

I expect this to require a bit of back and forth and I hope that we'll be able to take this work and build on it to add metrics and tracing in the same way. So really what would be best is if you tell me what your ideal supporting library behaviour looks like I'll see if I can implement that and then if it turns out that there's a problem we'll have to engineer a way around it.

mbakalov commented 1 year ago

(made some edits)

That WIP API looks good to me! I see the property names on the new strong-typed objects are the same as they were on the anonymous types - this is good, the OTel instrumentation should continue to "just work". AppInsights does the same thing as OTel afaik, and should also work. I can help test both.

One potential suggestion, that I am not sure we should pursue, is related to us getting rid of reflection. It seems that the preferred pattern for consumers of DiagnosticSource is to use reflection to avoid taking a compile-time dependency on the producer. There are old arguments in issues linked from Ben's PR, which I don't fully follow, but even the EFCore OTel instrumentation has chosen to use reflection despite EFCore's payloads being strongly typed.

So assuming OTel also won't take a dependency on SqlClient (still not sure), then our path to avoiding reflection would be if you exposed a few top-level fields to us as strings/known types to grab from the payload directly:

What do you think? It does feel too ad-hoc to duplicate/flatten something in the payload, so not sure if worth it and if makes sense for you, but would probably save us some perf.

Whenever you have a working nuget I am happy to play around with it and test out different scenarios with OTel and AppInsights!

Wraith2 commented 1 year ago

At the moment what I'm working on is translating the unnamed objects in the SqlClient library to be strongly typed. That will allow us to be trim compatible but doesn't confer any real advantage to users of those objects unless they take a strong dependency.

Considering it from the OTel perspective it would be very complex to try and take a dependency so using it through reflection is probably the best approach currently. However, the strong typed implementation specifically implements IReadOnlyCollection<KeyValuePair<string,object>>> so reflection isn't required and if you know that a producer is well defined (for example that it only has a small number of keys in that collection) then you can purely use enumeration of pairs to fetch the data. Using only enumeration would allow that specific consumer to be trimmable.

I can look at adding additional members to the currently existing notifications once I've translated them. Note that what we currently emit are pairs of notifications and not activities. We emit a Before and then either an After or an Error but these are all DiagnosticSource.Write calls, not Activity objects. I'm finding working out what is needed for various levels of diagnostics very confusing, I'm not sure if we should be trying to use activities with tags or not.

I'll finish the current conversion preserving the names and types and then once I can build the library I'll open a PR and ping you to try the artifacts for compatibility. If they pass I'll then be able to look at expanding those notifications with feedback from you on what additional information would be useful to get more cleanly from us.

mbakalov commented 1 year ago

Sounds like a plan! Thx!

cijothomas commented 1 year ago

The discussion in the mentioned issue about SqlClient potentially emitting Activities directly. Imo this is the best long-term path - Activities are now "native" in dotnet, for libraries to emit them natively (with standard naming conventions) feels like the next logical step and aligns with where the industry seems to be heading?

This would be the best direction to take. Other libraries like HttpClient, Asp.Net Core are moving in this direction (metrics already done, and tracing coming with .NET 9).

@Wraith2 Would Ms.Data.SqlClient be open to this approach?

cijothomas commented 1 year ago

https://github.com/dotnet/SqlClient/issues/917

cijothomas commented 1 year ago

https://github.com/dotnet/SqlClient/issues/2210 Found the newer issue tracking this work. My suggestion is to do no changes other than implementing tracing natively in sqlclient :)

Wraith2 commented 1 year ago

@Wraith2 Would Ms.Data.SqlClient be open to this approach?

Probably. Note that I'm an external contributor and I don't make any decisions. I've found that in general as long as the additional features are clearly desirable and implemented correctly they SqlClient team are open to any contribution. There are requests for both tracing and metrics from the EF Core team and I don't know if there's any plan to pick them up from the MS side but I can probably get them implemented with some help.

I'm aware of activities and have used them in hobby projects in the last couple of years but they're not something I'm familiar enough with to try and plug them into SqlClient without guidance. We're currently writing events or notifications through the diagnostics infrastructure but we aren't using activities. Metrics are yet another thing on top of that. If I can understand the differences well enough I can implement them.

cijothomas commented 1 year ago

Implementation wise, it'll roughly as follows:

  1. In the place where DS events are fired, start Activity, and populate it tags.
  2. The tags should follow OTel sem. conventions.
  3. Provide an ability to enrich the activity to the users using contextual info. (such a thing needs to be designed, but you can probably wait for httpclient/aspnetcore to do it first, and borrow their approach OR vice-versa).
  4. Make this under feature flag/opt-in only.
  5. SQL does not have to deal with context-propagation part most likely, so I hope this'll be a lot easier than httpclient/aspnetcore.
martinjt commented 6 months ago

From reading up, it seems like we're not planning anything in Otel for this, so we can close this issue right?

/cc @cijothomas @Wraith2

Wraith2 commented 6 months ago

As long as the strongly typed diagnostic objects that I introduced are working through the current adapter code I think this can be closed.

cijothomas commented 6 months ago

From reading up, it seems like we're not planning anything in Otel for this, so we can close this issue right?

/cc @cijothomas @Wraith2

It is not clear yet, if this component will be needed or not. If sqlclient does native instrumentation, then this can be removed, but that is not a confirmed thing from what I can see : https://github.com/dotnet/SqlClient/issues/2210

As long as the strongly typed diagnostic objects that I introduced are working through the current adapter code I think this can be closed.

Unless there is a unit test covering this, we won't know for sure.

Wraith2 commented 6 months ago

in https://github.com/dotnet/SqlClient/pull/2226#issuecomment-1914033445 @mbakalov checked the changes were compatible with otel and app insights. I expect that the sqlclient team are just waiting on major version processes before merging.

Wraith2 commented 5 months ago

https://github.com/dotnet/SqlClient/pull/2226 is merged and will be in the next preview build of SqlClient.