madelson / DistributedLock

A .NET library for distributed synchronization
MIT License
1.75k stars 182 forks source link

Adding Instrumentation #137

Open joshdixon opened 1 year ago

joshdixon commented 1 year ago

Hey,

I'm interested in adding Instrumentation (primarily metrics for now) to track some lock stats. The metrics I'm trying to add at the moment are:

Doing so would require adding System.Diagnostics.DiagnosticSource as a package dependency (which I don't see as being an issue, but let me know otherwise).

I had a crack at adding the instrumentation deep down in DistributedLock.Core as a static class: image

Then calling it from DistributedLockHelpers in the TryAcquireAsync method: image

However, there's some issues with this, such as metrics being skipped if a lock is manually created outside of a provider. Another thing I couldn't quite figure out would be how to cleanly determine when a lock is disposed at this point, as the CancellationToken on the lock handle isn't a good way to determine when a lock has been disposed.

A nice thing to have would be developer configurable Instrument names, instead of them being hard-coded. MassTransit achieves this doing the following:

If attempting to do something similar in this library I feel this would need to be exposed in each of the implementation specific packages at the lock constructor level, similar to the connection builders.

Have you got any ideas of how (or if I should) to go about adding this functionality. Also happy to write up docs of how to collect the metrics via OpenTelemetry after I and/or anyone else implements this functionality.

Some relevant docs on Instrumentation: https://docs.microsoft.com/en-us/dotnet/core/diagnostics/metrics-instrumentation

Cheers.

madelson commented 1 year ago

@joshdixon thanks for filing. This seems like a cool idea and something I'd be supportive of adding if we can find a good design. Therefore, I think it makes sense to start by fleshing out the design side. If you're excited about contributing it that would be great!

Some general questions/thoughts I have to kick off the discussion:

As we hash some of these design questions out, we can also try to build towards a concrete API proposal specifying all the new surface area to be added.

Also happy to write up docs of how to collect the metrics via OpenTelemetry after I and/or anyone else implements this functionality.

This would be great!

joshdixon commented 1 year ago

@madelson, awesome to have you on board!

I tried to get it working with minimal changes prior to making the issue but it doesn't seem like that'll be feasible whilst keeping the code clean and achieving all my goals. So if we're gonna make more significant changes, then yeah working out the best design is a good path to go down.

I mentioned tags a lot, to summarise I think it'd be good to have the following tags:

madelson commented 1 year ago

I don't have much experience using the other kind of primitives though.

Don't worry there is not a lot of difference; if we can solve for locks we can easily extend to all the primitive types.

The ASP.NET instrumentation doesn't allow you to change the instrument names as far as I'm aware.

Feels like a good enough reason not to do it here either.

I'm not sure where this would be configured though, since as you said there's currently no global state.

I can see at least 3 options:

  1. Add this in a consistent way to the Options builder for each implementation/provider (we can use an internal interface to force consistency)
  2. Use an AppContext switch (global state). I wouldn't go with this unless there is a lot of prior art to suggest it is an idiom.
  3. Make this a fully-external wrapping layer, for example:
    var provider = new InstrumentedDistributedSynchronizationProvider(new SqlDistributedSynchronizationProvider(...));

    The latter would be easiest to implement; we wouldn't have to make changes to the individual implementations at all. OTOH, this forces you to work solely through the interfaces rather than coding against specific implementations which is an issue if we add implementation-specific methods in the future. This would also make a bit more difficult to easily turn instrumentation on/off via a configuration setting. Thoughts? Are there other examples we can take from popular libraries?

Could leverage that the handles implement IDisposable and hook into that for some of the metrics.

Yes. You might have seen that most of the implementations follow an "inner handle" pattern where the public handle wraps another handle object. The ones that don't could easily be changed to follow this pattern. Then, the instrumentation handle could just be another (optional) internal wrapping layer so you have PublicHandle(InstrumentationHandle(InnerHandle)).

I mentioned tags a lot

Agreed. I hadn't appreciated how tags make for a really nice solution to this problem. Some thoughts on the ones you've laid out: