Open springmin opened 6 months ago
I've been thinking about this for a while (and implementing the server using Generic Host in general).
By using the new System.Diagnostics.Metrics APIs (Counter<T>
, UpDownCounter<T>
, Histogram<T>
etc. ..) would allow us to delete HdrHistogram fork altogether from the repo and simplify our metrics/monitoring logic a lot. The places where we pull the metrics inside the server (LATENCY
etc?) we should be able to use simple MeterListener
to aggregate them). I'm not sure if Histogram<T>
covers everything that HdrHistogram
offers and someone who knows statistics better than me should weigh in on that (https://github.com/dotnet/runtime/issues/63650 seems to be useful addition).
We should follow the existing OTel semantic conventions for Redis but of course modify/extend it if needed. Also configurations knob whether to record db.statement or not, as I believe there is currently no redaction capabilities in Garnet for potentially sensitive parts of the command.
Measuring anything should become a lot easier, but we should follow how ASP.NET & System.Net
guards their metrics logic to keep the overhead next to none IF the metrics are disabled (Instrument.Enabled
). Should the metrics codepaths should be entirely eliminated when Garnet is built with <MetricsSupport>false</MetricsSupport>
given that would also affect the behavior of the RESP commands that rely on it, if they are implemented using MeterListener
?
We should also measure if any extra contention (IIRC currently Garnet's counters are done by atomic Interlocked.Increment
s) for normal use-cases. We should keep eye on how big difference there is in RPS with, for example, dotnet-counters
hooked.
Some existing tracking issues for S.D.Metrics
performance concerns https://github.com/dotnet/aspnetcore/issues/50412, https://github.com/dotnet/runtime/issues/71563 to keep eye on too.
(I've been thinking about this for quite bit because I'm planning to write Crank infrastructure & scenarios to bench Garnet, someday.)
It would be a great idea to support OpenTelemetry on the server side, and contributions in this space are highly encouraged! Not only is this generally useful, but when used with .NET Aspire it will make the metrics automatically available. Doing this seems to basically require some glue code to feed data from our current Metrics API endpoint to OpenTelemetry:
The potential migration to System.Diagnostics.Metrics
is orthogonal though. It needs to be considered with care:
We should also measure if any extra contention (IIRC currently Garnet's counters are done by atomic Interlocked.Increments)
HdrHistogram
is very efficient, and was better than the Metrics
API last we checked. To avoid contention, we keep per-session metrics so that they can be updated without requiring atomics such as Interlocked.Increment
. In the critical path, it is just a simple add on the session: https://github.com/microsoft/garnet/blob/8856dc3990fb0863141cb902bbf64c13202d5f85/metrics/HdrHistogram/LongHistogram.cs#L153-L154
Then we aggregate periodically (or on-demand) across all active and inactive sessions to perform the metrics computation. HdrHistogram
is really efficient on the record metrics path and does the expensive percentile computations off the critical path.
the project is being compiled to both net6 & net8, the new metrics API's exist starting with net8
We are removing support for .NET 6 here: https://github.com/microsoft/garnet/pull/580
Hope to see a contribution in this space!
.net6 support now removed, in #582
Feature request type
enhancement
Is your feature request related to a problem? Please describe
integrate OpenTelemetry or other trance component to enhance observability。
Describe the solution you'd like
integrate OpenTelemetry or other trance component to enhance observability。
Describe alternatives you've considered
No response
Additional context
No response