Is your feature request related to a problem? Please describe
With the introduction of the Request Tracing Framework (RTF) using OpenTelemetry (OTel), metrics (histograms and counters) can now be published and used to track high-latency operations. This issue tracks the instrumentation needed to add latency metrics to ClusterManager, which can help identify scaling bottlenecks.
The following metrics can be added to start with:
Latency of Appliers and Listeners. Committing any change to ClusterState involves running Appliers and Listeners, which are supposed to be very lightweight operations. Tracking their latency will help identify bottlenecks that slow down ClusterManager's processing of the pending tasks queue.
Latency of the reroute operation.
Latency of computing the new cluster state after any change, and the time taken to successfully publish that state to other nodes.
Describe the solution you'd like
OTel Histogram Metrics: Support for histogram-type metrics, added as part of #12062, can be used to publish a latency metric for each of the use cases listed above.
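To illustrate the kind of instrumentation being proposed, here is a minimal sketch of timing an operation and recording the duration into a histogram. It uses only the JDK so that it runs standalone. `LatencyHistogram`, `InMemoryHistogram`, `timed`, and the `operation` tag are hypothetical names made up for this example; they are not the histogram API added in #12062, and a real change would record into whatever instrument that API provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ClusterManagerLatencySketch {

    /** Hypothetical stand-in for a histogram instrument; not the OpenSearch or OTel API. */
    interface LatencyHistogram {
        void record(double valueMillis, Map<String, String> tags);
    }

    /** In-memory histogram that only stores samples, so the sketch runs without a telemetry backend. */
    static class InMemoryHistogram implements LatencyHistogram {
        final List<Double> samples = new ArrayList<>();
        final List<Map<String, String>> tags = new ArrayList<>();

        @Override
        public synchronized void record(double valueMillis, Map<String, String> t) {
            samples.add(valueMillis);
            tags.add(t);
        }
    }

    /** Runs the task and records its wall-clock duration, even if the task throws. */
    static <T> T timed(LatencyHistogram histogram, Map<String, String> tags, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            double elapsedMillis = (System.nanoTime() - start) / (double) TimeUnit.MILLISECONDS.toNanos(1);
            histogram.record(elapsedMillis, tags);
        }
    }

    public static void main(String[] args) {
        InMemoryHistogram applierLatency = new InMemoryHistogram();
        // The tag key and value are illustrative only.
        String result = timed(applierLatency, Map.of("operation", "example-applier"), () -> "applied");
        System.out.println(result + " samples=" + applierLatency.samples.size());
    }
}
```

Recording in a `finally` block means slow operations that fail are still counted, which matters when the goal is to find bottlenecks. Tagging each sample (for example by applier or listener name) would allow a single histogram to be broken down per operation, though the right tag set is a design decision for the actual implementation.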
Related component
Cluster Manager
Describe alternatives you've considered
No response
Additional context
No response