linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0
1.74k stars 275 forks

Metrics for the count of partitions where there are 1, 2, or 3 local replicas down #2802

Closed Arun-LinkedIn closed 2 months ago

Arun-LinkedIn commented 2 months ago

We are adding two types of metrics: one counts 'bootstrap/inactive' replicas as 'down' replicas, while the other doesn't.
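For illustration only, the two metric flavors could be computed roughly as below. This is a minimal sketch, not the code from this PR: the `DownReplicaCounter` class, its `ReplicaState` enum, and the choice of `OFFLINE`/`ERROR` as the always-down states are assumptions made for the example.

```java
import java.util.List;

final class DownReplicaCounter {
  /** Hypothetical replica states, defined here only to keep the sketch self-contained. */
  enum ReplicaState { STANDBY, LEADER, BOOTSTRAP, INACTIVE, OFFLINE, ERROR }

  /**
   * Assumption: OFFLINE and ERROR always count as down; BOOTSTRAP and INACTIVE
   * count as down only when the flag is set (the first of the two metric types).
   */
  static boolean isDown(ReplicaState state, boolean countBootstrapAndInactiveAsDown) {
    switch (state) {
      case OFFLINE:
      case ERROR:
        return true;
      case BOOTSTRAP:
      case INACTIVE:
        return countBootstrapAndInactiveAsDown;
      default:
        return false;
    }
  }

  /**
   * Returns counts where counts[k] is the number of partitions with k local
   * replicas down, for k = 0..3. counts[3] also absorbs partitions with more
   * than 3 replicas down.
   */
  static int[] countPartitionsByDownReplicas(List<List<ReplicaState>> localReplicasPerPartition,
      boolean countBootstrapAndInactiveAsDown) {
    int[] counts = new int[4];
    for (List<ReplicaState> replicas : localReplicasPerPartition) {
      int down = 0;
      for (ReplicaState state : replicas) {
        if (isDown(state, countBootstrapAndInactiveAsDown)) {
          down++;
        }
      }
      counts[Math.min(down, 3)]++;
    }
    return counts;
  }
}
```

Calling `countPartitionsByDownReplicas` once with the flag set and once without yields the values for the two sets of gauges.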

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 33.33333% with 46 lines in your changes missing coverage. Please review.

Project coverage is 70.00%. Comparing base (52ba813) to head (fa5f272). Report is 26 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...b/ambry/clustermap/HelixClusterManagerMetrics.java | 33.82% | 45 Missing :warning: |
| ...m/github/ambry/clustermap/HelixClusterManager.java | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #2802      +/-   ##
============================================
+ Coverage     64.24%   70.00%   +5.76%
- Complexity    10398    11696    +1298
============================================
  Files           840      842       +2
  Lines         71755    72245     +490
  Branches       8611     8696      +85
============================================
+ Hits          46099    50578    +4479
+ Misses        23004    19023    -3981
+ Partials       2652     2644       -8
```


justinlin-linkedin commented 2 months ago

I am not sure how fast it is to iterate through 35K partitions every minute to check their state. Can we add a trace log in the Gauge method that prints how long it takes? If it takes a long time, there are some ways we can improve the metric collection.

  1. Do this every 10 minutes, since we don't need it in real time. We can keep a lastUpdatedTimestamp in HelixClusterManagerMetrics so we only have to recalculate every 10 minutes.
  2. Do it in the HelixClusterManager in the ExternalViewCallback, so we only calculate when there is an update in the ExternalView.
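Option 1, together with the timing measurement, could look roughly like the sketch below. It is only an illustration under stated assumptions: `CachedGaugeValue` is a hypothetical helper rather than an existing Ambry or metrics-library class, the clock is injected so the refresh interval can be tested, and the measured duration is exposed through a getter where real code would write it to a trace log.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: caches an expensive gauge value and recomputes it at most once per interval. */
final class CachedGaugeValue<T> {
  private final Supplier<T> expensiveComputation;
  private final LongSupplier clockNanos;
  private final long refreshIntervalNanos;

  private T cachedValue;
  private long lastUpdatedNanos;          // plays the role of lastUpdatedTimestamp
  private long lastComputeDurationNanos;  // how long the last recomputation took
  private boolean initialized = false;

  CachedGaugeValue(Supplier<T> expensiveComputation, long refreshInterval, TimeUnit unit,
      LongSupplier clockNanos) {
    this.expensiveComputation = expensiveComputation;
    this.refreshIntervalNanos = unit.toNanos(refreshInterval);
    this.clockNanos = clockNanos;
  }

  /** Called from the gauge; recomputes only if the cached value is older than the interval. */
  synchronized T getValue() {
    long now = clockNanos.getAsLong();
    if (!initialized || now - lastUpdatedNanos >= refreshIntervalNanos) {
      long start = System.nanoTime();
      cachedValue = expensiveComputation.get();
      // In real code this duration would go to a trace log.
      lastComputeDurationNanos = System.nanoTime() - start;
      lastUpdatedNanos = now;
      initialized = true;
    }
    return cachedValue;
  }

  synchronized long getLastComputeDurationNanos() {
    return lastComputeDurationNanos;
  }
}
```

A gauge would then call `getValue()` on an instance built with a 10-minute interval and `System::nanoTime` as the clock. With option 2 the cache is unnecessary, because the counts would be updated in the callback and the gauge would only read the stored result.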