postfinance / kubenurse

Kubernetes network monitoring
MIT License
416 stars 39 forks

migrate to histogram #58

Closed myaser closed 1 year ago

myaser commented 2 years ago

Motivation for changing prometheus.SummaryVec to prometheus.HistogramVec:

  1. prometheus.HistogramVec can be aggregated across instances, which is not possible with prometheus.SummaryVec. This makes it possible to see the latency of calls to a specific node as observed by every other node in the cluster, for example: histogram_quantile(0.9, sum by (type, le) (rate(kubenurse_request_duration_bucket{type=~"path_ip-.*"}[5m])))
  2. Histogram quantiles are calculated on the server side, which has two positive effects in large clusters:
    1. A 0.9 or higher quantile can be formed from fewer observations per pod (see KUBENURSE_CHECK_INTERVAL), since the observations made by all nodes can be aggregated together.
    2. Observations are cheaper for the clients, as they no longer need to calculate quantiles.

for more details

coveralls commented 2 years ago

Pull Request Test Coverage Report for Build 3410042374


Totals Coverage Status
Change from base Build 3368223944: -0.04%
Covered Lines: 435
Relevant Lines: 560

djboris9 commented 1 year ago

lgtm. @zbindenren how about you?

zbindenren commented 1 year ago

Thanks for your contribution.