migrate to histogram

myaser commented 2 years ago

Motivation for changing prometheus.SummaryVec to prometheus.HistogramVec:

A prometheus.HistogramVec can be aggregated across instances. This makes it possible to see the latency of calls to a specific node as observed by every other node in the cluster, for example:

histogram_quantile(0.9, sum by (type, le) (rate(kubenurse_request_duration_bucket{type=~"path_ip-.*"}[5m])))

This is not possible with prometheus.SummaryVec, because quantiles precomputed by each client cannot be meaningfully combined.
A histogram's quantiles are calculated on the server side, which has two positive effects in large clusters:
1. You can compute the 0.9 or a higher quantile from fewer observations per pod (as controlled by KUBENURSE_CHECK_INTERVAL), since the observations made by all nodes can be aggregated together.
2. Observations are cheaper for the clients, since they only increment bucket counters and do not need to calculate quantiles.
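
To illustrate why bucket counts aggregate while precomputed quantiles do not, here is a minimal, standard-library-only Go sketch. It is not kubenurse code and does not use the Prometheus client. The `bucket` type, `mergeBuckets`, `quantile`, and the sample counts are all hypothetical. The interpolation is a simplified version of what `histogram_quantile` does on the server: it ignores the `+Inf` bucket and assumes every node uses the same bucket boundaries.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is a cumulative histogram bucket: count is the number of
// observations less than or equal to upperBound (like the "le" label).
type bucket struct {
	upperBound float64
	count      float64
}

// mergeBuckets sums cumulative bucket counts from several nodes.
// Assumption: all nodes share the same bucket boundaries.
func mergeBuckets(nodes ...[]bucket) []bucket {
	sums := map[float64]float64{}
	for _, node := range nodes {
		for _, b := range node {
			sums[b.upperBound] += b.count
		}
	}
	merged := make([]bucket, 0, len(sums))
	for ub, c := range sums {
		merged = append(merged, bucket{upperBound: ub, count: c})
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].upperBound < merged[j].upperBound
	})
	return merged
}

// quantile estimates the q-quantile by linear interpolation inside the
// bucket that contains the target rank. buckets must be sorted by
// upperBound. Returns 0 for an empty histogram (a simplification).
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) == 0 {
		return 0
	}
	total := buckets[len(buckets)-1].count
	rank := q * total
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			if b.count == lowerCount {
				return b.upperBound
			}
			fraction := (rank - lowerCount) / (b.count - lowerCount)
			return lowerBound + (b.upperBound-lowerBound)*fraction
		}
		lowerBound, lowerCount = b.upperBound, b.count
	}
	return buckets[len(buckets)-1].upperBound
}

func main() {
	// Hypothetical cumulative counts reported by two nodes (seconds).
	nodeA := []bucket{{0.1, 4}, {0.5, 7}, {1.0, 10}}
	nodeB := []bucket{{0.1, 2}, {0.5, 9}, {1.0, 10}}

	merged := mergeBuckets(nodeA, nodeB)
	fmt.Printf("cluster-wide p90 estimate: %.2fs\n", quantile(0.9, merged))
}
```

Each node contributes only 10 observations here, yet the merged histogram has 20 to estimate the 0.9 quantile from. With summaries, each node would publish its own quantile and there would be no sound way to combine them.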