Motivation for changing prometheus.SummaryVec to prometheus.HistogramVec:
prometheus.HistogramVec can be aggregated, which makes it possible to know the latency of calls to a specific node as observed by every other node in the cluster, for example with histogram_quantile(0.9, sum by (type, le) (rate(kubenurse_request_duration_bucket{type=~"path_ip-.*"}[5m]))). This is not possible with prometheus.SummaryVec.
Histograms calculate quantiles on the server side, which has two positive effects in large clusters:
The 0.9 or higher quantiles can be computed with fewer observations per pod (the rate of observations is set by KUBENURSE_CHECK_INTERVAL), since the observations made by all nodes can be aggregated together.
Observations are cheaper for the clients, as they do not need to calculate quantiles.
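To illustrate why bucket counts can be aggregated across nodes while precomputed quantiles cannot, here is a minimal sketch in plain Go. It is not kubenurse or Prometheus code: Bucket, mergeBuckets and estimateQuantile are hypothetical names, and the interpolation is a simplified version of what histogram_quantile does on the server (no +Inf bucket handling, and the first bucket is assumed to start at 0).

```go
package main

import "sort"

// Bucket is one cumulative histogram bucket (hypothetical type for this sketch):
// Count observations had a duration <= UpperBound seconds.
type Bucket struct {
	UpperBound float64
	Count      float64
}

// mergeBuckets sums the cumulative counts reported by several nodes.
// It assumes every node uses the same bucket boundaries.
func mergeBuckets(perNode ...[]Bucket) []Bucket {
	sums := map[float64]float64{}
	for _, node := range perNode {
		for _, b := range node {
			sums[b.UpperBound] += b.Count
		}
	}
	merged := make([]Bucket, 0, len(sums))
	for le, count := range sums {
		merged = append(merged, Bucket{UpperBound: le, Count: count})
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].UpperBound < merged[j].UpperBound
	})
	return merged
}

// estimateQuantile finds the bucket containing the q-th observation and
// interpolates linearly inside it. It returns 0 when there are no observations.
func estimateQuantile(q float64, buckets []Bucket) float64 {
	if len(buckets) == 0 || buckets[len(buckets)-1].Count == 0 {
		return 0
	}
	rank := q * buckets[len(buckets)-1].Count
	lower, prevCount := 0.0, 0.0
	for _, b := range buckets {
		if b.Count >= rank && b.Count > prevCount {
			return lower + (b.UpperBound-lower)*(rank-prevCount)/(b.Count-prevCount)
		}
		lower, prevCount = b.UpperBound, b.Count
	}
	return buckets[len(buckets)-1].UpperBound
}
```

For example, if one node reports cumulative counts 4, 9, 10 and another reports 1, 5, 10 for the buckets 0.1, 0.5 and 1.0, the merged counts are 5, 14 and 20, and estimateQuantile(0.9, merged) is about 0.83. Each node alone has only 10 observations, but the estimate is based on all 20. Two 0.9 quantiles already computed on the clients could not be combined this way.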