Open lilic opened 4 years ago
Super curious about some details, but I think adding this is an amazing idea. It's baseline monitoring that everyone should have.
@brancz what kind of details? Maybe I can answer or explore some of them?
I'm a little curious about the cardinality of this. It looks like this would end up being O(n^2) series, as each host reports on each host. That could get expensive quickly, say with 10k nodes.
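To put rough numbers on the O(n^2) concern, here is a back-of-the-envelope sketch. It assumes every node reports on every node (itself included), so it is an upper bound. The `series_per_pair` parameter is hypothetical: it stands in for however many series one node pair actually produces (for a histogram, roughly one per bucket plus sum and count), which I have not measured for goldpinger.

```python
def estimated_series(nodes: int, series_per_pair: int = 1) -> int:
    """Upper bound on series when every node reports on every node.

    series_per_pair is an assumed multiplier (e.g. histogram buckets),
    not a number taken from goldpinger itself.
    """
    return nodes * nodes * series_per_pair


if __name__ == "__main__":
    # Growth is quadratic: 10x the nodes means 100x the series.
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>6} nodes -> {estimated_series(n):>14,} series per pairwise metric")
```

At 10k nodes a single pairwise metric already comes out at 10^8 series before any per-pair multiplier, which is why the per-peer metrics are the ones to look at first.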
Can goldpinger split nodes into zones?
goldpinger is a debugging tool for Kubernetes that tests and displays connectivity between nodes in the cluster. It also provides metrics out of the box, which is why it would be nice to integrate it into kube-prometheus.
It already has a Grafana dashboard. They also seem to be open to contributions around their alerts. <3
I tried it out on a multi-node local Kubernetes cluster, and all the metrics I got are at the bottom of this issue. I found goldpinger_nodes_health_total the most useful here, as well as the goldpinger_peers_response_time_s_ histogram, which is a "Histogram of response times from other hosts, when making peer calls". The second one might be interesting for any SLOs we might want to do around nodes.

Metrics dump: