Open daniellee opened 7 years ago
Leader stats: /v2/stats/leader
Follower stats: /v2/stats/self
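A minimal sketch of how these two endpoints could be polled, in case it helps whoever picks this up. The base URL, the `stats_url` and `fetch_stats` helper names, and the timeout are assumptions for illustration and not part of any existing plugin. Only the two paths come from this issue.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the etcd client endpoint is reachable at this address.
DEFAULT_BASE_URL = "http://127.0.0.1:2379"

# The two stats paths mentioned in this issue.
STATS_PATHS = {
    "leader": "/v2/stats/leader",
    "self": "/v2/stats/self",
}


def stats_url(kind, base_url=DEFAULT_BASE_URL):
    """Build the full URL for the 'leader' or 'self' stats endpoint."""
    return urljoin(base_url, STATS_PATHS[kind])


def fetch_stats(kind, base_url=DEFAULT_BASE_URL, timeout=5.0):
    """Fetch one stats document and decode it from JSON (hypothetical helper)."""
    with urlopen(stats_url(kind, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With a reachable etcd member, `fetch_stats("leader")` would return a dict shaped like the example in this issue.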
Example leader stats:
{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {
        "current": 0.00397,
        "average": 0.22196975104523,
        "standardDeviation": 8.8368979766015,
        "minimum": 0.00124,
        "maximum": 446.408304
      },
      "counts": {
        "fail": 93,
        "success": 2631
      }
    },
    "8697515a5e606ffe": {
      "latency": {
        "current": 0.00322,
        "average": 0.0069664165131983,
        "standardDeviation": 0.012247814765413,
        "minimum": 0.00139,
        "maximum": 0.2215
      },
      "counts": {
        "fail": 0,
        "success": 3258
      }
    }
  }
}
https://coreos.com/etcd/docs/latest/api.html#leader-statistics
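To make the example above concrete, here is a rough sketch of how a collector might turn the leader stats into per-follower numbers worth graphing. The `summarize_followers` function and the choice of fields (current latency, maximum latency, failure ratio) are my assumptions about what is useful and not an agreed design. The sample data is a trimmed copy of the example in this issue.

```python
import json

# Trimmed copy of the example leader stats from this issue.
SAMPLE = """
{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {"current": 0.00397, "maximum": 446.408304},
      "counts": {"fail": 93, "success": 2631}
    },
    "8697515a5e606ffe": {
      "latency": {"current": 0.00322, "maximum": 0.2215},
      "counts": {"fail": 0, "success": 3258}
    }
  }
}
"""


def summarize_followers(leader_stats):
    """Return one row per follower, worst failure ratio first."""
    rows = []
    for follower_id, stats in leader_stats["followers"].items():
        counts = stats["counts"]
        total = counts["fail"] + counts["success"]
        # Guard against a follower that has no recorded requests yet.
        fail_ratio = counts["fail"] / total if total else 0.0
        rows.append({
            "follower": follower_id,
            "current_latency": stats["latency"]["current"],
            "max_latency": stats["latency"]["maximum"],
            "fail_ratio": fail_ratio,
        })
    rows.sort(key=lambda row: row["fail_ratio"], reverse=True)
    return rows


if __name__ == "__main__":
    for row in summarize_followers(json.loads(SAMPLE)):
        print(row)
```

Sorting by failure ratio puts the follower with 93 failed requests at the top, which is probably the one a dashboard should highlight.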
This will need some thought. I wouldn't worry about it for the initial release of the kubernetes-app.
Right now, snap is deployed to every node, and every snap instance runs the same task(s). For monitoring specific services (etcd, kube-api, elasticsearch, etc.) we would not need every snap instance to perform the checks. I am not sure of the best way to tackle this, but it is more aligned with our long-term strategy to have g.net be the repository for snap collector plugins, task manifests, and associated dashboards.
Monitoring etcd with Prometheus blog post: https://coreos.com/blog/developing-prometheus-alerts-for-etcd.html
Has anyone built an etcd Grafana dashboard recently? Dashboard 178 is mostly out of date. I've tried updating the metric names but I'm still not getting much from this dashboard.
Any thoughts on what an etcd dashboard should look like, and which key metrics to display?
Went ahead and built one based on the CoreOS etcd doc (https://coreos.com/etcd/docs/latest/metrics.html).
Available here: https://grafana.com/dashboards/3070
Also available on GitHub: https://github.com/VinceMD/Grafana-Dashboards/blob/master/etcd-prometheus-dashboard.json
Every k8s cluster has an etcd cluster. The important thing to measure is latency, especially for the leader. There are a couple of resources for this: