raintank / legacy-kubernetes-app

Grafana App for Kubernetes

Apache License 2.0

Etcd monitoring #9

Open daniellee opened 7 years ago

daniellee commented 7 years ago

Every k8s cluster has an etcd cluster. The important thing to measure is latency, especially for the leader. There are a couple of resources for this:

snap collector plugin for etcd - https://github.com/intelsdi-x/snap-plugin-collector-etcd
etcd metrics endpoint - https://coreos.com/etcd/docs/latest/metrics.html
datadog etcd integration as an example of what's possible - https://www.datadoghq.com/blog/monitor-etcd-performance/
etcd dashboard for prometheus on gnet - https://grafana.net/dashboards/178
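
As a rough illustration of what consuming the etcd metrics endpoint could look like, here is a minimal sketch that parses Prometheus-style text output into a dictionary. The sample text is illustrative rather than captured from a real cluster, the metric names in it should be checked against the metrics doc linked above for your etcd version, and `parse_metrics` is a hypothetical helper, not part of etcd, snap, or Prometheus. It also assumes samples carry no trailing timestamp.

```python
# Illustrative sample only (assumption): not captured from a real etcd member.
SAMPLE_METRICS = """\
# HELP etcd_server_has_leader Whether or not a leader exists.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 3
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 5
"""


def parse_metrics(text):
    """Map each sample (metric name plus any label set) to its float value.

    Assumes every sample line is "<name>[{labels}] <value>" with no
    trailing timestamp; comment lines and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


for name, value in parse_metrics(SAMPLE_METRICS).items():
    print(name, value)
```

In a real setup the text would come from an HTTP GET against the member's metrics endpoint, and a proper Prometheus client library would be a safer choice than hand-rolled parsing.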

daniellee commented 7 years ago

Leader stats: /v2/stats/leader
Follower stats: /v2/stats/self

Example leader stats:

{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {
        "current": 0.00397,
        "average": 0.22196975104523,
        "standardDeviation": 8.8368979766015,
        "minimum": 0.00124,
        "maximum": 446.408304
      },
      "counts": {
        "fail": 93,
        "success": 2631
      }
    },
    "8697515a5e606ffe": {
      "latency": {
        "current": 0.00322,
        "average": 0.0069664165131983,
        "standardDeviation": 0.012247814765413,
        "minimum": 0.00139,
        "maximum": 0.2215
      },
      "counts": {
        "fail": 0,
        "success": 3258
      }
    }
  }
}

https://coreos.com/etcd/docs/latest/api.html#leader-statistics
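
To show how the leader stats could feed a dashboard or check, here is a minimal sketch that turns the response into one summary row per follower. It uses the example payload above as inline sample data instead of calling /v2/stats/leader, and `summarize_followers` is a hypothetical helper name, not something provided by etcd or snap.

```python
import json

# Sample copied from the leader stats example in this thread.
SAMPLE = """
{
  "leader": "54be60531c7f6892",
  "followers": {
    "3e0d82ced9501c94": {
      "latency": {"current": 0.00397, "average": 0.22196975104523,
                  "standardDeviation": 8.8368979766015,
                  "minimum": 0.00124, "maximum": 446.408304},
      "counts": {"fail": 93, "success": 2631}
    },
    "8697515a5e606ffe": {
      "latency": {"current": 0.00322, "average": 0.0069664165131983,
                  "standardDeviation": 0.012247814765413,
                  "minimum": 0.00139, "maximum": 0.2215},
      "counts": {"fail": 0, "success": 3258}
    }
  }
}
"""


def summarize_followers(leader_stats):
    """Return one summary dict per follower, worst average latency first."""
    rows = []
    for follower_id, stats in leader_stats["followers"].items():
        counts = stats["counts"]
        total = counts["fail"] + counts["success"]
        rows.append({
            "follower": follower_id,
            "avg_latency": stats["latency"]["average"],
            "max_latency": stats["latency"]["maximum"],
            # Guard against a follower that has not been contacted yet.
            "fail_ratio": counts["fail"] / total if total else 0.0,
        })
    rows.sort(key=lambda row: row["avg_latency"], reverse=True)
    return rows


for row in summarize_followers(json.loads(SAMPLE)):
    print(row)
```

Sorting by average latency puts the follower that most needs attention at the top. In the sample that is 3e0d82ced9501c94, which also accounts for all of the failed requests.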

woodsaj commented 7 years ago

This will need some thought. I wouldn't worry about it for the initial release of the kubernetes-app.

Right now, snap is deployed to every node, and every snap instance runs the same task(s). For monitoring specific services (etcd, kube-api, elasticsearch, etc.), we would not need every snap instance to perform the checks. I am not sure of the best way to tackle this, but it is more aligned with our long-term strategy of having g.net be the repository for snap collector plugins, task manifests, and associated dashboards.

daniellee commented 7 years ago

Monitoring etcd with Prometheus blog post: https://coreos.com/blog/developing-prometheus-alerts-for-etcd.html

Vince-Cercury commented 7 years ago

Has anyone built an etcd Grafana dashboard recently? Dashboard 178 is mostly out of date. I've tried updating the metric names, but I'm still not getting much from this dashboard.

Any thoughts on what an etcd dashboard should look like and which key metrics it should display?

Vince-Cercury commented 7 years ago

Went ahead and built one based on the CoreOS etcd docs (https://coreos.com/etcd/docs/latest/metrics.html).

Available here: https://grafana.com/dashboards/3070

Also available on GitHub: https://github.com/VinceMD/Grafana-Dashboards/blob/master/etcd-prometheus-dashboard.json