kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0
5.37k stars 2k forks

Grafana dashboard #123

Closed cemo closed 6 years ago

cemo commented 7 years ago

I think that it might be really cool to have a dashboard using this project.

andrewhowdencom commented 7 years ago

I am currently integrating it with the kubelet metrics to compare allocated resources (requests) against actual resource usage. It was super educational! I got stuff way wrong.

Happy to contrib to some sort of shared dashboard, maybe in a /contrib/grafana/dashboard.json if we can think of some sort of usable metrics from this.

Maybe things like current pods vs desired? (I think this can work that out)

andyxning commented 7 years ago

@andrewhowdencom Another useful thing to monitor is nodes in NotReady or SchedulingDisabled status. With event-based alarms (if you'd like, take a look at eventarbiter. Disclosure: eventarbiter is written by me. :)), we only get a single event when a node changes to NotReady or SchedulingDisabled. If we miss that event, we can hardly tell which nodes are in those states.

It would be helpful to point out which nodes are in SchedulingDisabled and NotReady status. :)
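
As a rough illustration of the point above, here is a minimal Python sketch that lists NotReady and unschedulable nodes from a kube-state-metrics scrape. It assumes the metric and label names `kube_node_status_condition` and `kube_node_spec_unschedulable` as documented for current kube-state-metrics releases (older versions may expose different names). The helpers `parse_samples` and `problem_nodes` and the sample text are made up for illustration, and the parser only handles simple labelled lines.

```python
import re

# Matches one labelled sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 1
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def problem_nodes(text):
    """Return (not_ready, unschedulable) as sorted lists of node names."""
    not_ready, unschedulable = set(), set()
    for name, labels, value in parse_samples(text):
        node = labels.get("node")
        if name == "kube_node_status_condition":
            # The Ready condition with status "true" and value 0 means the
            # node is currently not Ready (it is either false or unknown).
            if (labels.get("condition") == "Ready"
                    and labels.get("status") == "true" and value == 0):
                not_ready.add(node)
        elif name == "kube_node_spec_unschedulable" and value == 1:
            unschedulable.add(node)
    return sorted(not_ready), sorted(unschedulable)


# Made-up scrape output, for illustration only.
SAMPLE = """
kube_node_status_condition{node="node-a",condition="Ready",status="true"} 1
kube_node_status_condition{node="node-b",condition="Ready",status="true"} 0
kube_node_spec_unschedulable{node="node-a"} 1
kube_node_spec_unschedulable{node="node-b"} 0
"""
```

With the sample above, `problem_nodes(SAMPLE)` reports node-b as NotReady and node-a as unschedulable. Unlike an event, this reflects current state on every scrape, so nothing is lost if a single notification is missed.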

brancz commented 7 years ago

Actually dashboards and alerts would be even better, but let's do one at a time :wink:

andrewhowdencom commented 7 years ago

I currently have no plans to implement this immediately, as I have no idea what I'd implement. But loosely,

1. Dashboard (templated with node name) that displays all of those metrics; it looks like they'd be mostly singlestat except for a couple which are binary (and thus could be set to "red" in the error condition)
2. Dashboard (templated with DaemonSet/Deployment/RS name)
   a. Display of number scheduled vs number ready. Red when ready < scheduled
   b. Singlestats of all the rest
3. Dashboard (templated with pod name): singlestat for most of that
4. Dashboard (templated with quota name, namespace): singlestats, split by type. Resources all in the same dashboard

This is a suuuper rough spec. If someone would like me to look at this, like this comment. I have something allocated for the next "improvement" period, but the one after (5 weeks) I could do this.
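
A minimal sketch of the "scheduled vs ready" panel from the spec above, for DaemonSets only. It assumes the kube-state-metrics metric names `kube_daemonset_status_desired_number_scheduled` and `kube_daemonset_status_number_ready`; the `samples` structure and the `daemonset_panels` helper are hypothetical, standing in for whatever a dashboard would query.

```python
# Metric names assumed from the kube-state-metrics DaemonSet documentation;
# check them against the version you run.
DESIRED = "kube_daemonset_status_desired_number_scheduled"
READY = "kube_daemonset_status_number_ready"


def daemonset_panels(samples):
    """Return {daemonset: (desired, ready, color)}.

    `samples` maps a metric name to {daemonset_name: value}, a hypothetical
    pre-parsed form of a scrape.
    """
    panels = {}
    for name, desired in samples.get(DESIRED, {}).items():
        # A DaemonSet with no ready sample counts as zero ready pods.
        ready = samples.get(READY, {}).get(name, 0)
        color = "red" if ready < desired else "green"
        panels[name] = (desired, ready, color)
    return panels


# Made-up values, for illustration only.
EXAMPLE = {
    DESIRED: {"fluentd": 3, "node-exporter": 3},
    READY: {"fluentd": 3, "node-exporter": 2},
}
```

Here `daemonset_panels(EXAMPLE)` marks node-exporter red because only 2 of 3 pods are ready. Deployments and ReplicaSets would need their own metric pair.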

chlunde commented 7 years ago

Maybe just start with example queries and alerts? Would enabling the wiki be a good idea?

andyxning commented 7 years ago

@chlunde Would the wiki be open to contributions from community users? We could not enumerate most of the useful example queries and alerts without them.

I mean we need a way for community users to participate.

chlunde commented 7 years ago

@andyxning Just some "getting started" queries would be a good idea, it does not have to be complete. A "Recommended set of alerts" would also be nice. More realistic might be just a dump of queries with descriptions. :worried:

https://help.github.com/articles/changing-access-permissions-for-wikis/ I don't know how they define "collaborator". I have no experience with GitHub wikis and don't know whether relaxed permissions would be an issue (spam etc).

andyxning commented 7 years ago

@chlunde Thanks for the reference to the wiki settings for Github. Will check this later.

andyxning commented 7 years ago

Actually, I can only write to the wiki and cannot set its access permissions. :(

So, @chlunde @brancz How about we add an examples directory under the top-level directory of the repo? That way everyone can send a PR to add a query or alert example.

The directory placement can be:

examples
|---queries
|      |---README.md
|      |---queries.md
|---alerts
       |---README.md
       |---alerts.md

brancz commented 7 years ago

I'd prefer the examples versioned along with the code, that way we can ensure the correctness over the course of changes.

andyxning commented 7 years ago

@chlunde @andrewhowdencom @cemo Would you guys give it a first shot? :)

fiunchinho commented 7 years ago

I guess some examples are https://grafana.com/dashboards/741 and https://grafana.com/dashboards/747

brancz commented 7 years ago

Those show exactly the problem I described, they're great dashboards, but useless if you don't have all the metric sources required - so it's not kube-state-metrics specific and if we open that box I fear it will be a sink for endless bike-shedding.

thuandt commented 6 years ago

After deploying Prometheus to monitor my k8s cluster, I can't find any good Grafana dashboard compatible with kube-state-metrics.

An official one would be really helpful.

antoniaklja commented 6 years ago

What do you think about adding a dashboard for Quality of Service? I mean utilized resources vs resource quota.

brancz commented 6 years ago

I'm not opposed to dashboards that only use metrics from kube-state-metrics.

andyxning commented 6 years ago

@antoniaklja Sorry, I didn't get your point. Can you give an example of utilized resources vs resource quota?

andrewhowdencom commented 6 years ago

I interpret that to mean something like:

kube_pod_container_resource_requests_cpu_cores{pod="$pod_name"}

and

sum by (container_name) (rate(container_cpu_usage_seconds_total{pod_name="$pod_name"}[30s]))

on the same graph. We use it to check whether what's being allocated is what's being used.

It doesn't follow the rule "only use kube-state-metrics", though.
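
A small Python sketch of that comparison, assuming the two expressions quoted in this comment. The helpers `cpu_queries` and `utilization` are hypothetical names for illustration; the first fills in the pod name the way the `$pod_name` dashboard variable would, and the second turns one pair of results into a ratio.

```python
# Expressions copied from the comment above; {pod} replaces $pod_name.
REQUESTS = 'kube_pod_container_resource_requests_cpu_cores{{pod="{pod}"}}'
USAGE = ('sum by (container_name) ('
         'rate(container_cpu_usage_seconds_total{{pod_name="{pod}"}}[30s]))')


def cpu_queries(pod):
    """Return the (requested, used) PromQL expressions for one pod."""
    return REQUESTS.format(pod=pod), USAGE.format(pod=pod)


def utilization(requested_cores, used_cores):
    """Fraction of the CPU request in use; None if nothing is requested."""
    if requested_cores <= 0:
        return None
    return used_cores / requested_cores
```

The first expression comes from kube-state-metrics and the second from cAdvisor, so both sources have to be scraped for the comparison to work. Note that `rate()` needs at least two samples inside the window, so the `[30s]` range may need widening depending on the scrape interval.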

brancz commented 6 years ago

> Doesn't follow the rule "only use kube-state-metrics"

Exactly. My point is that dashboards like that would not work unless people actually also collect cAdvisor metrics (in that specific case), which is rather opinionated. I believe what would be more useful would be example queries, like the grpc-go-prometheus project does.

andrewhowdencom commented 6 years ago

> I believe what would be more useful would be example queries, like the grpc-go-prometheus project does.

This seems like an excellent compromise. However, where would we put such queries? At this point they're broad "Kubernetes" queries; maybe here:

https://github.com/kubernetes/contrib/tree/master/prometheus

?

Edited: s/metrics/queries/. I need another coffee.

antoniaklja commented 6 years ago

@andyxning I was thinking about a dashboard which shows:

but it looks like that breaks your assumption to only use metrics from kube-state-metrics

brancz commented 6 years ago

I'm happy with having a new documentation page alongside the documentation of the metrics in the Documentation/ directory. I like the name the grpc-go-prometheus project chose, "Useful Example Queries", but have no strong opinion and am open to any suggestion.

If they're broad enough, I think we should rather work on getting them into the rendered kubernetes.io documentation. I'm not a fan of "contrib", "catch-all" repositories; they are very prone to going out of date and unmaintained.

StianOvrevage commented 6 years ago

Just wanna add my thoughts here.

Right now there is an official Prometheus Helm Chart ( https://github.com/kubernetes/charts/tree/master/stable/prometheus ) which uses Prometheus node-exporter ( https://github.com/prometheus/node_exporter ) and kube-state-metrics.

I would think a substantial number of kube-state-metrics + Prometheus users would deploy using the Prometheus Helm Chart. These are for the low-touch installations when getting started, before adding loads of customizations and special conditions.

From reading this thread I sense the following train of thought:

My POV is that having a Grafana dashboard that combines metrics from kube-state-metrics and node-exporter would be very, very helpful to all of us who just deploy the Prometheus Helm Chart to get going quickly. I have now tried 10-12 of the Kubernetes dashboards on Grafana.com, and they are all broken in some way or another.

This will of course not suit every use case, but it would be a very good starting point for many (including me).

brancz commented 6 years ago

As such a chart actually combines the two dependencies it seems like the correct place to host a dashboard definition that relies on both.

As a side note, the author of that chart has since moved on and packaged the kube-prometheus stack, and uses that in production (I know this because I'm in close contact with him).

This is my personal opinion as a maintainer of all projects involved: that chart is not well architected and is always outdated in both versions and paradigms. I do not recommend using it (but this is my opinion, you should form your own). This is partly the reason why I have started contributing to the official kube-state-metrics chart.

StianOvrevage commented 6 years ago

Ok, cool. I see they don't have a Helm chart yet. But I will probably switch when they do. In the meantime I can just grab the Dashboards from that project and contribute/improve them there.

Just for clarification, I assume you mean the chart that is outdated and badly architected is the one at stable/prometheus, and not kube-prometheus?

brancz commented 6 years ago

There is actually a chart, which has potential for improvement (but multiple people are working on it). Unless you want to deploy on production today I recommend using and contributing to it. For whatever reason the helm charts are located in the root of the repo. The fact that at least some of the charts are “meta” charts makes it in our opinion unfit for the official charts repo (at least until each component has its own individual chart), which is why it’s hosted separately.

I don’t want to seem too negative, but before using helm I highly encourage looking at other, newer solutions like ksonnet, which in my opinion is how we should be doing manifest management.

StianOvrevage commented 6 years ago

Mkay, I'll have a look at the chart.

What are the chances something like ksonnet will outcompete Helm and become the de-facto standard? I really hate fragmented environments (like Linux distros) where everybody has their own unique twist, which causes tons of duplication of work, incompatibilities and headaches for marginal gains :\ That said, I don't know enough about Helm vs ksonnet to judge them on a technical basis. But with MS backing, I would think (and hope) that Helm and charts will be around for some time :)

brancz commented 6 years ago

Helm is probably here to stay; there is too much involvement by too many companies for it to vanish. That said, I do strongly believe ksonnet or at least something jsonnet based will eventually take over. Important to understand is that they’re not solving the exact same problem. For example, ksonnet is merely there to generate/transform yaml/json, whereas helm has tiller and manages releases etc. ksonnet does the job it is supposed to do pretty well (but it ends there, no release management), and that’s the part Helm has been particularly inflexible about.

metalmatze commented 6 years ago

Just to summarize the current state: the alerts in kube-prometheus are just your (@brancz) opinionated examples, but there are none from kube-state-metrics itself?

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten
/remove-lifecycle stale

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close

icy commented 4 years ago

Does anyone have a dashboard for kube-state-metrics? Thanks a lot.

icy commented 4 years ago

/reopen

k8s-ci-robot commented 4 years ago

@icy: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/kube-state-metrics/issues/123#issuecomment-543820298):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

metalmatze commented 4 years ago

@icy, the kubernetes-mixin project and kube-prometheus (which uses the former) have dashboards that make heavy use of kube-state-metrics.