kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.

Remove kube-state-metrics labels from Kubernetes workload alerts #37

Open · brancz opened this issue 6 years ago

brancz commented 6 years ago

It is often confusing for users when there are alerts about Kubernetes workloads (deployments, daemonsets, statefulsets, etc.) that at first sight appear to come from the kube-state-metrics target. We should probably drop any labels that identify kube-state-metrics and leave only the actual contextual information, such as the object name and namespace.

My hunch is that this would need to be configurable. In the Kausal ksonnet-prometheus package, for example, this would be the instance label, but in most other setups out there (such as the default Prometheus configuration from the Prometheus repo and the Prometheus Operator) these will be labels carrying the respective Kubernetes resource (pod/service/namespace/etc.). It's also reasonable to let people configure this however they like.
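
For context, the mixin's jsonnet config already carries a selector that gets templated into queries against kube-state-metrics metrics, so making this configurable fits the existing pattern. A minimal sketch of the relevant config field (the default value shown here is an assumption about a typical setup):

{
  _config+:: {
    // Selector templated into every alert expression that queries
    // kube-state-metrics metrics; users can override it to match
    // whatever labels their scrape config produces.
    kubeStateMetricsSelector: 'job="kube-state-metrics"',
  },
}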

@tomwilkie @metalmatze

tomwilkie commented 6 years ago

I agree; but isn't this up to the scrape config? i.e., out of scope for the mixin?

The mixin should only depend on labels from KSM for sure.

tomwilkie commented 6 years ago

The flip side of this is the "rule" that alerts should at least specify a job name (or so I recall, though I can't find a reference to it). In this case, do we have to have a job name, or is the KSM metric prefix (kube_) distinct enough?
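
For reference, alerts in the mixin's jsonnet typically template the selector in from $._config rather than hard-coding a job name; a rough sketch of the pattern (severity and duration are illustrative):

{
  alert: 'KubeDeploymentGenerationMismatch',
  // With an empty kubeStateMetricsSelector, only the kube_ metric
  // name prefix identifies where these series come from.
  expr: |||
    kube_deployment_status_observed_generation{%(kubeStateMetricsSelector)s}
      !=
    kube_deployment_metadata_generation{%(kubeStateMetricsSelector)s}
  ||| % $._config,
  'for': '15m',
  labels: { severity: 'warning' },
}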

brancz commented 6 years ago

I think we are aligned; I just want to give an example to make 100% clear what I mean. Here is a possible alert you may receive today:

{
    "alert": "KubeDeploymentGenerationMismatch",
    "instance": "<ip:8080>",
    "job": "kube-state-metrics",
    "service": "kube-state-metrics",
    "namespace": "default",
    "deployment": "my-app"
}

Because all services are configured/relabeled/scraped the same way, they all get the service label set, and that is what causes the confusion here. The instance label would not be very useful either: the alert is about an abstract Kubernetes object rather than about kube-state-metrics itself (kube-state-metrics exposes metrics about itself on a separate port). job being kube-state-metrics may still be applicable, but even that can be confusing.
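
After dropping the labels that identify kube-state-metrics, the same alert would reduce to something like:

{
    "alert": "KubeDeploymentGenerationMismatch",
    "namespace": "default",
    "deployment": "my-app"
}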

lilic commented 5 years ago

@brancz I guess you already had an issue for this open? :)

brancz commented 5 years ago

Let me update myself here because I don't know what I was thinking when I wrote the last comments.

What we would like to do is omit the job selector whenever kube-state-metrics metrics are used in alerts, with the exception of the KubeStateMetricsDown alert. The reason: we perform metric relabeling on kube-state-metrics metrics to drop the job label, so that there are no confusing labels on these metrics, while up-based monitoring of the kube-state-metrics endpoint is retained. I think this is fine, as the metrics are not about a job anyway; they are in some sense meta metrics or application metrics.
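
A minimal sketch of that relabeling, written as a jsonnet fragment of a Prometheus scrape config (the job_name is assumed):

{
  job_name: 'kube-state-metrics',
  metric_relabel_configs: [
    {
      // Drop the job label from all scraped kube-state-metrics series.
      // Prometheus's synthetic `up` series is not subject to metric
      // relabeling, so it keeps its job label and an alert like
      // KubeStateMetricsDown keeps working.
      action: 'labeldrop',
      regex: 'job',
    },
  ],
}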

@csmarchbanks @metalmatze @gouthamve @tomwilkie does this sound OK to you? (Forget everything I said in the previous comments.)

csmarchbanks commented 5 years ago

That seems reasonable to me, though does any work need to be done? If you're not using the job label for kube-state-metrics, couldn't you set kubeStateMetricsSelector: '' in $._config?
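
That is, a downstream consumer could do something like this (import path assumed):

local kubernetesMixin = import 'kubernetes-mixin/mixin.libsonnet';

kubernetesMixin {
  _config+:: {
    // No job selector needed: the job label is dropped via metric relabeling.
    kubeStateMetricsSelector: '',
  },
}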

brancz commented 5 years ago

You're totally right, there's nothing really to do for that. I was thinking about the KubeStateMetricsDown alert, but it turns out that's actually defined in our downstream usage, not here. We can just set the job selector there "manually" and set the job selector to empty in the kubernetes-mixin config.
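
A sketch of what such a downstream alert could look like, with the job selector hard-coded only here (the expression follows the common pattern for this kind of alert and is an assumption):

{
  alert: 'KubeStateMetricsDown',
  // `up` keeps its job label because metric relabeling does not apply
  // to synthetic series, so this explicit selector still matches.
  expr: 'absent(up{job="kube-state-metrics"} == 1)',
  'for': '15m',
  labels: { severity: 'critical' },
}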