Split control-plane vs non-control-plane alerts

brancz commented 6 years ago

We include the kubernetes-mixin for monitoring in the kube-prometheus stack, and a common point of frustration is that all alerts are always shipped, even on managed Kubernetes clusters such as GKE or AKS. On those clusters it is often not possible to retrieve the metrics needed to monitor the control plane components.

While it would be possible to hand-pick or filter alerts, my feeling is that splitting the alerts into these two groups would also help in setups where a single Prometheus server is not sufficient to monitor an entire cluster, and in multi-tenant Kubernetes environments. In these scenarios we see people assign a Prometheus server per tenant (typically made up of one or more namespaces), and that tenant's responsibility is not to monitor the Kubernetes cluster itself, but primarily its own workload.

This would not be a breaking change, as the entrypoint for the alerting rules (the .libsonnet file that users import) would stay the same.
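To illustrate the kind of split I mean, here is a rough sketch in Python rather than jsonnet. It partitions rule groups, shaped like the `groups` list of a Prometheus rule file, into a control-plane set and a workload set. The group names, alert names, and the `split_rule_groups` helper are all hypothetical and only there for illustration; they are not the mixin's actual names or layout.

```python
# Hypothetical group names, chosen only for illustration.
CONTROL_PLANE_GROUPS = {"control-plane-apiserver", "control-plane-scheduler"}


def split_rule_groups(groups, control_plane_names=CONTROL_PLANE_GROUPS):
    """Partition rule groups into (control_plane, workload) by group name."""
    control_plane, workload = [], []
    for group in groups:
        # A group belongs to the control plane only if its name is listed.
        if group["name"] in control_plane_names:
            control_plane.append(group)
        else:
            workload.append(group)
    return control_plane, workload


# Made-up example data in the shape of a Prometheus rule file's `groups`.
example_groups = [
    {"name": "control-plane-apiserver", "rules": [{"alert": "ExampleAPIServerDown"}]},
    {"name": "workload-apps", "rules": [{"alert": "ExamplePodCrashLooping"}]},
]

control_plane, workload = split_rule_groups(example_groups)
```

A per-tenant Prometheus would then load only the `workload` groups, while whoever operates the cluster itself loads both.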

@tomwilkie @metalmatze

metalmatze commented 6 years ago

I'm all for it. Running clusters on GKE and Kubermatic, I've already run into this and simply silenced the control plane alerts for one year.

karlskewes commented 5 years ago

Maybe we can apply the same include/exclude grouping to the ServiceMonitors here? https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus/jsonnet/kube-prometheus/prometheus/prometheus.libsonnet#L221

kubernetes-monitoring/kubernetes-mixin #50