prometheus-community / helm-charts

Prometheus community Helm charts
Apache License 2.0

[prometheus-kube-stack] kube-etcd, kube-scheduler & kube-controller-manager alerts firing #4924

Open chrede88 opened 4 weeks ago

chrede88 commented 4 weeks ago

Describe the bug (a clear and concise description of what the bug is)

The TargetDown and etcdInsufficientMembers alerts are firing even though my cluster is running fine. My guess is that Prometheus somehow can't find the control-plane targets.

I've tried both setting explicit endpoints for the services and using a label selector instead.

What's your helm version?

version.BuildInfo{Version:"v3.16.2", GitCommit:"13654a52f7c70a143b1dd51416d633e1071faffb", GitTreeState:"dirty", GoVersion:"go1.23.2"}

What's your kubectl version?

Client Version: v1.31.1, Kustomize Version: v5.4.2, Server Version: v1.31.1

Which chart?

kube-prometheus-stack

What's the chart version?

65.3.1

What happened?

The kube-etcd, kube-scheduler and kube-controller-manager alerts are firing. My cluster is healthy, so they shouldn't be.

What you expected to happen?

No alerts firing

How to reproduce it?

Use either of the two values.yaml snippets below.

Enter the changed values of values.yaml?

Current values:

kubeScheduler:
  service:
    selector:
      k8s-app: kube-scheduler

kubeControllerManager: &kubeControllerManager
  service:
    selector:
      k8s-app: kube-controller-manager

kubeEtcd:
  <<: *kubeControllerManager # etcd runs on control plane nodes
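
For reference, this is what the anchor and `<<` merge key in the first snippet expand to. It assumes the parser honors the YAML 1.1 merge key, which to my knowledge Helm's go-yaml-based parsing does. After expansion, the `kubeEtcd` entry holds a copy of the controller-manager service selector:

```yaml
# Equivalent expanded form of the first snippet (anchor and merge key resolved)
kubeScheduler:
  service:
    selector:
      k8s-app: kube-scheduler

kubeControllerManager:
  service:
    selector:
      k8s-app: kube-controller-manager

kubeEtcd:
  service:
    selector:
      k8s-app: kube-controller-manager # copied from kubeControllerManager via <<
```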

I've also tried this:

kubeControllerManager:
  enabled: true
  endpoints: &controlplane
    - 10.10.30.2
    - 10.10.30.3
    - 10.10.30.4

kubeEtcd:
  enabled: true
  endpoints: *controlplane

kubeScheduler:
  enabled: true
  endpoints: *controlplane
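
For reference, the `&controlplane` anchor and the two `*controlplane` aliases in the second snippet reuse the same list. All three components therefore get the same three control-plane IPs. Expanded, the snippet is equivalent to:

```yaml
# Equivalent expanded form of the second snippet (alias resolved)
kubeControllerManager:
  enabled: true
  endpoints:
    - 10.10.30.2
    - 10.10.30.3
    - 10.10.30.4

kubeEtcd:
  enabled: true
  endpoints:
    - 10.10.30.2
    - 10.10.30.3
    - 10.10.30.4

kubeScheduler:
  enabled: true
  endpoints:
    - 10.10.30.2
    - 10.10.30.3
    - 10.10.30.4
```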

Enter the command that you execute and failing/misfunctioning.

None

Anything else we need to know?

I'm running Talos Linux v1.8.1.