prometheus-community / helm-charts

Prometheus community Helm charts
Apache License 2.0

[prometheus-kube-stack] alertmanager pod is missing #851

Closed iershovnsk closed 3 years ago

iershovnsk commented 3 years ago

Describe the bug: I'm launching the whole Prometheus stack using helmfile and the kube-prometheus-stack chart. Everything looks OK, but the alertmanager pod is completely missing:

$ kubectl get pods -n monitoring
NAME                                                  READY   STATUS    RESTARTS   AGE
monitoring-grafana-667f4cc99b-wkhp6                   2/2     Running   0          13m
monitoring-kube-prometheus-operator-f5f75f765-rgbww   1/1     Running   0          13m
monitoring-kube-state-metrics-6c94f8f974-t96bc        1/1     Running   0          13m
monitoring-prometheus-node-exporter-25vtf             1/1     Running   0          13m
monitoring-prometheus-node-exporter-8mdsk             1/1     Running   0          13m
monitoring-prometheus-node-exporter-ccbhh             1/1     Running   0          13m
monitoring-prometheus-node-exporter-ql84j             1/1     Running   0          13m
monitoring-prometheus-node-exporter-qwbxk             1/1     Running   0          13m
monitoring-prometheus-node-exporter-zl2cb             1/1     Running   0          13m
prometheus-monitoring-kube-prometheus-prometheus-0    2/2     Running   0          13m
prometheus-msteams-76c6df678d-68df9                   1/1     Running   0          14m

Version of Helm and Kubernetes:

Helm Version:

version.BuildInfo{Version:"v3.4.2", GitCommit:"23dd3af5e19a02d4f4baa5b2f242645a1a3af629", GitTreeState:"clean", GoVersion:"go1.14.13"}

Kubernetes Version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Which chart: kube-prometheus-stack
Which version of the chart: 14.9.0
What happened: The alertmanager pod is missing.
What you expected to happen: Expected alertmanager to be up and running like all the other components.

How to reproduce it (as minimally and precisely as possible):

The helm command that you execute and that is failing/misfunctioning: I'm using helmfile with values from a file:

helmfile -l group=monitoring apply
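
For context, the corresponding helmfile release entry would look roughly like this (a sketch only; the actual helmfile.yaml and values file are not included in this report, and the names below are placeholders inferred from the "monitoring" release prefix):

repositories:
- name: prometheus-community
  url: https://prometheus-community.github.io/helm-charts

releases:
- name: monitoring
  namespace: monitoring
  chart: prometheus-community/kube-prometheus-stack
  version: 14.9.0
  labels:
    group: monitoring
  values:
  - monitoring-values.yaml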

Helm values set after installation/upgrade:

USER-SUPPLIED VALUES:
additionalPrometheusRulesMap:
  kafka-lagging:
    groups:
    - name: kafka-lagging
      rules:
      - expr: sum without (partition) (max(kafka_offset) by (instance, cluster, partition,
          topic, env))
        record: kafka:producer_offset_max
      - expr: sum without (partition) (min(cg_kafka_offset) by (instance, cluster,
          partition, topic, env))
        record: kafka:consumer_offset_min
      - expr: sum(rate(cg_kafka_offset[5m])) by (instance, cluster, topic, env)
        record: kafka:consumer_rate
      - expr: kafka:producer_offset_max - kafka:consumer_offset_min
        record: kafka:consumer_lag
      - expr: kafka:consumer_lag / kafka:consumer_rate
        record: kafka:consumer_lag_seconds
      - alert: KafkaConsumerLagSeconds
        annotations:
          description: |
            Kafka consumer on cluster {$labels.cluster} topic {$labels.topic} env {$labels.env} is lagging:
            Current time lag is { with printf "kafka:consumer_lag_seconds{cluster='%s',topic='%s'}" $labels.cluster $labels.topic | query }{ . | first | value | humanizeDuration }{ end }.
            Current offset diff is { with printf "kafka:consumer_lag{cluster='%s',topic='%s'}" $labels.cluster $labels.topic | query }{ . | first | value | humanize }{ end }.
          summary: |
            Kafka consumer lag is more than 3 minutes or offset difference more than 1k for 5 minutes
        expr: |
          kafka:consumer_lag{env="dev"} > 1000
          or
          kafka:consumer_lag_seconds{env="dev"} > 180
        for: 5m
        labels:
          component: stream-processor
          severity: warning
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    receivers:
    - email_configs:
      - auth_identity: REDACTED
        auth_password: REDACTED
        auth_username: REDACTED
        from: REDACTED
        smarthost: REDACTED
        to: REDACTED
      name: k8s-admin
    - name: msteams
      webhook_configs:
      - send_resolved: true
        url: http://prometheus-msteams:2000/alertmanager
    route:
    - receiver: k8s-admin
      routes: []
    - receiver: msteams
      routes: []
  enabled: true
defaultRules:
  rules:
    kubeScheduler: false
grafana:
  adminPassword: REDACTED
  enabled: true
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    enabled: true
    hosts:
    - REDACTED
    paths:
    - /
kubeControllerManager:
  enabled: false
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false

Anything else we need to know:

mohan-nagandlla commented 3 years ago

Instead of kubectl get pods -n monitoring, run the following command: kubectl get all -n monitoring

There you should be able to see the alertmanager StatefulSet. Check whether it is 1/1 or 0/1; if it is 0/1, describe that StatefulSet and you will see what exactly happened.
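
For example (a sketch; the StatefulSet is created by the operator and its name depends on the Helm release name, here assumed to be monitoring):

$ kubectl get statefulsets -n monitoring
$ kubectl describe statefulset alertmanager-monitoring-kube-prometheus-alertmanager -n monitoring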

Thank you Mohan nagandlla

iershovnsk commented 3 years ago

@mohan-nagandlla, thank you for the update. Here is the output; there is no alertmanager STS:


$ kubectl get all -n monitoring
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/monitoring-grafana-667f4cc99b-j48w6                   2/2     Running   0          5h43m
pod/monitoring-kube-prometheus-operator-f5f75f765-rgbww   1/1     Running   0          20h
pod/monitoring-kube-state-metrics-6c94f8f974-mzgrd        1/1     Running   0          5h43m
pod/monitoring-prometheus-node-exporter-25vtf             1/1     Running   0          20h
pod/monitoring-prometheus-node-exporter-2fsvl             1/1     Running   0          6h56m
pod/monitoring-prometheus-node-exporter-8mdsk             1/1     Running   0          20h
pod/monitoring-prometheus-node-exporter-9qq9n             1/1     Running   0          5h41m
pod/monitoring-prometheus-node-exporter-qwbxk             1/1     Running   0          20h
pod/monitoring-prometheus-node-exporter-zl2cb             1/1     Running   0          20h
pod/prometheus-monitoring-kube-prometheus-prometheus-0    2/2     Running   1          5h42m
pod/prometheus-msteams-76c6df678d-68df9                   1/1     Running   0          20h

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/monitoring-grafana                        ClusterIP   10.100.203.43    <none>        80/TCP     20h
service/monitoring-kube-prometheus-alertmanager   ClusterIP   10.100.170.44    <none>        9093/TCP   20h
service/monitoring-kube-prometheus-operator       ClusterIP   10.100.79.228    <none>        443/TCP    20h
service/monitoring-kube-prometheus-prometheus     ClusterIP   10.100.160.92    <none>        9090/TCP   20h
service/monitoring-kube-state-metrics             ClusterIP   10.100.218.252   <none>        8080/TCP   20h
service/monitoring-prometheus-node-exporter       ClusterIP   10.100.204.89    <none>        9100/TCP   20h
service/prometheus-msteams                        ClusterIP   10.100.131.125   <none>        2000/TCP   20h
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP   20h

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/monitoring-prometheus-node-exporter   6         6         6       6            6           <none>          20h

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/monitoring-grafana                    1/1     1            1           20h
deployment.apps/monitoring-kube-prometheus-operator   1/1     1            1           20h
deployment.apps/monitoring-kube-state-metrics         1/1     1            1           20h
deployment.apps/prometheus-msteams                    1/1     1            1           20h

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/monitoring-grafana-667f4cc99b                   1         1         1       20h
replicaset.apps/monitoring-kube-prometheus-operator-f5f75f765   1         1         1       20h
replicaset.apps/monitoring-kube-state-metrics-6c94f8f974        1         1         1       20h
replicaset.apps/prometheus-msteams-76c6df678d                   1         1         1       20h

NAME                                                                READY   AGE
statefulset.apps/prometheus-monitoring-kube-prometheus-prometheus   1/1     20h
mohan-nagandlla commented 3 years ago

This might be a Helm package issue.

iershovnsk commented 3 years ago

hmm, that's odd:

I have just updated helm/helmfile to the latest stable versions, which are now:

$ helm version
version.BuildInfo{Version:"v3.5.4", GitCommit:"1b5edb69df3d3a08df77c9902dc17af864ff05d1", GitTreeState:"clean", GoVersion:"go1.15.11"}
$ helmfile version
helmfile version v0.138.7

I have deleted everything and recreated it from scratch with the same result: the alertmanager STS/pod is completely missing, and only the service is available.

mohan-nagandlla commented 3 years ago

No, no, I am saying it might be an issue with the installation package, not something on your side; I am guessing it is on the package side.

qmiinh commented 3 years ago

Any update?

mohan-nagandlla commented 3 years ago

Could you check the operator logs? It might be a webhook issue. Please also run this command and check the output if you are using version > 0.46.0:

kubectl get prometheus-operator -n <namespace>

else

kubectl get alertmanager -n <namespace name where you deployed the alertmanager>
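
To check the operator logs for alertmanager-related errors, something along these lines should work (a sketch; the deployment name is taken from the kubectl get all output earlier in this thread):

$ kubectl logs deployment/monitoring-kube-prometheus-operator -n monitoring | grep -i alertmanager
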
iershovnsk commented 3 years ago

I believe it was an issue with my alertmanager configuration. I finally got it working by removing everything related to email and leaving only the msteams part.
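
For reference, a working values block along those lines would look roughly like this (a sketch only; note that Alertmanager expects route to be a single mapping with nested routes, whereas the values posted above define route as a list, which is not a valid Alertmanager configuration):

alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
    route:
      receiver: msteams
      routes: []
    receivers:
    - name: msteams
      webhook_configs:
      - send_resolved: true
        url: http://prometheus-msteams:2000/alertmanager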

OlivierMary commented 3 years ago

Hi there,

Same problem for me. With the default values.yaml I get the pods after install, but if I change alertmanager.config the pods are no longer updated. I tried helm uninstall and then install again, and there are no more alertmanager pods.

My alertmanager.config changes:

  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
    receivers:
      - name: 'null'
        email_configs:
          - to: 'myEmail@xxx.xx'
    templates:
      - '/etc/alertmanager/config/*.tmpl'

I just added:

        email_configs:
          - to: 'myEmail@xxx.xx'

I also tried with another receivers.name, with no change. I don't know if that config is OK, because I can't deploy it.

No errors in the helm output.
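
One way to sanity-check such a config without deploying it is to save just the config: block to a plain file and run it through amtool (a sketch; this assumes the amtool binary from the Alertmanager release is available locally and the file is named alertmanager.yaml):

$ amtool check-config alertmanager.yaml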

Edit if needed: Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:00:06Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

NAME                    NAMESPACE               REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus-operator     monitoring-operator     5               2021-05-20 12:05:44.6274425 +0200 CEST  deployed        kube-prometheus-stack-15.4.6    0.47.0
eduartua commented 3 years ago

I have the exact same issue as @OlivierMary. My values:

kube-prometheus-stack:
  grafana:
    adminPassword: "PASSWORD"

  alertmanager:
    config:
      receivers:
      - name: ORG-slack
        slack_configs:
        - api_url: "https://hooks.slack.com/services/<TOKEN_THIS_URL>"
          send_resolved: true
          channel: "#infra-911"
          pretext: "Clusterops Infra"

No alertmanager STS is created

k get all -n prometheus
NAME                                                            READY   STATUS    RESTARTS   AGE
pod/kube-prometheus-stack-grafana-94cc7576b-6g24v               2/2     Running   0          10h
pod/kube-prometheus-stack-kube-state-metrics-6bd9c9779f-qwrbw   1/1     Running   0          5d11h
pod/kube-prometheus-stack-operator-66b97f784c-bzvwb             1/1     Running   0          5d11h
pod/kube-prometheus-stack-prometheus-node-exporter-dwwcd        1/1     Running   0          5d11h
pod/kube-prometheus-stack-prometheus-node-exporter-j2j6l        1/1     Running   0          5d11h
pod/kube-prometheus-stack-prometheus-node-exporter-pjb5k        1/1     Running   0          5d11h
pod/prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   1          10h

NAME                                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/kube-prometheus-stack-alertmanager               ClusterIP   10.102.158.19    <none>        9093/TCP   5d11h
service/kube-prometheus-stack-grafana                    ClusterIP   10.104.117.156   <none>        80/TCP     5d11h
service/kube-prometheus-stack-kube-state-metrics         ClusterIP   10.108.179.234   <none>        8080/TCP   5d11h
service/kube-prometheus-stack-operator                   ClusterIP   10.98.166.26     <none>        8080/TCP   5d11h
service/kube-prometheus-stack-prometheus                 ClusterIP   10.100.50.39     <none>        9090/TCP   3d19h
service/kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.99.36.170     <none>        9100/TCP   5d11h
service/prometheus-operated                              ClusterIP   None             <none>        9090/TCP   5d11h

NAME                                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/kube-prometheus-stack-prometheus-node-exporter   3         3         3       3            3           <none>          5d11h

NAME                                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-prometheus-stack-grafana              1/1     1            1           5d11h
deployment.apps/kube-prometheus-stack-kube-state-metrics   1/1     1            1           5d11h
deployment.apps/kube-prometheus-stack-operator             1/1     1            1           5d11h

NAME                                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/kube-prometheus-stack-grafana-94cc7576b               1         1         1       10h
replicaset.apps/kube-prometheus-stack-kube-state-metrics-6bd9c9779f   1         1         1       5d11h
replicaset.apps/kube-prometheus-stack-operator-66b97f784c             1         1         1       5d11h

NAME                                                           READY   AGE
statefulset.apps/prometheus-kube-prometheus-stack-prometheus   1/1     5d11h
eduartua commented 3 years ago

This is the helm chart version:

dependencies:
- name: kube-prometheus-stack
  repository: https://prometheus-community.github.io/helm-charts
  version: 16.1.0

I'll try the latest version and post updates.

mohan-nagandlla commented 3 years ago

Please check the operator logs and find the alertmanager sync logs

eduartua commented 3 years ago

@mohan-nagandlla This is the log from the operator:

level=error ts=2021-06-15T18:35:05.493894404Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"prometheus/kube-prometheus-stack-alertmanager\" failed: provision alertmanager configuration: base config from Secret could not be parsed: undefined receiver \"null\" used in route"

It's an issue with the alertmanager config

eduartua commented 3 years ago

I fixed the error by adding the new receiver to the route that matches Watchdog (or you can simply remove that route). This is how it looks:

      routes:
      - match:
          alertname: Watchdog
        receiver: '<your_receiver>'

The default values have:

      - match:
          alertname: Watchdog
        receiver: 'null'

This complains because the 'null' receiver is no longer defined once config.receivers has been overridden.
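
Put together, a consistent override would look something like this (a sketch combining the two snippets above; the key point is that every receiver referenced anywhere in the route tree must also be defined under receivers, and in an umbrella chart like the one above this block sits under the kube-prometheus-stack: key):

alertmanager:
  config:
    route:
      receiver: ORG-slack
      routes:
      - match:
          alertname: Watchdog
        receiver: ORG-slack
    receivers:
    - name: ORG-slack
      slack_configs:
      - api_url: "https://hooks.slack.com/services/<TOKEN_THIS_URL>"
        send_resolved: true
        channel: "#infra-911"
        pretext: "Clusterops Infra"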

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue is being automatically closed due to inactivity.

mertyakan commented 2 years ago

Hi, the STS was not installed during installation for me either, so I checked the operator pod logs.

I saw: failed: provision alertmanager configuration: base config from Secret could not be parsed: undefined receiver

After editing the config, the STS and pod were installed successfully.

You must check the operator logs.

project-administrator commented 2 years ago

No errors from deploy/prometheus-stack-kube-prom-operator; the resource exists (kubectl describe alertmanager prometheus-stack-kube-prom-alertmanager returns it), but no pod is running.
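
In a case like this, where the Alertmanager custom resource exists and the operator logs show no errors but no pod appears, it can help to inspect the resource spec and recent events as well (a sketch; the resource name is taken from the comment above and the namespace is assumed):

$ kubectl get alertmanager prometheus-stack-kube-prom-alertmanager -n monitoring -o yaml
$ kubectl get events -n monitoring --sort-by=.lastTimestamp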

K890 commented 1 year ago

I had the indentation of receivers incorrect, hence my alertmanager pod was also not running. I found this by checking the Prometheus operator pod logs; there was a clear error about it. Please check those logs and fix your alertmanager config. That should get the alertmanager pod running.

EminBA commented 1 year ago

I encountered this issue as well. Just make sure, after overriding config.receivers, to add back the 'null' receiver; otherwise the Alertmanager will fail to be created ("undefined receiver \"null\" used in route").
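
For example, keeping the 'null' receiver alongside a custom one is enough for the default Watchdog route to resolve (a sketch; the slack receiver is only illustrative and borrowed from the earlier comment):

alertmanager:
  config:
    receivers:
    - name: 'null'
    - name: ORG-slack
      slack_configs:
      - api_url: "https://hooks.slack.com/services/<TOKEN_THIS_URL>"
        send_resolved: true
        channel: "#infra-911"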

UntouchedWagons commented 10 months ago

So what am I supposed to do? I'm still getting the undefined receiver \"null\" used in route error.