Closed: 23ewrdtf closed this issue 2 years ago.
We are seeing the same scenario: we get false alarms every time based on the repeat_interval value, and no events are recorded in either the Alertmanager or Prometheus UI.
$ ./alertmanager --version
alertmanager, version 0.23.0 (branch: HEAD, revision: 61046b17771a57cfd4c4a51be370ab930a4d7d54)
  build user: root@e21a959be8d2
  build date: 20210825-10:48:55
  go version: go1.16.7

$ ./prometheus --version
prometheus, version 2.26.0 (branch: HEAD, revision: 3cafc58827d1ebd1a67749f88be4218f0bab3d8d)
  build user: root@a67cafebe6d0
  build date: 20210331-11:56:23
  go version: go1.16.2
  platform: linux/amd64

$ uname -a
Linux seigwpdevmon01 5.4.17-2102.205.7.3.el7uek.x86_64 #2 SMP Fri Sep 17 16:52:13 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
If you go into Prometheus and type ALERTS in the "Graph" section, do you see the alert?
That's exactly the kicker. I don't see any alerts when executing ALERTS in the graph or console (period = last 2 days, for instance). I usually get the email alerts every 3 hours for each of the monitored resources (repeat_interval: 3h), with no resolved emails following, even though send_resolved: true is set. We have not seen this issue in any other environment running alertmanager version 0.16.0 and prometheus version 2.6.1.
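(For reference, the same check can be run against the Prometheus HTTP API instead of the UI; a minimal sketch, assuming a default server at localhost:9090:

$ curl -s 'http://localhost:9090/api/v1/query?query=ALERTS'

An empty result list here confirms that Prometheus itself holds no active or pending alerts at that moment.)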
Example screenshots (images not reproduced here): Prometheus alerts, Alertmanager alerts, Slack alerts.
No more false alarms after upgrading alertmanager to version 0.23.0.
Thanks, will try that.
That didn't help. Still getting those alerts.
Just got this alert in the Alertmanager logs and in Slack (Slack time 14:16, Prometheus server time 13:16; the two are in different time zones).
There are no alerts in Prometheus. It looks like all the PrometheusTargetMissing[8b68fd2][resolved] alerts are being sent out as if they were normal, firing alerts.
level=debug
ts=2021-10-28T13:16:52.129Z
caller=dispatch.go:516
component=dispatcher
aggrGroup={}:{}
msg=flushing
alerts="[
PrometheusTargetMissing[8b68fd2][resolved]
PrometheusTargetMissing[9c8d752][resolved]
PrometheusTargetMissing[0b67966][resolved]
PrometheusTargetMissing[484babe][resolved]
PrometheusTargetMissing[d7be54b][resolved]
PrometheusTargetMissing[713778d][resolved]
PrometheusTargetMissing[9a65983][resolved]
PrometheusTargetMissing[4d27dea][resolved]
ContainerMemoryUsage[f61fabf][active]
PrometheusAlertmanagerE2eDeadManSwitch[5d03b49][active]
]"
AlertmanagerAPP 14:16
[FIRING:1] Monitoring Event Notification
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• endpoint: metrics
• instance: xxxxxxxxxxxxx
• job: xxxxxxxxxxxxx
• namespace: default
• pod: xxxxxxxxxxxxx-xxx
• service: xxxxxxxxxxxxx
• severity: critical
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• endpoint: metrics
• instance: xxxxxxxxxxxxx
• job: xxxxxxxxxxxxx
• namespace: default
• pod: xxxxxxxxxxxxx-xxxxxxxxxxxxx
• service: xxxxxxxxxxxxx
• severity: critical
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. xxxxxxxxxxxxx
Details:
• alertname: PrometheusTargetMissing
• beta_kubernetes_io_arch: amd64
• beta_kubernetes_io_instance_type: t2.medium
• beta_kubernetes_io_os: linux
• cluster: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_region: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_zone: xxxxxxxxxxxxxa
• instance: xxxxxxxxxxxxx
• job: kubernetes-nodes
• kubernetes_io_arch: amd64
• kubernetes_io_hostname: xxxxxxxxxxxxx
• kubernetes_io_os: linux
• node_kubernetes_io_instance_type: t2.medium
• xxxxxxxxxxxxx
• severity: critical
• topology_kubernetes_io_region: xxxxxxxxxxxxx
• topology_kubernetes_io_zone: xxxxxxxxxxxxxa
• type: xxxxxxxxxxxxxa
Alert: Prometheus target missing xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. xxxxxxxxxxxxx
Details:
• alertname: PrometheusTargetMissing
• beta_kubernetes_io_arch: amd64
• beta_kubernetes_io_instance_type: z1d.xlarge
• beta_kubernetes_io_os: linux
• cluster: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_region: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_zone: xxxxxxxxxxxxxa
• instance: xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal
• job: kubernetes-nodes
• kubernetes_io_arch: amd64
• kubernetes_io_hostname: xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal
• kubernetes_io_os: linux
• node_kubernetes_io_instance_type: z1d.xlarge
• xxxxxxxxxxxxx
• severity: critical
• topology_kubernetes_io_region: xxxxxxxxxxxxx
• topology_kubernetes_io_zone: xxxxxxxxxxxxxa
• xxxxxxxxxxxxx
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. xxxxxxxxxxxxx
Details:
• alertname: PrometheusTargetMissing
• beta_kubernetes_io_arch: amd64
• beta_kubernetes_io_instance_type: t2.medium
• beta_kubernetes_io_os: linux
• cluster: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_region: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_zone: xxxxxxxxxxxxxa
• instance: xxxxxxxxxxxxx
• job: kubernetes-nodes-cadvisor
• kubernetes_io_arch: amd64
• kubernetes_io_hostname: xxxxxxxxxxxxx
• kubernetes_io_os: linux
• node_kubernetes_io_instance_type: t2.medium
• xxxxxxxxxxxxx
• severity: critical
• topology_kubernetes_io_region: xxxxxxxxxxxxx
• topology_kubernetes_io_zone: xxxxxxxxxxxxxa
• type: xxxxxxxxxxxxxa
Alert: Prometheus target missing xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. xxxxxxxxxxxxx
Details:
• alertname: PrometheusTargetMissing
• beta_kubernetes_io_arch: amd64
• beta_kubernetes_io_instance_type: z1d.xlarge
• beta_kubernetes_io_os: linux
• cluster: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_region: xxxxxxxxxxxxx
• failure_domain_beta_kubernetes_io_zone: xxxxxxxxxxxxxa
• instance: xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal
• job: kubernetes-nodes-cadvisor
• kubernetes_io_arch: amd64
• kubernetes_io_hostname: xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal
• kubernetes_io_os: linux
• node_kubernetes_io_instance_type: z1d.xlarge
• xxxxxxxxxxxxx
• severity: critical
• topology_kubernetes_io_region: xxxxxxxxxxxxx
• topology_kubernetes_io_zone: xxxxxxxxxxxxxa
• xxxxxxxxxxxxx
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• app: xxxxxxxxxxxxx
• chart: xxxxxxxxxxxxx-xxxxxxxxxxxxx
• component: agent
• heritage: Tiller
• instance: xxxxxxxxxxxxx
• job: kubernetes-service-endpoints
• kubernetes_name: xxxxxxxxxxxxx
• kubernetes_namespace: default
• kubernetes_node: xxxxxxxxxxxxx
• release: xxxxxxxxxxxxx
• severity: critical
Alert: Prometheus target missing xxxxxxxxxxxxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• app: xxxxxxxxxxxxx
• chart: xxxxxxxxxxxxx-xxxxxxxxxxxxx
• component: agent
• heritage: Tiller
• instance: xxxxxxxxxxxxx
• job: kubernetes-service-endpoints
• kubernetes_name: xxxxxxxxxxxxx
• kubernetes_namespace: default
• kubernetes_node: xxxxxxxxxxxxx.xxxxxxxxxxxxx.compute.internal
• release: xxxxxxxxxxxxx
• severity: critical
Alert: Prometheus AlertManager E2E dead man switch - critical
Description: Prometheus DeadManSwitch is an always-firing alert. It's used as an end-to-end test of Prometheus through the Alertmanager.
VALUE = 1
Details:
• alertname: PrometheusAlertmanagerE2eDeadManSwitch
• severity: critical
Then two more notifications arrived in Slack, with these matching entries in the Alertmanager logs:
level=debug
ts=2021-10-28T13:36:52.168Z
caller=dispatch.go:516
component=dispatcher
aggrGroup={}:{}
msg=flushing
alerts="[
PrometheusTargetMissing[930a732][resolved] PrometheusTargetMissing[63b2c77][resolved]
PrometheusTargetMissing[2c9e4fa][active] PrometheusTargetMissing[320a208][resolved]
PrometheusTargetMissing[b6047c3][resolved] PrometheusTargetMissing[e287914][resolved]
PrometheusTargetMissing[4e752ca][active] PrometheusTargetMissing[169f6fa][resolved]
PrometheusTargetMissing[b3c6b47][resolved] PrometheusTargetMissing[0c2d5d9][resolved]
PrometheusTargetMissing[ee627b6][active] PrometheusTargetMissing[cb47392][resolved]
PrometheusTargetMissing[5e3164d][resolved] PrometheusTargetMissing[48f2962][resolved]
PrometheusTargetMissing[2712c95][active] PrometheusTargetMissing[6d0d40f][resolved]
ContainerMemoryUsage[f61fabf][active]
PrometheusAlertmanagerE2eDeadManSwitch[5d03b49][active]]"
level=debug
ts=2021-10-28T13:41:52.168Z
caller=dispatch.go:516
component=dispatcher
aggrGroup={}:{}
msg=flushing
alerts="[
PrometheusTargetMissing[930a732][resolved] PrometheusTargetMissing[63b2c77][resolved]
PrometheusTargetMissing[2c9e4fa][resolved] PrometheusTargetMissing[320a208][resolved]
PrometheusTargetMissing[b6047c3][resolved] PrometheusTargetMissing[e287914][resolved]
PrometheusTargetMissing[4e752ca][resolved] PrometheusTargetMissing[169f6fa][resolved]
PrometheusTargetMissing[b3c6b47][resolved] PrometheusTargetMissing[0c2d5d9][resolved]
PrometheusTargetMissing[ee627b6][resolved] PrometheusTargetMissing[cb47392][resolved]
PrometheusTargetMissing[5e3164d][resolved] PrometheusTargetMissing[48f2962][resolved]
PrometheusTargetMissing[2712c95][resolved] PrometheusTargetMissing[6d0d40f][resolved]
ContainerMemoryUsage[f61fabf][active]
PrometheusAlertmanagerE2eDeadManSwitch[5d03b49][active]]"
The beginning of the Alertmanager logs:
k logs prometheus-community-alertmanager-776696c9d4-llfjh -c prometheus-alertmanager
level=info ts=2021-10-28T13:10:46.567Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2021-10-28T13:10:46.567Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
level=debug ts=2021-10-28T13:10:51.069Z caller=main.go:372 externalURL=http://localhost:9093
level=info ts=2021-10-28T13:10:51.069Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/config/alertmanager.yml
level=info ts=2021-10-28T13:10:51.466Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/etc/config/alertmanager.yml
level=debug ts=2021-10-28T13:10:51.567Z caller=main.go:498 routePrefix=/
level=info ts=2021-10-28T13:10:51.666Z caller=main.go:518 msg=Listening address=:9093
level=info ts=2021-10-28T13:10:51.666Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=debug ts=2021-10-28T13:10:52.129Z caller=dispatch.go:165 component=dispatcher msg="Received alert" alert=PrometheusTargetMissing[484babe][resolved]
level=debug ts=2021-10-28T13:10:52.130Z caller=dispatch.go:165 component=dispatcher msg="Received alert" alert=PrometheusTargetMissing[d7be54b][resolved]
.
.
.
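Worth noting: the [resolved] alerts start arriving seconds after this Alertmanager instance boots, so something is actively pushing them. One way to confirm the sender is this Prometheus instance is to watch its notifier metrics; a minimal sketch, assuming the default web.listen-address shown in the flags below:

$ curl -s http://localhost:9090/metrics | grep '^prometheus_notifications'

If counters such as prometheus_notifications_sent_total tick up in step with the Slack messages, the traffic is coming from Prometheus rather than from some other client of the Alertmanager API.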
Do you specify any parameters to Prometheus?
prometheus-server pod arguments:
prometheus-server:
Image: quay.io/prometheus/prometheus:v2.26.0
Args:
--storage.tsdb.retention.time=15d
--config.file=/etc/config/prometheus.yml
--storage.tsdb.path=/data
--web.console.libraries=/etc/prometheus/console_libraries
--web.console.templates=/etc/prometheus/consoles
--web.enable-lifecycle
--web.external-url=https://xxxxxxx
State: Running
prometheus-server Command-Line Flags
alertmanager.notification-queue-capacity 10000
alertmanager.timeout
config.file /etc/config/prometheus.yml
enable-feature
log.format logfmt
log.level info
query.lookback-delta 5m
query.max-concurrency 20
query.max-samples 50000000
query.timeout 2m
rules.alert.for-grace-period 10m
rules.alert.for-outage-tolerance 1h
rules.alert.resend-delay 1m
scrape.adjust-timestamps true
storage.exemplars.exemplars-limit 0
storage.remote.flush-deadline 1m
storage.remote.read-concurrent-limit 10
storage.remote.read-max-bytes-in-frame 1048576
storage.remote.read-sample-limit 50000000
storage.tsdb.allow-overlapping-blocks false
storage.tsdb.max-block-duration 1d12h
storage.tsdb.min-block-duration 2h
storage.tsdb.no-lockfile false
storage.tsdb.path /data
storage.tsdb.retention 0s
storage.tsdb.retention.size 0B
storage.tsdb.retention.time 15d
storage.tsdb.wal-compression true
storage.tsdb.wal-segment-size 0B
web.config.file
web.console.libraries /etc/prometheus/console_libraries
web.console.templates /etc/prometheus/consoles
web.cors.origin .*
web.enable-admin-api false
web.enable-lifecycle true
web.external-url https://xxx
web.listen-address 0.0.0.0:9090
web.max-connections 512
web.page-title Prometheus Time Series Collection and Processing Server
web.read-timeout 5m
web.route-prefix /
web.user-assets
alertmanager pod arguments:
prometheus-alertmanager:
Image: quay.io/prometheus/alertmanager:v0.23.0
Args:
--config.file=/etc/config/alertmanager.yml
--storage.path=/data
--cluster.listen-address=
--log.level=info
--web.external-url=http://xxxxxxx
State: Running
alertmanager Config:
global:
resolve_timeout: 5m
http_config:
follow_redirects: true
route:
receiver: slack
continue: false
group_wait: 30s
group_interval: 5m
repeat_interval: 3h
receivers:
- name: slack
slack_configs:
- send_resolved: true
http_config:
follow_redirects: true
api_url: <secret>
channel: '#xxx'
username: '{{ template "slack.default.username" . }}'
color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing
| len }}{{ end }}] Monitoring Event Notification'
title_link: '{{ template "slack.default.titlelink" . }}'
pretext: '{{ template "slack.default.pretext" . }}'
text: |-
{{ range .Alerts }}
*Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
*Description:* {{ .Annotations.description }}
*Details:*
{{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
{{ end }}
{{ end }}
short_fields: false
footer: '{{ template "slack.default.footer" . }}'
fallback: '{{ template "slack.default.fallback" . }}'
callback_id: '{{ template "slack.default.callbackid" . }}'
icon_emoji: '{{ template "slack.default.iconemoji" . }}'
icon_url: https://avatars3.githubusercontent.com/u/3380462
link_names: false
templates: []
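Note that this route has no group_by, which matches the aggrGroup={}:{} seen in the dispatcher logs above: every alert, firing or resolved, gets batched into a single notification per group_interval. A minimal sketch of splitting the groups (the label choice is an assumption; pick whatever fits your routing):

route:
  receiver: slack
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h

This does not by itself explain resolved alerts reappearing, but it keeps unrelated alerts out of the same Slack message and makes the aggregation groups in the logs much easier to read.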
I have the same issue. Alertmanager is sending alerts for rules that have been deleted or updated.
alertmanager, version 0.23.0 (branch: HEAD, revision: 61046b17771a57cfd4c4a51be370ab930a4d7d54)
build user: root@e21a959be8d2
build date: 20210825-10:48:55
go version: go1.16.7
platform: linux/amd64
prometheus, version 2.33.1 (branch: HEAD, revision: 4e08110891fd5177f9174c4179bc38d789985a13)
build user: root@37fc1ebac798
build date: 20220202-15:23:18
go version: go1.17.6
platform: linux/amd64
Same issue. Alertmanager sends alerts that no longer exist in Prometheus, including one from a deleted rule. Upgrading to 0.23 doesn't help.
Alertmanager can't create alerts by itself. There must be something somewhere firing an alert at Alertmanager.
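(That is the right place to look: Alertmanager only dispatches what is POSTed to its alerts API. A minimal sketch for comparing both sides, assuming default addresses; amtool ships with Alertmanager:

$ curl -s 'http://localhost:9090/api/v1/alerts'
$ amtool alert query --alertmanager.url=http://localhost:9093

If an alert appears in the second listing but not the first, some other sender, or a stale Prometheus instance, is still pushing it.)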
Why is this closed?
What did you do? I installed Alertmanager with Prometheus using this chart: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml
prometheus-community chart 14.10.0, app version 2.26.0
What did you expect to see? I have an alert in Prometheus:
When I go to https://prometheus/alerts this alert is not there.
When I go to https://alertmanager/#/alerts this alert is not there.
When I go to
https://prometheus/graph?g0.expr=up%20%3D%3D%200&g0.tab=1&g0.stacked=0&g0.range_input=1h
(that URL runs the query up == 0 over the last hour), I don't get any results. Yet I get the alert below in my Slack every 5 minutes (together with other, current alerts):
AlertManagerAPP 12:41
[FIRING:21] Monitoring Event Notification
Alert: Prometheus target missing xxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• endpoint: metrics
[FIRING:25] Monitoring Event Notification 12:46
Alert: Prometheus target missing xxx - critical
Description: A Prometheus target has disappeared. An exporter might be crashed. ``
Details:
• alertname: PrometheusTargetMissing
• endpoint: metrics
/alertmanager $ uname -srm
Linux 4.14.243-185.433.amzn2.x86_64 x86_64

/alertmanager $ alertmanager --version
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  build user: root@dee35927357f
  build date: 20200617-08:54:02
  go version: go1.14.4

/prometheus $ prometheus --version
prometheus, version 2.26.0 (branch: HEAD, revision: 3cafc58827d1ebd1a67749f88be4218f0bab3d8d)
  build user: root@a67cafebe6d0
  build date: 20210331-11:56:23
  go version: go1.16.2
  platform: linux/amd64
Alertmanager configuration file:
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
  slack_api_url:
  pagerduty_url: xxx
  opsgenie_api_url: xxx
  wechat_api_url: xxx
  victorops_api_url: xxx
route:
  receiver: slack
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
receivers:
- name: slack
  slack_configs:
  - text: |-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
    footer: '{{ template "slack.default.footer" . }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    callback_id: '{{ template "slack.default.callbackid" . }}'
    icon_emoji: '{{ template "slack.default.iconemoji" . }}'
    icon_url: https://avatars3.githubusercontent.com/u/3380462
templates: []
Prometheus configuration file:
Logs: