prometheus / alertmanager

non business hours alert resolved status not acknowledged by alertmanager and not sent to configured receiver #3617

Open alexandrumarian-portal opened 10 months ago

alexandrumarian-portal commented 10 months ago

Hello everyone,

Please help me understand whether I have misconfigured Alertmanager in any way.

The scenario is the following: if the alert is triggered during business hours, the notification is sent. If the alert is triggered outside business hours, the notification is not sent.

If the alert is resolved outside business hours (in Prometheus), the event is not acknowledged by Alertmanager, and therefore the resolved status is not sent to the configured receiver (PagerDuty, in this case).

Please help me understand where the issue is coming from.

What did you do?

Configured Alertmanager to send an alert only during the business-hours interval defined in alertmanager.yml:

time_intervals:
  - name: only_in_business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
        - start_time: "07:00"
          end_time: "16:00"
  - name: weekend
    time_intervals:
      - weekdays: ['saturday','sunday']

Below is the alert rule for business hours:

- name: ssl_certificate_expiry
  rules:
  - alert: cert_expiring_date
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
    for: 10m
    labels:
      severity: warning
      only_in_business_hours: true
    annotations:
      summary: "The SSL certificate will expire on {{ $labels.instance }}"
      description: "SSL certificate on target will expire in less than 1 week."

What did you expect to see?

If an alert is triggered during non business hours, the alert is not sent and it waits until business hours begin. If the alert is resolved during non business hours, the notification should be sent to the configured receiver.

What did you see instead? Under which circumstances?

If the alert is resolved outside business hours (in Prometheus), the event is not acknowledged by Alertmanager, and therefore the resolved status is not sent to the configured receiver (PagerDuty, in this case).

Environment

* Alertmanager configuration file:

route:
  group_by: ['alertname', 'cluster', 'service', 'url']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 3h
  receiver: 'pagerduty_channel'
  routes:

receivers:

inhibit_rules:

time_intervals:


* Prometheus configuration file:

global:
  scrape_interval: 2s
  evaluation_interval: 2s
  query_log_file: /prometheus/logs/query.log

rule_files:

scrape_configs:

alerting:
  alertmanagers:


* Logs:

40007186:ts=2023-11-15T20:24:46.720Z caller=dispatch.go:515 level=debug component=dispatcher aggrGroup="{}/{only_in_business_hours=\"true\"}:{alertname=\"cert_expiring_date\", url=\"https://address.net/\"}" msg=flushing alerts=[cert_expiring_date[0308b61][resolved]]
40007453-ts=2023-11-15T20:24:46.720Z caller=notify.go:877 level=debug component=dispatcher msg="Notifications not sent, route is not within active time"

dswarbrick commented 10 months ago

Alertmanager is working as intended. If a route's active_time_intervals do not match, that route will not be active - neither to send a "firing" notification, nor to send a "resolved" notification.
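
For readers following along: the `routes:` section was not included in the report, so the fragment below is only a sketch of what such a route presumably looks like. The matcher is inferred from the `aggrGroup` in the log line above; the child route itself and its receiver are assumptions, not the reporter's actual configuration.

```yaml
route:
  receiver: 'pagerduty_channel'
  routes:
    # Hypothetical child route (assumed, not taken from the report).
    - matchers:
        - only_in_business_hours="true"
      receiver: 'pagerduty_channel'
      # The route notifies only while one of these intervals is active.
      # Outside them, both "firing" and "resolved" notifications are
      # dropped, which matches the "route is not within active time" log.
      active_time_intervals:
        - only_in_business_hours
```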

alexandrumarian-portal commented 10 months ago

And if I want to send a notification to the configured receiver when the alert is resolved outside of active_time_intervals, how can I achieve that? Thank you.

grobinson-grafana commented 10 months ago

Hi! 👋 I do not believe it's possible to tell Alertmanager to send resolved notifications for alerts that are silenced, muted, or outside active time intervals. Someone else might be able to correct me if this is wrong.

dswarbrick commented 10 months ago

Generally this type of thing is better configured in your notification provider, e.g. PagerDuty, OpsGenie etc, since that's where you configure your teams, on-call schedules, escalation rules etc. Just let Alertmanager blast everything through to PagerDuty (regardless of time / day), and configure your custom notification behaviour there.
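
A minimal sketch of that approach, assuming the grouping settings from the report are kept as they are: the route carries no time-interval restriction, so both firing and resolved notifications always reach PagerDuty, and any business-hours logic would then be configured on the PagerDuty side. The `routing_key` value is a placeholder, and the receiver body is an assumption because the original `receivers:` section was not shown.

```yaml
route:
  group_by: ['alertname', 'cluster', 'service', 'url']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 3h
  receiver: 'pagerduty_channel'
  # No active_time_intervals or mute_time_intervals on any route:
  # every notification is forwarded regardless of time or day.

receivers:
  - name: 'pagerduty_channel'
    pagerduty_configs:
      - routing_key: '<your-integration-key>'  # placeholder, not from the report
        send_resolved: true                    # forward "resolved" notifications too
```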