prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

I tried to use router to match different alert notifications #3181

Open MagicStarTrace opened 1 year ago

MagicStarTrace commented 1 year ago

I plan to use routes to triage the alerts.

For node and pod alerts from some of the instances (those matching S_15|S_10|Ks-OpenStack-Node01|Ks-Kubernetes-Node01|bastionhost), the notification should go to the receiver ops.

If the instance does not match "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch", the notification should go to the receiver IT.

I've tried to configure this, but it doesn't work as expected.

Whether the instance matches or not, every alert is sent to the same default receiver.

How should I adjust the configuration? Thanks!

  global:
    "resolve_timeout": "5m"

  route:     
    "group_interval": "3m"
    "repeat_interval": "5m"
    "group_wait": "10s"
    "group_by": [ "alertname", "app_name", "pod" ]  
    routes:
      - receiver: 'ops' 
      - continue: false   
      - match:
          instance: "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch" 
        receiver: 'ops'
      - receiver: 'IT'
        group_by: [ "instance" ]      
      - continue: false   
      - match_re:
          severity: "warning|critical" 
        receiver: 'IT'

  receivers:
    - "name": "IT"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxx12&at=xxxxx"

    - "name": "ops"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxa6&at=xxxxx" 

  inhibit_rules:
    - equal: [ 'alertname', 'cluster', 'service', 'app_name', 'type_name' ]
      source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'

Some information has been redacted.
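A likely reason every alert ends up at one receiver: each `-` under `routes` starts a separate child route, so `- receiver: 'ops'` on its own line is a route with no matcher at all. Alertmanager checks child routes in order and stops at the first one that matches unless that route sets `continue: true`, and a route without matchers matches every alert. The root route also has to name a default `receiver`, which the config above does not do. A reduced sketch of that shape, reusing the receiver names from this issue:

```yaml
route:
  receiver: 'ops'          # the root route must name a default receiver
  routes:
    - receiver: 'ops'      # no matcher: matches every alert; continue defaults to false
    - receiver: 'IT'       # never reached, because the route above already matched
      match_re:
        severity: "warning|critical"
```

Moving each route's matcher into the same list item as its `receiver` removes the catch-all.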

simonpasquier commented 1 year ago

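I believe the YAML is a bit mixed up: each `-` under `routes` starts a new route, so the matcher, `receiver` and `continue` of one route belong in the same list item: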


      - continue: false   
        match:
          instance: "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch" 
        receiver: 'ops'
      - group_by: [ "instance" ]      
        continue: false   
        match_re:
          severity: "warning|critical" 
        receiver: 'IT'
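One more thing worth checking in this fragment: `match` compares the label value for exact equality, so `instance: "S_15|S_10|..."` only matches an alert whose `instance` label is literally that whole string. For `|` to act as alternation, the matcher has to be `match_re`, whose values are anchored regular expressions. A sketch of the same two routes with that change, keeping the redacted instance names from the issue:

```yaml
routes:
  - receiver: 'ops'
    match_re:
      # Anchored regex: matches exactly one of these instance values.
      instance: "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch"
  - receiver: 'IT'
    group_by: [ "instance" ]
    match_re:
      severity: "warning|critical"
```

Because the regex is anchored, `S_15` will not match an instance value such as `S_15:9100`. If your `instance` labels carry a port (an assumption; the issue does not show the actual label values), the pattern needs to allow for it, for example `(S_15|S_10)(:.*)?`.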
MagicStarTrace commented 1 year ago

> the YAML is a bit mixed up I believe:


global:
    "resolve_timeout": "5m"
 route:     
    "group_interval": "3m"
    "repeat_interval": "5m"
    "group_wait": "10s"
    "group_by": [ "alertname", "app_name", "pod" ]  
    receiver: 'ops' 
    routes:
      - receiver: 'ops'
      - continue: false   
        match:
          instance: "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch" 
        receiver: 'ops'
      - receiver: 'IT'
      - group_by: [ "instance" ]      
        continue: false   
        match_re:
          severity: "warning|critical" 
        receiver: 'IT'        
  receivers:
    - "name": "IT"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxx12&at=xxxxx"

    - "name": "ops"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxa6&at=xxxxx" 

level=error ts=2022-12-21T03:38:29.944Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: line 20: did not find expected key"

There appears to be an error. What should I do to fix it? Thanks!
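The message comes from the YAML parser, not from routing. In the config above, `route:` is indented by one space and `receivers:` by two, while `global:` starts in column 0. Top-level keys have to start in the same column, and this mismatch is the most likely cause of "did not find expected key". Separately, `- receiver: 'IT'` and `- group_by: ...` are two different list items. That does not break parsing, but it splits the IT route from its `match_re`. A minimal skeleton of the expected layout (receivers shortened, webhook URLs omitted):

```yaml
global:
  resolve_timeout: 5m
route:                        # column 0, same as global:
  receiver: 'ops'
  routes:
    - receiver: 'IT'          # "-" starts one route...
      group_by: ['instance']  # ...and its remaining keys align under "receiver"
      match_re:
        severity: "warning|critical"
receivers:                    # column 0 as well
  - name: 'IT'
  - name: 'ops'
```

Running `amtool check-config alertmanager.yaml` locally reports this kind of error before the file is deployed.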

simonpasquier commented 1 year ago
global:
    "resolve_timeout": "5m"
route:     
    "group_interval": "3m"
    "repeat_interval": "5m"
    "group_wait": "10s"
    "group_by": [ "alertname", "app_name", "pod" ]  
    receiver: 'ops' 
    routes:
      - receiver: 'ops'
        continue: false   
        match:
          instance: "S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch" 
      - receiver: 'IT'
        group_by: [ "instance" ]      
        continue: false   
        match_re:
          severity: "warning|critical" 
receivers:
    - "name": "IT"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxx12&at=xxxxx"

    - "name": "ops"
      "webhook_configs":
       - url: "http://prometheus-alert:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxa6&at=xxxxx"
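On Alertmanager 0.22 and later, `match` and `match_re` are deprecated in favor of `matchers`. The new syntax can also express the "does not match" case from the original question directly with `!~`. Below is a sketch of the routing tree written that way, keeping the labels, intervals and receivers from this thread. `=~` and `!~` use anchored regular expressions, so the instance list is an exact alternation:

```yaml
route:
  receiver: 'ops'                  # default when no child route matches
  group_by: ['alertname', 'app_name', 'pod']
  group_wait: 10s
  group_interval: 3m
  repeat_interval: 5m
  routes:
    # Listed instances go to ops.
    - receiver: 'ops'
      matchers:
        - instance=~"S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch"
    # All other instances go to IT. Matchers in one list are AND-ed, so this
    # also requires warning/critical; drop the severity line if every
    # non-listed instance should go to IT regardless of severity.
    - receiver: 'IT'
      group_by: ['instance']
      matchers:
        - instance!~"S_15|S_10|OpenStack-Node01|Kubernetes-Node01|Batch"
        - severity=~"warning|critical"
```

To see where a given label set would be routed, run `amtool config routes test --config.file=alertmanager.yaml instance=S_15 severity=critical`. With the tree above it should report `ops`.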