This is an alert notification receiver for Prometheus + Alertmanager webhooks.
It brings together prometheus-webhook-dingtalk, prometheus-webhook-yunpian, prometheus-webhook-weixin, and prometheus-webhook-telegram.
What is Prometheus? See the official Prometheus documentation.
It supports the following receivers: a DingTalk group robot, Yunpian SMS and voice, a Weixin (WeChat) robot, and a Telegram Bot.
1. Receives HTTP POST requests from Alertmanager (Prometheus -> Alertmanager -> this webhook).
2. Based on the request's URL parameters and POST body, decides which priority level and which media to use, and sends the alerts as aggregated notifications.
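Here is a minimal Python sketch of step 2. It assumes two things: the standard Alertmanager webhook JSON body (an `alerts` array whose entries carry `status` and `labels`), and a `priority` label on each alert, as in the P0/P1 matchers of the alertmanager.yml example in this README. The function name, the second sample alert, and the `unknown` fallback are hypothetical. The project's actual aggregation logic is not shown here.

```python
import json
from collections import defaultdict


def group_alerts_by_priority(body: str) -> dict:
    """Parse an Alertmanager webhook body and group (status, alertname) pairs by 'priority' label."""
    payload = json.loads(body)
    groups = defaultdict(list)
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Hypothetical fallback bucket for alerts that carry no priority label.
        priority = labels.get("priority", "unknown")
        groups[priority].append((alert.get("status"), labels.get("alertname")))
    return dict(groups)


# Made-up sample body; only the field layout follows the Alertmanager webhook format.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "web.hook-P1",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "Disk Is Pressure", "priority": "P1"}},
        {"status": "resolved", "labels": {"alertname": "Instance Down", "priority": "P0"}},
    ],
})

if __name__ == "__main__":
    print(group_alerts_by_priority(sample))
```

With alerts grouped this way, each priority bucket can be sent as one aggregated message instead of one message per alert.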
1. DingTalk messages: a DingTalk group robot notifies the people responsible for handling the alert.
2. Yunpian SMS/voice
Rule: voice (phone) alerts are only sent between 00:00 and 08:00.
Phone calls are reserved for waking the on-duty staff when other media have failed to, because calls are more expensive and more intrusive.
3. Enterprise WeChat
4. Telegram Bot
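The 00:00 to 08:00 voice-call rule can be expressed as a simple time-window check. This Python sketch treats the window as half-open, so 08:00 itself is outside it. That boundary, the `choose_media` escalation, and the media names are assumptions for illustration only, not the project's code.

```python
from datetime import datetime, time

# Window from the Yunpian rule: voice calls only between 00:00 and 08:00.
CALL_WINDOW_START = time(0, 0)
CALL_WINDOW_END = time(8, 0)


def voice_call_allowed(now: datetime) -> bool:
    """Return True if a phone/voice alert may be placed at local time `now`.

    Assumption: the window is [00:00, 08:00), so exactly 08:00 is excluded.
    """
    return CALL_WINDOW_START <= now.time() < CALL_WINDOW_END


def choose_media(now: datetime) -> list:
    """Hypothetical escalation: chat media always, plus a Yunpian voice call inside the window."""
    media = ["dingtalk", "weixin", "telegram"]
    if voice_call_allowed(now):
        media.append("yunpian-voice")
    return media


if __name__ == "__main__":
    print(choose_media(datetime(2024, 1, 1, 3, 30)))
    print(choose_media(datetime(2024, 1, 1, 14, 0)))
```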
The following commands install infra-prometheus-webhook and manage it as a systemd service:
version=v2.2
wget https://github.com/weiqiang333/infra-prometheus-webhook/releases/download/${version}/infra-prometheus-webhook-linux-amd64-${version}.tar.gz
mkdir -p /usr/local/infra-prometheus-webhook/log
tar -zxf infra-prometheus-webhook-linux-amd64-${version}.tar.gz -C /usr/local/infra-prometheus-webhook
chmod +x /usr/local/infra-prometheus-webhook/infra-prometheus-webhook
# manage the service with systemd
cp /usr/local/infra-prometheus-webhook/configs/systemd/infra-prometheus-webhook.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now infra-prometheus-webhook
systemctl status infra-prometheus-webhook
HTTP endpoints:
/                                       # health check
/-/reload                               # reload config file
/alerts/dingtalk/:priority
/alerts/phonecall/:role
/alerts/yunpian/:sendtype/:priority     # sendtype: sms/voice
/alerts/weixin/:priority
/alerts/telegram/:priority
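The `:name` segments in the endpoint paths are placeholders filled from the request URL. To make that concrete, here is a small Python matcher that extracts them. It only illustrates the path layout listed for this service. The `match_route`/`resolve` names and the rejection of send types other than sms/voice are assumptions of this sketch, not the service's real router.

```python
ROUTES = [
    "/alerts/dingtalk/:priority",
    "/alerts/phonecall/:role",
    "/alerts/yunpian/:sendtype/:priority",
    "/alerts/weixin/:priority",
    "/alerts/telegram/:priority",
]


def match_route(pattern: str, path: str):
    """Match `path` against `pattern`; return the named parameters, or None if it doesn't match."""
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return None
    params = {}
    for expected, actual in zip(pattern_parts, path_parts):
        if expected.startswith(":"):
            params[expected[1:]] = actual  # named segment such as :priority
        elif expected != actual:
            return None  # a literal segment differs
    return params


def resolve(path: str):
    """Return (pattern, params) for the first matching route, or None."""
    for pattern in ROUTES:
        params = match_route(pattern, path)
        if params is not None:
            # Only sms and voice are documented send types for Yunpian.
            if "sendtype" in params and params["sendtype"] not in ("sms", "voice"):
                return None
            return pattern, params
    return None


if __name__ == "__main__":
    print(resolve("/alerts/yunpian/sms/P0"))
```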
Note: remember to initialize your configuration file (configs/production.yaml).
Command-line flags:
--check                   check's cron: Used to check the infrastructure
--config string           config file (default "configs/production.yaml")
--listen_address string   server listen address. (default "0.0.0.0:8099")
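With the default listen address `0.0.0.0:8099`, the service is reachable locally at `http://127.0.0.1:8099`. Here is a minimal Python health probe for the `/` endpoint. The README only says `/` is the health check, so the expectation that a healthy service answers a plain GET with HTTP 200 is an assumption. The `is_healthy` helper is hypothetical.

```python
import urllib.request


def is_healthy(base_url: str = "http://127.0.0.1:8099", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/ answers with HTTP 200 (assumed health-check behavior)."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers refused
        # connections, timeouts, and non-2xx replies.
        return False


if __name__ == "__main__":
    print(is_healthy())
```

For example, the probe can be run right after `systemctl enable --now infra-prometheus-webhook` to confirm the service came up.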
Prometheus's alerting_rules.yml file (rule definitions omitted):
groups:
- name: Instances
  rules:
- name: Disk
  rules:
Alertmanager's alertmanager.yml file:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'default-receiver_web.hook'
  routes:
  - receiver: 'web.hook-P0'
    group_by: [priority, alertname]
    group_wait: 10s
    repeat_interval: 1h
    matchers:
    - priority="P0"
  - receiver: 'web.hook-P1'
    group_by: [priority, alertname]
    group_wait: 20s
    repeat_interval: 3h
    matchers:
    - priority="P1"
receivers:
- name: 'default-receiver_web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:8099/alerts/weixin/p0'
- name: 'web.hook-P0'
  webhook_configs:
  - url: 'http://127.0.0.1:8099/alerts/weixin/p0'
- name: 'web.hook-P1'
  webhook_configs:
  - url: 'http://127.0.0.1:8099/alerts/weixin/p1'
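The routing in this alertmanager.yml works as follows. An alert is checked against the child routes in order. The first route whose matchers all match handles it, since `continue` is not set. If no child matches, the top-level `default-receiver_web.hook` handles it. This Python sketch models only that receiver selection for equality matchers. It ignores `group_by`, `group_wait`, and repeat timing, and the helper names are hypothetical.

```python
# Child routes and receivers copied from the alertmanager.yml example.
CHILD_ROUTES = [
    {"receiver": "web.hook-P0", "matchers": {"priority": "P0"}},
    {"receiver": "web.hook-P1", "matchers": {"priority": "P1"}},
]
DEFAULT_RECEIVER = "default-receiver_web.hook"

RECEIVER_URLS = {
    "default-receiver_web.hook": "http://127.0.0.1:8099/alerts/weixin/p0",
    "web.hook-P0": "http://127.0.0.1:8099/alerts/weixin/p0",
    "web.hook-P1": "http://127.0.0.1:8099/alerts/weixin/p1",
}


def pick_receiver(labels: dict) -> str:
    """Return the receiver of the first child route whose equality matchers all match."""
    for route in CHILD_ROUTES:
        if all(labels.get(name) == value for name, value in route["matchers"].items()):
            return route["receiver"]
    return DEFAULT_RECEIVER  # no child matched: the root route's receiver handles it


def webhook_url(labels: dict) -> str:
    """Return the infra-prometheus-webhook URL that the alert would be POSTed to."""
    return RECEIVER_URLS[pick_receiver(labels)]


if __name__ == "__main__":
    print(webhook_url({"alertname": "Disk Is Pressure", "priority": "P1"}))
```

Note that alerts without a P0/P1 label still reach the Weixin `p0` endpoint through the default receiver.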
Example Weixin message sent by infra-prometheus-webhook:
状态 (Status): PROBLEM
等级 (Level): P1
告警 (Alert): Disk Is Pressure
Item values:
故障 (Fault): Instance Disk Insufficient available resources utilization up to 80%
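Here is a Python sketch that renders text with the same five-line layout as the sample Weixin message, using the Chinese field labels as they appear in the message. Which labels or annotations the project reads for each field is not stated, so the `format_weixin_text` helper simply takes the values as arguments. The helper is hypothetical.

```python
def format_weixin_text(status: str, priority: str, alertname: str, fault: str) -> str:
    """Render an alert in the line layout of the sample Weixin message."""
    lines = [
        f"状态: {status}",     # status, e.g. PROBLEM
        f"等级: {priority}",   # priority level, e.g. P1
        f"告警: {alertname}",  # alert name
        "Item values:",
        f"故障: {fault}",      # fault description
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_weixin_text(
        "PROBLEM",
        "P1",
        "Disk Is Pressure",
        "Instance Disk Insufficient available resources utilization up to 80%",
    ))
```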