stratanic opened 10 months ago
Alertmanager itself cannot perform queries; however, Prometheus can perform queries within templates. See https://prometheus.io/docs/prometheus/latest/configuration/template_reference/#queries
Using that, you could write an alerting rule that includes the additional values you want in the annotation.
The prefixes produced by the humanize / humanize1024 functions are internationally recognized: humanize uses the decimal SI prefixes (k, M, G, ...) and humanize1024 uses the binary prefixes (Mi, Gi, ...) standardized by IEC/ISO. You should probably add a "B" to the annotation in your disk alerting rule, so that it produces "MiB" rather than the bare prefix.
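To make the difference between the two scaling schemes concrete, here is a simplified Go re-implementation. It is my own sketch, not Prometheus's source: the helper name humanizeSI is hypothetical, values below 1 are not scaled, and the four-significant-digit formatting is an assumption chosen to match outputs such as 14.2GiB and 159.1GiB seen in this thread.

```go
package main

import (
	"fmt"
	"math"
)

// humanizeSI divides v by 1000 until it drops below 1000 and appends the
// matching SI prefix. Simplified sketch: values below 1 are left unscaled.
func humanizeSI(v float64) string {
	prefix := ""
	for _, p := range []string{"k", "M", "G", "T", "P", "E"} {
		if math.Abs(v) < 1000 {
			break
		}
		prefix = p
		v /= 1000
	}
	return fmt.Sprintf("%.4g%s", v, prefix)
}

// humanize1024 does the same with powers of 1024 and IEC binary prefixes.
func humanize1024(v float64) string {
	prefix := ""
	for _, p := range []string{"Ki", "Mi", "Gi", "Ti", "Pi", "Ei"} {
		if math.Abs(v) < 1024 {
			break
		}
		prefix = p
		v /= 1024
	}
	return fmt.Sprintf("%.4g%s", v, prefix)
}

func main() {
	free := 253.0 * 1024 * 1024 // 253 MiB expressed in bytes

	// The functions return only a number plus prefix; the unit is up to you.
	fmt.Println(humanize1024(free) + "B") // 253MiB
	fmt.Println(humanizeSI(free) + "B")   // 265.3MB (same bytes, decimal prefix)

	// Appending a prefix letter instead of a unit yields the odd "5GiM".
	fmt.Println(humanize1024(5*1024*1024*1024) + "M")
}
```

The same byte count reads as 253Mi with binary prefixes and 265.3M with decimal ones, which is why the choice of function matters as much as the suffix you append.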
Thanks, but sometimes the space left is in gigabytes and sometimes in bytes. Sample: {{ $value | humanize1024 }}M gives "5GiM"...
Where can I find a sample of "perform queries within templates"? I've spent several days trying to understand how templates work, but the examples always cover existing integrations: Slack, email, WeChat, etc. When it comes to "generic webhooks", I'm not sure how they work. I haven't been able to find any samples. Can anyone help?
My goal is to send some external labels with key/value pairs to Alerta. I think everything must go in the "annotations" section of the alert rule, and the annotation has two values:
Sample: annotations: summary: "253 Mi represents 3% space left"
"253 Mi" and "3%" are two separate values. This type of annotation is not easy to display in Alertmanager, but it is very easy in other monitoring tools...
Thanks, but sometimes the space left is in gigabytes and sometimes in bytes. Sample: {{ $value | humanize1024 }}M gives "5GiM"...
Try {{ $value | humanize1024 }}B.
The humanize / humanize1024 functions are only intended to provide a prefix. It's up to you to supply the unit, e.g. {{ $value | humanize }}J for an energy measurement (joules), or {{ $value | humanize }}Pa for pressure (pascals).
Where can I find a sample of "perform queries within templates"?
https://prometheus.io/docs/prometheus/latest/configuration/template_examples/#display-one-value
I've spent several days trying to understand how templates work, but the examples always cover existing integrations: Slack, email, WeChat, etc.
Prometheus knows nothing of Slack, WeChat, or even email integration. It merely fires alerts at Alertmanager via an API. Alertmanager is what forwards that alert to some kind of notification provider. It sounds like you might be confusing Prometheus templating with Alertmanager notification templates.
When it comes to "generic webhooks", I'm not sure how they work. I haven't been able to find any samples. Can anyone help?
The JSON payload format of the HTTP POST that Alertmanager sends to webhook receivers is documented in the webhook_config section: https://prometheus.io/docs/alerting/latest/configuration/#webhook_config
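As a sketch of what a custom webhook receiver has to do, the Go code below decodes that payload with the standard encoding/json package. Field names follow the webhook_config documentation; the type names, the parseWebhook helper, and the sample body are my own and hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Alert is one entry of the "alerts" array in the webhook body.
type Alert struct {
	Status       string            `json:"status"`
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations"`
	StartsAt     time.Time         `json:"startsAt"`
	EndsAt       time.Time         `json:"endsAt"`
	GeneratorURL string            `json:"generatorURL"`
	Fingerprint  string            `json:"fingerprint"`
}

// WebhookMessage mirrors the top-level fields of the webhook payload.
type WebhookMessage struct {
	Version           string            `json:"version"`
	GroupKey          string            `json:"groupKey"`
	TruncatedAlerts   int               `json:"truncatedAlerts"`
	Status            string            `json:"status"`
	Receiver          string            `json:"receiver"`
	GroupLabels       map[string]string `json:"groupLabels"`
	CommonLabels      map[string]string `json:"commonLabels"`
	CommonAnnotations map[string]string `json:"commonAnnotations"`
	ExternalURL       string            `json:"externalURL"`
	Alerts            []Alert           `json:"alerts"`
}

// parseWebhook decodes the HTTP POST body sent by Alertmanager.
func parseWebhook(body []byte) (WebhookMessage, error) {
	var msg WebhookMessage
	err := json.Unmarshal(body, &msg)
	return msg, err
}

func main() {
	// Hypothetical body, trimmed to a few fields for brevity.
	body := []byte(`{
	  "version": "4",
	  "status": "firing",
	  "receiver": "alerta",
	  "alerts": [{
	    "status": "firing",
	    "labels": {"alertname": "NodeDiskSpaceCritical", "instance": "foohost:9100"},
	    "annotations": {"filesystem_available": "14.2GiB"},
	    "startsAt": "2023-11-24T00:36:00.921Z"
	  }]
	}`)
	msg, err := parseWebhook(body)
	if err != nil {
		panic(err)
	}
	for _, a := range msg.Alerts {
		fmt.Println(a.Labels["alertname"], a.Annotations["filesystem_available"])
	}
}
```

Serving this over HTTP would only mean reading the request body inside an http.HandlerFunc and passing it to parseWebhook; annotations such as filesystem_available arrive as plain strings.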
My goal is to send some external labels with key/value pairs to Alerta. I think everything must go in the "annotations" section of the alert rule, and the annotation has two values:
Sample: annotations: summary: "253 Mi represents 3% space left"
"253 Mi" and "3%" are two separate values. This type of annotation is not easy to display in Alertmanager, but it is very easy in other monitoring tools...
The format of the alert JSON payload that Prometheus sends to Alertmanager is described at https://prometheus.io/docs/alerting/latest/clients/
(this request for help really ought to have been posted in Discussions, or in the prometheus-users Google group)
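Prometheus builds and sends these alert objects itself, but a short sketch helps show their shape. The Go code below encodes a JSON array of alerts with the labels, annotations, startsAt, endsAt and generatorURL fields and the /api/v2/alerts path described on the clients page. The type and function names (PostableAlert, encodeAlerts, sendAlerts), the Alertmanager address, and the expectation of a 200 response are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// PostableAlert is a hypothetical local type mirroring the documented fields.
type PostableAlert struct {
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations,omitempty"`
	StartsAt     string            `json:"startsAt,omitempty"` // RFC 3339
	EndsAt       string            `json:"endsAt,omitempty"`   // RFC 3339
	GeneratorURL string            `json:"generatorURL,omitempty"`
}

// encodeAlerts returns the JSON array body for POST /api/v2/alerts.
func encodeAlerts(alerts []PostableAlert) ([]byte, error) {
	return json.Marshal(alerts)
}

// sendAlerts posts the alerts to an Alertmanager base URL such as
// "http://localhost:9093" (hypothetical address). Expecting 200 is an assumption.
func sendAlerts(baseURL string, alerts []PostableAlert) error {
	body, err := encodeAlerts(alerts)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/v2/alerts", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("alertmanager returned %s", resp.Status)
	}
	return nil
}

func main() {
	alerts := []PostableAlert{{
		Labels:      map[string]string{"alertname": "NodeDiskSpaceCritical", "instance": "foohost:9100"},
		Annotations: map[string]string{"summary": "Critical disk space /srv on host foohost:9100"},
		StartsAt:    time.Date(2023, 11, 24, 0, 36, 0, 0, time.UTC).Format(time.RFC3339),
	}}
	body, err := encodeAlerts(alerts)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // inspect the payload instead of sending it
}
```

Everything a notification later shows, including annotations rendered from template queries, travels in these label and annotation maps.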
Below is an adaptation of a generic disk space rule that I've used in production for monitoring a few thousand hosts of various types / roles.
- alert: NodeDiskSpaceCritical
  expr: |
    node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1
  labels:
    severity: critical
  annotations:
    summary: Critical disk space {{ $labels.mountpoint }} on host {{ $labels.instance }}
    description: >-
      Mountpoint {{ $labels.mountpoint }} on host {{ $labels.instance }} has {{ humanizePercentage $value }} disk space remaining.
    filesystem_size: "{{ with printf \"node_filesystem_size_bytes{instance='%s',mountpoint='%s'}\" $labels.instance $labels.mountpoint
      | query }}{{ . | first | value | humanize1024 }}B{{ end }}"
    filesystem_available: "{{ with printf \"node_filesystem_avail_bytes{instance='%s',mountpoint='%s'}\" $labels.instance $labels.mountpoint
      | query }}{{ . | first | value | humanize1024 }}B{{ end }}"
In brief, it will fire if the available filesystem space is less than 10% (regardless of absolute size). The filesystem_size and filesystem_available annotations show how you can use queries in Prometheus templates. This will result in alerts such as the following:
[
  {
    "annotations": {
      "description": "Mountpoint /srv on host foohost:9100 has 8.98% disk space remaining.",
      "filesystem_available": "14.2GiB",
      "filesystem_size": "159.1GiB",
      "summary": "Critical disk space /srv on host foohost:9100"
    },
    "endsAt": "2023-11-24T00:48:45.921Z",
    "fingerprint": "3800078d23f72e50",
    "startsAt": "2023-11-24T00:36:00.921Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2023-11-24T01:44:45.925+01:00",
    "labels": {
      "alertname": "NodeDiskSpaceCritical",
      "device": "/dev/sda2",
      "fstype": "ext4",
      "instance": "foohost:9100",
      "job": "node",
      "mountpoint": "/srv",
      "severity": "critical"
    }
  }
]
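To see how the with / printf / query / first / value chain in the filesystem_available annotation fits together, here is a sketch using Go's text/template, the engine Prometheus alert templates are built on. The query function is a stub backed by a hypothetical fakeTSDB map instead of a real PromQL engine, and first, value and humanize1024 are simplified stand-ins for the Prometheus functions of the same names.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strings"
	"text/template"
)

// sample is a stand-in for one element of a query result.
type sample struct {
	Labels map[string]string
	Value  float64
}

// fakeTSDB replaces real PromQL evaluation with canned, hypothetical results.
var fakeTSDB = map[string][]sample{
	"node_filesystem_avail_bytes{instance='foohost:9100',mountpoint='/srv'}": {{Value: 14.5 * 1024 * 1024 * 1024}},
}

// humanize1024 is a simplified stand-in using binary prefixes.
func humanize1024(v float64) string {
	prefix := ""
	for _, p := range []string{"Ki", "Mi", "Gi", "Ti", "Pi", "Ei"} {
		if math.Abs(v) < 1024 {
			break
		}
		prefix = p
		v /= 1024
	}
	return fmt.Sprintf("%.4g%s", v, prefix)
}

// funcs mimics the template functions used in the rule (stubs, not Prometheus code).
var funcs = template.FuncMap{
	"query": func(q string) ([]sample, error) { return fakeTSDB[q], nil },
	"first": func(s []sample) (sample, error) {
		if len(s) == 0 {
			return sample{}, errors.New("first: empty query result")
		}
		return s[0], nil
	},
	"value":        func(s sample) float64 { return s.Value },
	"humanize1024": humanize1024,
}

// annotation is the filesystem_available template from the rule, with
// $labels bound from the data passed to Execute.
const annotation = `{{ $labels := .Labels }}` +
	`{{ with printf "node_filesystem_avail_bytes{instance='%s',mountpoint='%s'}" $labels.instance $labels.mountpoint | query }}` +
	`{{ . | first | value | humanize1024 }}B{{ end }}`

func render(labels map[string]string) (string, error) {
	t, err := template.New("annotation").Funcs(funcs).Parse(annotation)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, map[string]any{"Labels": labels})
	return sb.String(), err
}

func main() {
	out, err := render(map[string]string{"instance": "foohost:9100", "mountpoint": "/srv"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 14.5GiB
}
```

printf builds the PromQL string from the alert's own labels, the pipe hands it to query, and with only runs its body when the result is non-empty, so in this sketch an unknown host renders an empty annotation instead of an error.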
I can't really say I would recommend this, and it feels like an anti-pattern to me. But I have used template queries in the past to populate labels (as opposed to annotations) in alerts.
If your goal is to convey additional information so that the human receiver can decide whether the alert is urgent (e.g. your alert threshold is 10%, but since it's a 500 GB filesystem, it still has 50 GB free), then this already violates alerting best practices, as it leads to alert fatigue. I would instead recommend something like my original rule:
- alert: NodeDiskSpaceCritical
  expr: |
    node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1
    unless node_filesystem_avail_bytes > 15e9
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Critical disk space {{ $labels.mountpoint }} on host {{ $labels.instance }}
    description: >-
      Mountpoint {{ $labels.mountpoint }} on host {{ $labels.instance }} has
      {{ humanizePercentage $value }} disk space remaining.
This will fire if the available filesystem space is less than 10% unless there are more than 15 GB remaining.
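For a single filesystem series, that expression reduces to a simple boolean: unless drops a left-hand series whenever a matching right-hand series exists, so it acts as "ratio below 10% AND NOT more than 15 GB free". The Go sketch below (the function name shouldFire is hypothetical) evaluates it for a few made-up sizes; it deliberately ignores the for: 5m pending period and PromQL's label matching.

```go
package main

import "fmt"

// shouldFire mirrors, for one series, the expression
//   avail / size < 0.1 unless avail > 15e9
func shouldFire(availBytes, sizeBytes float64) bool {
	lowRatio := availBytes/sizeBytes < 0.1 // left-hand side of unless
	plentyLeft := availBytes > 15e9        // right-hand side of unless
	return lowRatio && !plentyLeft
}

func main() {
	fmt.Println(shouldFire(40e9, 500e9))  // false: 8% free, but 40 GB left
	fmt.Println(shouldFire(10e9, 500e9))  // true: 2% free and only 10 GB left
	fmt.Println(shouldFire(14.2e9, 20e9)) // false: 71% free
}
```

The large filesystem in the first call is exactly the case from the paragraph above: its ratio is below the threshold, yet no alert is raised.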
To learn more advanced alerting techniques, I suggest you head to the prometheus-users Google group, https://groups.google.com/g/prometheus-users/.
Thank you so much, you are great. I'm going to test this 😁 and share my alert for disk space... 👍
It works, awesome! 👍
Here are my rules. I made rules for Windows VMs and Linux VMs, and excluded some big disks.
- name: VirtualMachine
  rules:
    - alert: Win espace critique
      expr: bottomk(5, (vmware_vm_guest_disk_free{partition=~'(C|D|E|F|G|H):.?'}) < 2200000000 )
      for: 2m
      labels:
        severity: critical
        environment: Production
        event: "{{ $labels.vm_name | toUpper }}"
        service: "Espace Disque"
        instance: "{{ $labels.host_name }}"
      annotations:
        summary: "{{ $labels.vm_name }}"
        description: "Critique sur le serveur: {{ $labels.vm_name | toUpper }} sur {{ $labels.partition }} il reste {{ $value | humanize }}B"
        value: "{{ $value | humanize }}B"
    - alert: Windows Espace Warning
      expr: bottomk(5, (vmware_vm_guest_disk_free{partition=~'(C|D|E|F|G|H):.?'}) > 2200000000 and (vmware_vm_guest_disk_free{partition=~'(C|D|E|F|G|H):.?'}) < 3500000000 )
      for: 10m
      labels:
        severity: warning
        environment: Production
        event: "{{ $labels.vm_name | toUpper }}"
        service: "Espace Disque"
        instance: "{{ $labels.host_name }}"
      annotations:
        summary: "{{ $labels.vm_name }}"
        description: "en Warning sur le serveur: {{ $labels.vm_name | toUpper }} sur {{ $labels.partition }} il reste {{ $value | humanize }}B"
        value: "{{ $value | humanize }}B"
    - alert: Linux Espace Critique
      expr: bottomk( 6, ( vmware_vm_guest_disk_free{partition=~"^/.*", vm_name!="exp-c"} < 200000000 ) and vmware_vm_guest_disk_capacity > 370000000 )
      for: 10m
      labels:
        severity: critical
        environment: Production
        event: "{{ $labels.vm_name | toUpper }}"
        service: "Espace Disque"
        instance: "{{ $labels.host_name }}"
      annotations:
        summary: "{{ $labels.vm_name }}"
        description: "Critique sur le serveur: {{ $labels.vm_name | toUpper }} sur {{ $labels.partition }} il reste {{ $value | humanize }}B soit : {{ with printf \"vmware_vm_guest_disk_free{vm_name='%s',partition='%s'} / vmware_vm_guest_disk_capacity{vm_name='%s',partition='%s'}\" $labels.vm_name $labels.partition $labels.vm_name $labels.partition | query }}{{ . | first | value | humanizePercentage }}{{ end }} Taille total : {{ with printf \"vmware_vm_guest_disk_capacity{vm_name='%s',partition='%s'}\" $labels.vm_name $labels.partition | query }}{{ . | first | value | humanize1024 }}B{{ end }}"
        value: "{{ $value | humanize }}B"
Sample output of the description:
description Critique sur le serveur: xxdxdxdxREC sur /xdxdxd/intra/rdbms il reste 76.12Mi soit : 0.38% Taille total : 19.56GiB
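The bottomk wrappers in these rules are worth a closer look. The Go sketch below mimics the "Linux Espace Critique" expression on hypothetical, pre-joined data: it applies the label and threshold filters, then keeps only the k series with the least free space, which is what bottomk does. The disk type and linuxCritical function are my own names. One consequence follows directly: when more than k disks match at the same time, the extra ones produce no alert.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// disk is a hypothetical, flattened view of one partition joined across
// vmware_vm_guest_disk_free and vmware_vm_guest_disk_capacity.
type disk struct {
	VM        string
	Partition string
	Free      float64 // bytes
	Capacity  float64 // bytes
}

// linuxCritical applies the rule's filters, then keeps the k lowest by free space.
func linuxCritical(disks []disk, k int) []disk {
	var matched []disk
	for _, d := range disks {
		// partition=~"^/.*" and vm_name!="exp-c"
		if !strings.HasPrefix(d.Partition, "/") || d.VM == "exp-c" {
			continue
		}
		// free < 200000000 and capacity > 370000000
		if d.Free < 200000000 && d.Capacity > 370000000 {
			matched = append(matched, d)
		}
	}
	// bottomk: sort ascending by value and keep the first k series.
	sort.Slice(matched, func(i, j int) bool { return matched[i].Free < matched[j].Free })
	if len(matched) > k {
		matched = matched[:k]
	}
	return matched
}

func main() {
	disks := []disk{
		{VM: "web1", Partition: "/", Free: 1e8, Capacity: 1e9},
		{VM: "db1", Partition: "/var", Free: 5e7, Capacity: 2e10},
		{VM: "exp-c", Partition: "/", Free: 1e6, Capacity: 1e9},     // excluded by vm_name
		{VM: "small", Partition: "/boot", Free: 1e7, Capacity: 3e8}, // capacity too small
	}
	for _, d := range linuxCritical(disks, 6) {
		fmt.Println(d.VM, d.Partition, d.Free)
	}
}
```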
Another question: the description is very long now...
How can I shorten the rule using reusable templates? https://prometheus.io/docs/prometheus/latest/configuration/template_examples/#defining-reusable-templates
I create a template file and declare it in my Alertmanager config (.../alertmanager.yml), right? (Prometheus templating)
templates:
I keep reading this sample on the official Prometheus website, but it doesn't show exactly how to do it...
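The reusable-templates example relies on the define and template actions of Go's text/template. The sketch below shows only that mechanism: shared definitions are parsed together with each annotation body, which then calls them by name. The names vm and where, and the render helper, are hypothetical. Where such shared definitions can be stored for alerting rules is not settled by this sketch; note that the templates: setting in alertmanager.yml loads Alertmanager notification templates, a different layer from Prometheus rule templating.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// shared holds named templates written once and reused by several annotations.
// The names "vm" and "where" are hypothetical.
const shared = `{{ define "vm" }}{{ .vm_name | toUpper }}{{ end }}` +
	`{{ define "where" }}{{ template "vm" . }} on {{ .partition }}{{ end }}`

// render parses the shared definitions together with one annotation body.
func render(body string, labels map[string]string) (string, error) {
	t, err := template.New("annotation").
		Funcs(template.FuncMap{"toUpper": strings.ToUpper}).
		Parse(shared + body)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, labels)
	return sb.String(), err
}

func main() {
	labels := map[string]string{"vm_name": "srv212", "partition": "/data"}
	for _, body := range []string{
		`Critical on server: {{ template "where" . }}`,
		`Warning on server: {{ template "where" . }}`,
	} {
		out, err := render(body, labels)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Each annotation body stays short because the repeated fragment lives in one named definition.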
@RichiH May I quietly suggest that you convert this issue to a discussion?
Hello,
I use Prometheus and Alertmanager, but Alertmanager is very limited without templating in Alertmanager (custom webhook), or perhaps I don't understand the concept of the tools. Is there a way to have the results of 2 or 3 queries in (annotations: description:), to get more information on an alert?
Also, I don't like some of the suffixes of Prometheus's "humanize" function (for disk data, I prefer "Mo" or "Mega" to "Mi" or "Gi").
Sample: description: Warning on the server 'SRV212': only 253 Mi of free disk space remains, representing 3% of the total 10 GB disk space.
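Since humanize already uses decimal prefixes, one option is to append your own unit, e.g. "o" for octets, to get "Mo"/"Go". The number changes as well as the suffix: 253 MiB is about 265.3 Mo. The Go sketch below renders the sample description with simplified stand-ins for humanize and humanizePercentage; the byte counts are hypothetical, and in a real rule the free and total values would come from $value and a query.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"text/template"
)

// humanize is a simplified stand-in for decimal (SI) scaling.
func humanize(v float64) string {
	prefix := ""
	for _, p := range []string{"k", "M", "G", "T", "P", "E"} {
		if math.Abs(v) < 1000 {
			break
		}
		prefix = p
		v /= 1000
	}
	return fmt.Sprintf("%.4g%s", v, prefix)
}

// humanizePercentage renders a ratio such as 0.0253 as "2.53%".
func humanizePercentage(v float64) string {
	return fmt.Sprintf("%.4g%%", v*100)
}

// "o" (octet) is appended as the unit so the output reads "Mo"/"Go".
const description = `Warning on the server '{{ .vm | toUpper }}': only {{ .free | humanize }}o of free disk space remains, ` +
	`representing {{ .ratio | humanizePercentage }} of the total {{ .total | humanize }}o disk space.`

func render(data map[string]any) (string, error) {
	t, err := template.New("description").Funcs(template.FuncMap{
		"humanize":           humanize,
		"humanizePercentage": humanizePercentage,
		"toUpper":            strings.ToUpper,
	}).Parse(description)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, data)
	return sb.String(), err
}

func main() {
	free, total := 253e6, 10e9 // hypothetical byte counts
	out, err := render(map[string]any{"vm": "srv212", "free": free, "total": total, "ratio": free / total})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```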