fabienmagagnosc opened 2 months ago
Thanks for your detailed message.
Since your modified parser worked as you expected, I suggest using the `.DeclaredAlerts` variable in your template instead; it includes all the alerts, firing or not.
Hi there,
I'm looking at `.DeclaredAlerts`, since I'd rather rely on your code than on my own patch, but I'm still not getting any result. Is there a way to get all the information, whether the alert is firing or resolved?
Right now, your code is clear:
alert_parser.go:
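```go
// Every alert is recorded in DeclaredAlerts...
alertGroups[key].DeclaredAlerts = append(alertGroups[key].DeclaredAlerts, alert)
// ...but only firing alerts are parsed and added to the group's Alerts.
if alert.Status == "firing" {
	err = alertParser.addAlertToGroup(alertGroups[key], alert)
	if err != nil {
		return nil, err
	}
}
```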
Only the firing alerts get parsed and completed with their labels, which can then be passed into the SNMP traps (via a new OID).
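To make the consequence concrete, here is a minimal Go sketch of that branch. The `Alert` and `AlertGroup` types and the `groupAlerts` helper are hypothetical simplifications, not snmp_notifier's real types; the only point is that every alert lands in `DeclaredAlerts`, while only firing ones reach `Alerts`.

```go
package main

import "fmt"

// Alert is a hypothetical, trimmed-down stand-in for the notifier's alert type.
type Alert struct {
	Status string
	Labels map[string]string
}

// AlertGroup mirrors the two lists a template can see (hypothetical).
type AlertGroup struct {
	Alerts         []Alert // only alerts added by the "firing" branch
	DeclaredAlerts []Alert // every alert received for the group
}

// groupAlerts sketches the quoted branch: every alert is declared,
// but only firing alerts are added to Alerts.
func groupAlerts(alerts []Alert) AlertGroup {
	var g AlertGroup
	for _, a := range alerts {
		g.DeclaredAlerts = append(g.DeclaredAlerts, a)
		if a.Status == "firing" {
			g.Alerts = append(g.Alerts, a)
		}
	}
	return g
}

func main() {
	g := groupAlerts([]Alert{
		{Status: "firing", Labels: map[string]string{"alertname": "HighCPU"}},
		{Status: "resolved", Labels: map[string]string{"alertname": "DiskFull"}},
	})
	// prints "1/2 alerts are firing"
	fmt.Printf("%d/%d alerts are firing\n", len(g.Alerts), len(g.DeclaredAlerts))
}
```

For a group whose alerts are all resolved, `Alerts` ends up empty, which is why any template ranging over `.Alerts` renders nothing in that case.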
I'm going to do some checks, since the default template seems to work well:
```
{{ len .Alerts }}/{{ len .DeclaredAlerts }} alerts are firing:
```
It always displays "2/4 alerts are firing", for instance.
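A quick way to check that reading is to execute the same template with Go's standard `text/template` package against hand-built data. The `group` struct below is a hypothetical stand-in for what snmp_notifier passes to the template, filled to mirror the 2/4 case.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alert and group are hypothetical stand-ins for the template data.
type alert struct{ Status string }

type group struct {
	Alerts         []alert // firing alerts only
	DeclaredAlerts []alert // every alert in the group
}

const countTemplate = `{{ len .Alerts }}/{{ len .DeclaredAlerts }} alerts are firing:`

// render executes countTemplate against g and returns the rendered text.
func render(g group) (string, error) {
	tmpl, err := template.New("count").Parse(countTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, g); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	firing, resolved := alert{Status: "firing"}, alert{Status: "resolved"}
	out, err := render(group{
		Alerts:         []alert{firing, firing},
		DeclaredAlerts: []alert{firing, firing, resolved, resolved},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 2/4 alerts are firing:
}
```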
How about something like this?
```
{{- range .DeclaredAlerts }}
{{- .Labels.severity }};{{ .Status }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }};{{ .Annotations.summary }};{{ .Annotations.description }}
{{ end }}
```
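Here is a sketch of that template running under Go's `text/template`, with a `;` separator between `.Status` and `.Labels.instance`. The `Alert`/`Group` shapes and the sample values are assumptions made for the illustration; the point is that resolved alerts are rendered alongside firing ones.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert and Group are hypothetical stand-ins for the template data.
type Alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

type Group struct {
	DeclaredAlerts []Alert // all alerts, firing or resolved
}

// The "{{-" markers trim the newline after the range action, so each alert
// renders as exactly one line.
const declaredTemplate = `{{- range .DeclaredAlerts }}
{{- .Labels.severity }};{{ .Status }};{{ .Labels.instance }};{{ .Labels.job }};{{ .Labels.alertname }};{{ .Annotations.summary }};{{ .Annotations.description }}
{{ end }}`

// render executes declaredTemplate against g.
func render(g Group) (string, error) {
	tmpl, err := template.New("declared").Parse(declaredTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, g); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleGroup builds one firing and one resolved alert for the same problem.
func sampleGroup() Group {
	firing := Alert{
		Status: "firing",
		Labels: map[string]string{"severity": "warning", "instance": "server01",
			"job": "node-exporter-job", "alertname": "HighCPU"},
		Annotations: map[string]string{"summary": "CPU over 80%",
			"description": "CPU over 80% - server01"},
	}
	resolved := firing // shares the label maps; only Status differs
	resolved.Status = "resolved"
	return Group{DeclaredAlerts: []Alert{firing, resolved}}
}

func main() {
	out, err := render(sampleGroup())
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Note that with the default `text/template` options, a label missing from the map prints as `<no value>` rather than an empty string.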
So sorry for the delay; I have been busy with other tasks.
Basically, I can only speak for most SNMP systems, not all of them.
It's preferable to have two SNMP alarms, one when the alert fires and one when it is resolved.
The matching is mostly based on distinct OIDs and/or fields, the same way the Prometheus Alertmanager matches its alerts (nothing new).
So, when alarms are actually sent, the alarm format has to stay constant so that the third-party SNMP system can recognize them.
An example:
```
OID: xxx
status: firing
severity: WARN
server: server01
alarm: CPU over 80% - server01
job: node-exporter-job

OID: xxx
status: resolved
severity: WARN
server: server01
alarm: CPU over 80% - server01
job: node-exporter-job
```
With these, the SNMP system can match the two and clear the alarm.
I'm working on more samples now and will send some as soon as possible.
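The receiving side is not part of snmp_notifier, so the following is only a hypothetical Go model of how a third-party SNMP system might correlate the two traps above: it keys each alarm on fields that are identical in both traps (here `alarm` and `server`, an assumption) and ignores `status`, so the resolved trap clears the entry raised by the firing one.

```go
package main

import "fmt"

// Trap holds the fields from the example above (hypothetical struct).
type Trap struct {
	Status   string // "firing" or "resolved"
	Severity string
	Server   string
	Alarm    string
	Job      string
}

// key ignores Status, so the firing and resolved traps for the same
// problem map to the same entry.
func key(t Trap) string {
	return t.Alarm + "|" + t.Server
}

// AlarmTable is a toy model of the active-alarm list in the SNMP system.
type AlarmTable map[string]Trap

// Apply raises the alarm on "firing" and clears it on "resolved".
func (a AlarmTable) Apply(t Trap) {
	switch t.Status {
	case "firing":
		a[key(t)] = t
	case "resolved":
		delete(a, key(t))
	}
}

func main() {
	table := AlarmTable{}
	firing := Trap{Status: "firing", Severity: "WARN", Server: "server01",
		Alarm: "CPU over 80% - server01", Job: "node-exporter-job"}
	resolved := firing
	resolved.Status = "resolved"

	table.Apply(firing)
	fmt.Println(len(table)) // 1
	table.Apply(resolved)
	fmt.Println(len(table)) // 0
}
```

This only works if every field in the key is filled in the resolved trap too, which is exactly what fails when the extra field comes out empty.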
What did you do?
I use two templates to generate two fields that allow automatic alarm resolution:
- a default one, which provides a FAULT or OK status:
  ```
  {{- if .Alerts -}} FAULT {{ else -}} OK {{- end -}}
  ```
- another one, which provides the alarm information. Like any alarm system (including, for example, the Prometheus Alertmanager), it requires a unique "ID" to match a fault with its resolution:
  ```
  {{ range $severity, $alerts := (groupAlertsByLabel .Alerts "severity") -}} {{- range $index, $alert := $alerts }} {{ $alert.Labels.severity }};{{ $alert.Labels.instance }};{{ $alert.Labels.job }};{{ $alert.Labels.alertname }};{{ $alert.Annotations.summary }};{{ $alert.Annotations.description }} {{ end }} {{ end }}
  ```
In my object, I get a CSV-style (semicolon-separated) string with the severity, instance, job, alertname, summary and description, so the SNMP alarm system can use alertname + instance to uniquely identify the alarm.
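As an illustration, a receiver could derive that identifier from one line of the extra field as sketched below. The function name `alarmKey` and the `alertname@instance` key format are hypothetical; the field order is the one produced by the object template above.

```go
package main

import (
	"fmt"
	"strings"
)

// alarmKey builds an alertname+instance identifier from one line of the
// extra field: severity;instance;job;alertname;summary;description.
func alarmKey(line string) (string, error) {
	// SplitN with 6 keeps any ';' inside the last field (description) intact.
	// A ';' inside the summary would still shift fields, so templates should
	// avoid it there.
	fields := strings.SplitN(strings.TrimSpace(line), ";", 6)
	if len(fields) < 4 {
		return "", fmt.Errorf("expected at least 4 fields, got %d", len(fields))
	}
	instance, alertname := fields[1], fields[3]
	return alertname + "@" + instance, nil
}

func main() {
	k, err := alarmKey("warning;server01;node-exporter-job;HighCPU;CPU over 80%;CPU over 80% - server01")
	if err != nil {
		panic(err)
	}
	fmt.Println(k) // HighCPU@server01
}
```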
What did you expect to see?
The firing and resolved alarms should be essentially identical, with only the description changing (FAULT or OK). The extra field should carry the description, summary and instance to document the alarm when it fires, and that same information should allow the firing and resolved alarms to be matched automatically.
What did you see instead? Under which circumstances?
When alarms are firing, there is no issue: everything is filled in. When alarms are resolved, the extra field is empty.
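This behaviour can be reproduced outside the notifier with Go's `text/template`. The sketch renders both templates from this issue for a firing group and a resolved group. `Alert`, `Group`, and especially `groupAlertsByLabel` are hypothetical re-implementations (the real template function lives in snmp_notifier and may differ in signature). The key assumption, taken from the parser code quoted earlier, is that a resolved group has an empty `.Alerts`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert and Group are hypothetical stand-ins for the template data.
type Alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

type Group struct {
	Alerts         []Alert // firing alerts only
	DeclaredAlerts []Alert // all alerts, firing or resolved
}

// groupAlertsByLabel is a hypothetical re-implementation of the template
// function: it buckets alerts by the value of one label.
func groupAlertsByLabel(alerts []Alert, label string) map[string][]Alert {
	groups := map[string][]Alert{}
	for _, a := range alerts {
		groups[a.Labels[label]] = append(groups[a.Labels[label]], a)
	}
	return groups
}

const descriptionTemplate = `{{- if .Alerts -}} FAULT {{ else -}} OK {{- end -}}`

const objectTemplate = `{{ range $severity, $alerts := (groupAlertsByLabel .Alerts "severity") -}} {{- range $index, $alert := $alerts }} {{ $alert.Labels.severity }};{{ $alert.Labels.instance }};{{ $alert.Labels.job }};{{ $alert.Labels.alertname }};{{ $alert.Annotations.summary }};{{ $alert.Annotations.description }} {{ end }} {{ end }}`

// render executes one template against g and trims surrounding spaces.
func render(text string, g Group) (string, error) {
	tmpl, err := template.New("t").Funcs(template.FuncMap{
		"groupAlertsByLabel": groupAlertsByLabel,
	}).Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, g); err != nil {
		return "", err
	}
	return strings.TrimSpace(sb.String()), nil
}

func main() {
	cpu := Alert{
		Status: "firing",
		Labels: map[string]string{"severity": "warning", "instance": "server01",
			"job": "node-exporter-job", "alertname": "HighCPU"},
		Annotations: map[string]string{"summary": "CPU over 80%",
			"description": "CPU over 80% - server01"},
	}
	resolvedCPU := cpu
	resolvedCPU.Status = "resolved"

	firingGroup := Group{Alerts: []Alert{cpu}, DeclaredAlerts: []Alert{cpu}}
	resolvedGroup := Group{DeclaredAlerts: []Alert{resolvedCPU}} // Alerts stays empty

	for _, g := range []Group{firingGroup, resolvedGroup} {
		desc, err := render(descriptionTemplate, g)
		if err != nil {
			panic(err)
		}
		obj, err := render(objectTemplate, g)
		if err != nil {
			panic(err)
		}
		fmt.Printf("description=%q extra=%q\n", desc, obj)
	}
}
```

The resolved group yields `OK` with an empty extra field. Replacing `.Alerts` with `.DeclaredAlerts` in `objectTemplate` (as suggested at the top of the thread) makes the resolved case render the same line as the firing one.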
Environment
System information:
It's the Docker image.
SNMP notifier version:
maxwo/snmp-notifier:latest as of today, so 1.5, I suppose.
Note: I tested with a modified version, built locally, with line 69 of alert_parser.go removed (and the syntax corrected). It then worked properly, which makes sense, since every alarm is treated equally.
snmp_notifier, version 1.5.0 (branch: main, revision: 934455898d4bc190e65aebc1356451196a6ec983) build user: tecnotree@centos build date: 20240913-16:08:54 go version: go1.22.5 (Red Hat 1.22.5-2.el9) platform: linux/amd64 tags: netgo
Alertmanager version:
prom/alertmanager:latest as of today, so it's:
Version Information Branch: HEAD BuildDate: 20240228-11:51:20 BuildUser: root@22cd11f671e9 GoVersion: go1.21.7 Revision: 0aa3c2aad14cff039931923ab16b26b7481783b5 Version: 0.27.0
Not applicable, as the alarms come from Grafana here.
Alertmanager command line:
SNMP notifier command line:
```
./snmp_notifier --snmp.trap-description-template=description-template.tpl --snmp.extra-field-template=4=object-template.tpl --snmp.version=V2c --snmp.destination=ss-vip:162 --snmp.community=tecnomen --snmp.timeout=5s --web.listen-address=:9465
```
Prometheus alert file:
Logs: