maxwo / snmp_notifier

A webhook to relay Prometheus alerts as SNMP traps, because sometimes, you have to deal with legacy
Apache License 2.0

Alerts not receiving in snmp receiver end #132

Closed sanupanji closed 11 months ago

sanupanji commented 1 year ago

What did you do? I am using a Prometheus, Grafana, and Alertmanager stack to monitor EDB Postgres installed in a bare-metal k8s cluster.

Alertmanager is receiving alerts with the configuration below.

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: snmp_notifier
  group_by:
  - job
  continue: false
  routes:
  - receiver: "null"
    match:
      alertname: Watchdog
    continue: false
  - receiver: snmp_notifier
    continue: false
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: "null"
- name: snmp_notifier
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://snmpnotifier.cnp-monitoring.svc.cluster.local:9464/alerts
    max_alerts: 0
templates:
- /etc/alertmanager/config/*.tmpl

The alerts are visible in the Alertmanager UI:

[image]

Now I want to receive those alerts on an SNMP trap server, so I have installed snmp_notifier with the latest image. SNMP notifier command line:

- /bin/snmp_notifier
        - --snmp.trap-description-template=/etc/snmp_notifier/description-template.tpl
        - --log.level=debug
        - --alert.severities="emergency,high,medium,low,critical,warning,info"
        - --alert.default-severity="high"
        - --snmp.destination=10.0.113.233:162
        - --snmp.version=V3
        - --snmp.authentication-enabled
        - --snmp.authentication-protocol=SHA
        - --snmp.authentication-username=snmp_user_v3
        - --snmp.authentication-password=auth_password_v3
        - --snmp.private-enabled
        - --snmp.private-protocol=AES
        - --snmp.private-password=encrypt_password_v3
        - --snmp.context-engine-id=0x8000000001020304

I have added some custom severities to match my PrometheusRule setup.

In --snmp.destination I set the IP of the snmp-server service, which I installed in the same namespace using your example https://github.com/maxwo/snmp_notifier/blob/main/scripts/kubernetes/snmp-server.yaml without any change to the given YAML.

The snmp_notifier pod log shows a 200 response, but it does not show the request that was sent to the SNMP server:

ts=2023-02-21T11:15:14.994Z caller=http_server.go:78 level=debug msg="Handling /alerts webhook request"
ts=2023-02-21T11:15:14.994Z caller=http_server.go:78 level=debug msg="Handling /alerts webhook request"
ts=2023-02-21T11:15:14.995Z caller=http_server.go:78 level=debug msg="Handling /alerts webhook request"
ts=2023-02-21T11:15:15.006Z caller=http_server.go:78 level=debug msg="Handling /alerts webhook request"
192.168.216.59 - - [21/Feb/2023:11:15:14 +0000] "POST /alerts HTTP/1.1" 200 0
192.168.216.59 - - [21/Feb/2023:11:15:14 +0000] "POST /alerts HTTP/1.1" 200 0
192.168.216.59 - - [21/Feb/2023:11:15:14 +0000] "POST /alerts HTTP/1.1" 200 0
192.168.216.59 - - [21/Feb/2023:11:15:15 +0000] "POST /alerts HTTP/1.1" 200 0

But nothing arrives at the snmp-server.

Output of kubectl logs for snmp-server:

NET-SNMP version 5.9.3

What did you expect to see? I expected the trap details to be shown in the snmp-server logs, and also in the snmp_notifier logs.

Environment Bare metal K8s cluster, kubernetes version 1.25

maxwo commented 1 year ago

Hi, for some reason, I couldn't use the default port 162 on the sample SNMP server you are reusing.

Can you try the following:

maxwo commented 1 year ago

Actually, I managed to reproduce it using your exact command line. I hadn't noticed it at first, but you should not add the leading 0x to the engine IDs:

        - --snmp.context-engine-id=8000000001020304
        - --snmp.security-engine-id=8000000001020304
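
To illustrate the fix above, here is a minimal sketch of normalizing an engine ID before passing it to the flags. The helper name `normalize_engine_id` is hypothetical and not part of snmp_notifier, and the "even number of hex digits" check is an assumption about what a byte-oriented engine ID should look like, not documented notifier behavior.

```python
import re


def normalize_engine_id(raw):
    """Return the engine ID as bare hex digits, dropping any leading 0x."""
    value = raw.strip()
    if value.lower().startswith("0x"):
        value = value[2:]
    # Assumption: an engine ID is a sequence of whole bytes, i.e. hex pairs.
    if not re.fullmatch(r"(?:[0-9a-fA-F]{2})+", value):
        raise ValueError("engine ID must be an even number of hex digits: %r" % raw)
    return value.lower()


engine_id = normalize_engine_id("0x8000000001020304")
print("--snmp.context-engine-id=" + engine_id)
print("--snmp.security-engine-id=" + engine_id)
```

Running the value through a check like this before templating the deployment would catch the `0x` prefix early rather than at trap-sending time.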

sanupanji commented 1 year ago

Thanks @maxwo, it's working now, and I can see the traps in the SNMP server. But two things:

  1. The first custom severity is still not working. As a workaround, I need to add a dummy severity at the first index (in this case "dummy"):
    - --alert.severities="dummy,critical,high,medium,low,warning,info"
  2. Why does the snmp_notifier log not show what it is sending to the SNMP server? It only shows:
    192.168.216.59 - - [21/Feb/2023:11:15:14 +0000] "POST /alerts HTTP/1.1" 200 0

    In debug mode, the logs should include the full header and description.

maxwo commented 1 year ago

About your points:

  1. You must have some dummy alert severities somewhere in your Prometheus alert configuration. Can you describe precisely what is not working?
  2. The default logs are intended to be access logs. I may add some extra information in debug mode.

sanupanji commented 1 year ago

I only have the severities below:

critical,high,medium,low,warning,info

But the first value I pass in - --alert.severities="critical,high,medium,low,warning,info" is not picked up by the notifier. I get the error below in the logs:

ts=2023-02-28T06:56:00.111Z caller=http_server.go:132 level=error status=400 statustext="Bad Request" err="incorrect severity: critical" data="unsupported value type"

So I am adding "dummy" in the first position as a workaround:

  - --alert.severities="dummy,critical,high,medium,low,warning,info"

maxwo commented 1 year ago

Can you try sending these alerts:

{
  "receiver": "snmp-notifier",
  "status": "firing",
  "groupLabels": {
    "environment": "production",
    "label": "test"
  },
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "severity": "warning",
        "alertname": "TestAlert",
        "oid": "1.3.6.1.4.1.666.0.10.1.1.1.2.1"
      },
      "annotations": {
        "summary": "this is the random summary",
        "description": "this is the description of alert 1"
      }
    },
    {
      "status": "resolved",
      "labels": {
        "severity": "warning",
        "alertname": "TestAlert",
        "oid": "1.3.6.1.4.1.666.0.10.1.1.1.1.1"
      },
      "annotations": {
        "summary": "this is the random summary",
        "description": "this is the description of ActiveMQ alert"
      }
    },
    {
      "status": "firing",
      "labels": {
        "severity": "critical",
        "alertname": "TestAlert",
        "oid": "1.3.6.1.4.1.666.0.10.1.1.1.2.1"
      },
      "annotations": {
        "summary": "this is the summary",
        "description": "this is the description on job1"
      }
    },
    {
      "status": "resolved",
      "labels": {
        "severity": "critical",
        "alertname": "TestAlert",
        "oid": "1.3.6.1.4.1.666.0.10.1.1.1.2.1"
      },
      "annotations": {
        "summary": "this is the summary",
        "description": "this is the description on TestAlertWithoutOID"
      }
    }
  ]
}

using a command such as: curl -XPOST http://localhost:9464/alerts -H 'Content-Type: application/json' --data '@alerts.json' ?

I successfully received them with no error and this command line:

./snmp_notifier         --log.level=debug \
         --alert.severities="critical,high,medium,low,warning,info" \
         --alert.default-severity="high" \
         --snmp.version=V3 \
         --snmp.authentication-enabled \
         --snmp.authentication-protocol=SHA \
         --snmp.authentication-username=snmp_user_v3 \
         --snmp.authentication-password=auth_password_v3  \
         --snmp.private-enabled \
         --snmp.private-protocol=AES \
         --snmp.private-password=encrypt_password_v3 \
         --snmp.security-engine-id=8000000001020304 --snmp.context-name=''

Which should look a lot like yours.
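
For anyone who prefers a script over curl, the same test can be sketched with Python's standard library. This is only an illustration under assumptions: the URL mirrors the curl command above, the payload is a trimmed copy of the JSON above with a single alert, and the helper names `build_alert_request` and `post_alerts` are hypothetical.

```python
import json
import urllib.request


def build_alert_request(url, alerts):
    """Build a POST request carrying an Alertmanager-style webhook payload."""
    payload = {
        "receiver": "snmp-notifier",
        "status": "firing",
        "groupLabels": {"environment": "production", "label": "test"},
        "alerts": alerts,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alerts(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


test_alert = {
    "status": "firing",
    "labels": {
        "severity": "critical",
        "alertname": "TestAlert",
        "oid": "1.3.6.1.4.1.666.0.10.1.1.1.2.1",
    },
    "annotations": {
        "summary": "this is the summary",
        "description": "this is the description on job1",
    },
}

request = build_alert_request("http://localhost:9464/alerts", [test_alert])
# With a notifier listening locally, send it with:
#     print(post_alerts(request))
```

Sending one alert per severity this way makes it easy to see which severity values the notifier rejects with a 400.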

pknee commented 1 year ago

Hi!

I see similar behavior to @sanupanji regarding the severities. Since the notifier is deployed on OpenShift 4.10, and OCP implements the watchdog/dead man's switch with severity "none", I need to extend the severities.

This configuration

command:
            - /bin/snmp_notifier
            - >-
              --snmp.trap-description-template=/etc/snmp_notifier/description-template.tpl
            - '--snmp.destination=192.168.1.1:162'
            - '--web.listen-address=:8080'
            - '--alert.severities="critical,warning,info,none"' # <
            - '--log.level=debug'

throws errors as soon as a critical alert is raised

ts=2023-03-21T09:43:22.386Z caller=http_server.go:132 level=error status=400 statustext="Bad Request" err="incorrect severity: critical" data="unsupported value type"

Adding the dummy resolved the issue: - '--alert.severities="dummy,critical,warning,info,none"'

Best regards, Philip

maxwo commented 1 year ago

I think the problem comes from the command line itself:

- '--alert.severities="emergency,high,medium,low,critical,warning,info"'

This leads to the following severity list in the notifier (note the presence of the " characters):

["emergency high medium low critical warning info"]

--alert.severities="emergency,high,medium,low,critical,warning,info", without the surrounding single quotes, seems to parse the value correctly, as does '--alert.severities=emergency,high,medium,low,critical,warning,info':

[emergency high medium low critical warning info]

Can you confirm that? I will update the documentation to avoid such confusion.
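
The effect described above can be reproduced with a small sketch. It assumes the notifier splits the flag value on commas (an assumption consistent with the lists printed above, not a confirmed detail of its implementation) and contrasts an argument passed verbatim, as a container `command:` list item is, with the same text after shell quote removal.

```python
import shlex


def split_severities(flag_value):
    """Split a comma-separated severity list (assumed notifier behavior)."""
    return flag_value.split(",")


raw_arg = '--alert.severities="critical,high,medium,low,warning,info"'

# Passed verbatim (no shell involved): the double quotes stay in the value.
literal_value = raw_arg.split("=", 1)[1]
literal_list = split_severities(literal_value)
print(literal_list)
# ['"critical', 'high', 'medium', 'low', 'warning', 'info"']

# Passed through a shell: quote removal happens before the program runs.
shell_arg = shlex.split(raw_arg)[0]
shell_list = split_severities(shell_arg.split("=", 1)[1])
print(shell_list)
# ['critical', 'high', 'medium', 'low', 'warning', 'info']
```

Under that assumption, the first and last entries carry a stray quote, which would explain why a leading "dummy" entry makes "critical" match again. The last entry would stay broken in the same way.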

maxwo commented 11 months ago

Closing, as there have been no further replies and a solution has been found.