prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

Offer solution to test receivers #2845

Open armandgrillet opened 2 years ago

armandgrillet commented 2 years ago

When creating a receiver, determining whether it works properly is essentially a matter of trial and error. There is no way to validate the provided configuration; the user must simply wait for a matching alert to fire and see whether the notification was delivered.

This makes it extremely difficult for a user to determine if their receiver is working as expected, especially when using complex templates or integrating with external systems. An API to validate a receiver and to send test notifications could be a solution to this issue.

FYI this has been added to Grafana with https://github.com/grafana/grafana/pull/37308 (related blog post), we are happy to upstream similar changes if they provide value to everyone.

simonpasquier commented 2 years ago

I think it's a great idea! Could you describe more precisely how it would work? IIUC there would be a new API endpoint that would accept a full receiver configuration and trigger a test alert against all defined integrations.

grobinson-grafana commented 2 years ago

Hi @simonpasquier, I worked on this feature for Grafana. We added a new endpoint to Grafana, /api/alertmanager/grafana/config/api/v1/receivers/test, which takes one or more full receiver configurations (we support testing multiple receivers at the same time) and sends a test alert to each of them.

Here is an example of the JSON that the endpoint expects:

{
  "receivers": [{
    "name": "test",
    "grafana_managed_receiver_configs": [{
      "settings": {
        "addresses": "test@example.com"
      },
      "type": "email",
      "name": "test",
      "disableResolveMessage": false
    }]
  }]
}

We also support customization of the alert via custom annotations and labels:

{
  "alert": {
    "annotations": {
      "Description": "This is a custom annotation"
    },
    "labels":{
      "Host": "This is a custom label"
    }
  },
  "receivers": [{
    "name": "test",
    "grafana_managed_receiver_configs": [{
      "settings": {
        "addresses": "test@example.com"
      },
      "type": "email",
      "name": "test",
      "disableResolveMessage": false
    }]
  }]
}
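
As a rough illustration of how a client might assemble the two request bodies above, here is a minimal Python sketch. The endpoint path is the one quoted in this comment; the base URL, the POST method, and the helper name build_test_request are assumptions made for the example, and authentication is left out. The request is only built, not sent.

```python
import json
import urllib.request

# Path quoted in the comment above; the host and port used below are assumptions.
TEST_PATH = "/api/alertmanager/grafana/config/api/v1/receivers/test"


def build_test_request(base_url, receivers, annotations=None, labels=None):
    """Build (but do not send) a request for the receiver test endpoint."""
    body = {"receivers": receivers}
    # The "alert" object is optional; include it only for custom values.
    if annotations or labels:
        body["alert"] = {
            "annotations": annotations or {},
            "labels": labels or {},
        }
    return urllib.request.Request(
        base_url.rstrip("/") + TEST_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the thread does not name the HTTP method
    )


email_receiver = {
    "name": "test",
    "grafana_managed_receiver_configs": [{
        "settings": {"addresses": "test@example.com"},
        "type": "email",
        "name": "test",
        "disableResolveMessage": False,
    }],
}

request = build_test_request(
    "http://localhost:3000",  # hypothetical Grafana address
    [email_receiver],
    annotations={"Description": "This is a custom annotation"},
    labels={"Host": "This is a custom label"},
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the built request to urllib.request.urlopen would actually send it; that step is omitted here so the sketch runs without a Grafana instance.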

If the user wants to test a custom template, the template itself must first be created or updated via a separate Grafana API for message templates. Once it exists, it can be referenced in the test notification like so:

{
  "alert": {
    "annotations": {
      "Description": "This is a custom annotation"
    },
    "labels":{
      "Host": "This is a custom label"
    }
  },
  "receivers": [{
    "name": "test",
    "grafana_managed_receiver_configs": [{
      "settings": {
        "addresses": "test@example.com",
        "message": "{{ template \"test\" . }}"
      },
      "type": "email",
      "name": "test",
      "disableResolveMessage": false
    }]
  }]
}

mindhash commented 2 years ago

@simonpasquier @grobinson-grafana
I think this is a good feature to add.

I worked on a similar solution. I am instantiating a new notifier and calling Notify() to send a sample alert.

Users are able to validate the connection with this method. It works similarly to how tools like TablePlus let you test a DB connection.

juliusv commented 2 years ago

This would be great to have in some form! Just keep in mind the security implications: allowing a user to post an arbitrary receiver configuration would also mean that they could make the Alertmanager connect to arbitrary URLs (making it an open proxy). There would need to be limitations on this, such as having the UI in a separate binary, or at least gating such potentially dangerous actions behind a flag (like the --web.enable-admin-api and --web.enable-remote-write-receiver flags in Prometheus, which enable the endpoints that allow mutating server state).
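
To make the flag idea concrete, here is a minimal Python sketch of an endpoint that stays disabled unless the operator opts in. The flag name --web.enable-receiver-test-api and both function names are hypothetical; nothing like this exists in Alertmanager today, and the status codes are just one plausible choice.

```python
import argparse


def parse_flags(argv):
    """Parse command-line flags; the test API is off unless explicitly enabled."""
    parser = argparse.ArgumentParser(prog="alertmanager-sketch")
    # Hypothetical flag name, modelled on Prometheus' --web.enable-admin-api.
    parser.add_argument(
        "--web.enable-receiver-test-api",
        dest="enable_receiver_test_api",
        action="store_true",
        help="Enable the endpoint that sends test notifications.",
    )
    return parser.parse_args(argv)


def handle_test_receivers(flags, receivers):
    """Return (status, message) for a request to the hypothetical test endpoint."""
    if not flags.enable_receiver_test_api:
        # Refuse before looking at the posted configuration at all.
        return 403, "receiver test API is disabled"
    return 200, f"would send test notifications to {len(receivers)} receiver(s)"


default_flags = parse_flags([])
enabled_flags = parse_flags(["--web.enable-receiver-test-api"])

print(handle_test_receivers(default_flags, [{"name": "test"}]))
print(handle_test_receivers(enabled_flags, [{"name": "test"}]))
```

Because the flag defaults to off, users who upgrade are not exposed to the new endpoint unless they ask for it.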

OktarianTB commented 1 year ago

I'm going to try to tackle this one in the upcoming weeks. We discussed this issue during the AMWG meeting and I want to quickly go over some thoughts:

Core functionality: Provide a way to test receivers by inputting some configuration. The configuration is validated and notifications are sent accordingly. It should report whether the notifications were sent successfully. Ideally, it should also be possible to mock values and test custom templates.

Where could it live?

  1. In alertmanager as a new API endpoint. This would be similar to what was done on the Grafana side, but there may be some security implications. Potentially with webhooks you could make the AM send notifications to an external and unknown webhook, but it's not clear whether there is a scenario where this could be exploited. As AM does not have any RBAC, we could consider putting this endpoint behind a flag so that users that upgrade versions are not opted in by default.
  2. In amtool as a CLI command. This has the advantage of allowing users to test the config locally without an AM instance.

If we decide to implement both, then hopefully a lot of the implementation can be shared. 🙂
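
The core functionality described above (validate, send, report per integration) could be shared between an API endpoint and a CLI roughly as in this Python sketch. Everything here is hypothetical: the receiver shape with an "integrations" list, the notifier callables, and all names are stand-ins for illustration, not Alertmanager's actual types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IntegrationResult:
    receiver: str
    integration: str
    ok: bool
    error: str = ""


# A notifier takes (settings, alert) and raises on delivery failure.
Notifier = Callable[[dict, dict], None]


def validate_receiver(receiver: dict) -> List[str]:
    """Return a list of validation errors (empty when the receiver is valid)."""
    errors = []
    if not receiver.get("name"):
        errors.append("receiver name is required")
    if not receiver.get("integrations"):
        errors.append("at least one integration is required")
    return errors


def run_receiver_test(receiver: dict, notifiers: Dict[str, Notifier],
                      alert: dict) -> List[IntegrationResult]:
    """Validate the receiver, notify every integration, and report each outcome."""
    errors = validate_receiver(receiver)
    if errors:
        raise ValueError("invalid receiver: " + "; ".join(errors))
    results = []
    for integration in receiver["integrations"]:
        kind = integration.get("type", "")
        notify = notifiers.get(kind)
        if notify is None:
            results.append(IntegrationResult(receiver["name"], kind, False,
                                             "unknown integration type"))
            continue
        try:
            notify(integration.get("settings", {}), alert)
            results.append(IntegrationResult(receiver["name"], kind, True))
        except Exception as exc:  # report the failure instead of aborting
            results.append(IntegrationResult(receiver["name"], kind, False, str(exc)))
    return results


def fake_email(settings, alert):
    """Stand-in notifier that always succeeds."""


def fake_webhook(settings, alert):
    """Stand-in notifier that always fails."""
    raise ConnectionError("connection refused")


receiver = {
    "name": "test",
    "integrations": [
        {"type": "email", "settings": {"addresses": "test@example.com"}},
        {"type": "webhook", "settings": {"url": "http://localhost:9/hook"}},
    ],
}
alert = {"labels": {"alertname": "TestAlert"}, "annotations": {}}
results = run_receiver_test(receiver, {"email": fake_email, "webhook": fake_webhook}, alert)
for result in results:
    print(result)
```

One failing integration does not stop the others from being tested, so the caller gets a complete per-integration report.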

juliusv commented 1 year ago

amtool sounds like a great place to start playing with this! It could read in the config file, some example input alerts, a grouping label set, and a receiver name to test, and then it could execute that notifier for real.
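
A rough Python sketch of the inputs such a command might take is shown below. The flag names, the program name, and the helper functions are all hypothetical and are not existing amtool options; the configuration is assumed to be already parsed into a dictionary with a top-level "receivers" list, and sending the notification is left out.

```python
import argparse


def parse_args(argv):
    """Parse the inputs suggested above: config, alerts, group labels, receiver."""
    parser = argparse.ArgumentParser(prog="amtool-sketch")
    parser.add_argument("--config.file", dest="config_file", required=True)
    parser.add_argument("--alerts.file", dest="alerts_file", required=True)
    parser.add_argument("--receiver", required=True)
    parser.add_argument("--group-label", dest="group_labels", action="append",
                        default=None, metavar="NAME=VALUE")
    return parser.parse_args(argv)


def parse_group_labels(pairs):
    """Turn ["name=value", ...] into a label dictionary."""
    labels = {}
    for pair in pairs or []:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        labels[name] = value
    return labels


def find_receiver(config, name):
    """Look up a receiver by name in an already-parsed configuration."""
    for receiver in config.get("receivers", []):
        if receiver.get("name") == name:
            return receiver
    raise KeyError(f"receiver {name!r} not found in config")


args = parse_args([
    "--config.file", "alertmanager.yml",
    "--alerts.file", "alerts.json",
    "--receiver", "team-x",
    "--group-label", "alertname=HighLatency",
])
# Stand-in for the parsed contents of the config file.
config = {"receivers": [{"name": "team-x",
                         "webhook_configs": [{"url": "http://localhost:5001/"}]}]}
receiver = find_receiver(config, args.receiver)
group_labels = parse_group_labels(args.group_labels)
print(receiver["name"], group_labels)
```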

gotjosh commented 1 year ago

From the Grafana side, I'm happy to allow you to act on behalf of Grafana Labs (owner of the underlying code in receivers.go) to accomplish this. Let me know if there's anything else that you might need, and I'll be happy to grant you permission to accomplish this.

acdha commented 5 months ago

I just wanted to chime in on the value of this - during an organizational migration to Teams, it would have been really useful to be able to test that configuration easily without having to manually simulate events against another instance when tweaking things like the message formatting.

One idea that might avoid some of the security considerations would be for Alertmanager to buffer the last notification sent to each receiver, so you could simply tell it to redeliver the same payload with the updated configuration: either something like POST /-/reload followed by POST /-/replay-last-notification, or simply having that debugging method implicitly trigger a reload first.
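
The buffering idea could look roughly like the following Python sketch. The class, its methods, and the string-format "template" are hypothetical stand-ins chosen for illustration; nothing here reflects how Alertmanager stores or renders notifications.

```python
import copy


class LastNotificationBuffer:
    """Keeps only the most recent set of alerts sent to each receiver."""

    def __init__(self):
        self._last = {}

    def record(self, receiver, alerts):
        # Store a copy so later mutations by the caller do not affect the buffer.
        self._last[receiver] = copy.deepcopy(alerts)

    def replay(self, receiver, notify):
        """Redeliver the buffered alerts through the (possibly updated) notifier."""
        if receiver not in self._last:
            raise LookupError(f"no notification recorded for {receiver!r}")
        alerts = copy.deepcopy(self._last[receiver])
        notify(alerts)
        return alerts


def make_notifier(template, outbox):
    """Build a notifier that renders each alert's labels with the given template."""
    def notify(alerts):
        for alert in alerts:
            outbox.append(template.format(**alert["labels"]))
    return notify


last_sent = LastNotificationBuffer()
outbox = []
alerts = [{"labels": {"alertname": "DiskFull", "host": "db-1"}}]

# Original delivery with the old message format.
make_notifier("{alertname} on {host}", outbox)(alerts)
last_sent.record("teams", alerts)

# After a configuration change, replay the same payload with the new format.
last_sent.replay("teams", make_notifier("[ALERT] {alertname} ({host})", outbox))
print(outbox)
```

Since only previously delivered payloads can be replayed to already-configured receivers, this approach does not accept arbitrary receiver configuration from the caller.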