Closed garo closed 5 years ago
We have that feature in a custom webhook receiver, but not with the original alert; we add custom PromQL expressions as annotations.
I will break this down into two feature requests:
Rendering images in Alertmanager: Alertmanager is notified by Prometheus that a given alert is firing. Prometheus does not send along the corresponding time series or an image of the graph, so Alertmanager cannot render the image by itself.
One could expose an endpoint on Prometheus that would return an image given a Prometheus query, which could then be combined with the alert from Alertmanager. While this is technically possible, I think it would introduce several problems:
Hence I think using a custom webhook as @roidelapluie suggested is a great option. It would receive the alert from Alertmanager, retrieve the time series from Prometheus, render the graphic and send everything to your notification-system.
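To make the first two steps of that webhook flow concrete, here is a minimal Python sketch, not a finished receiver. It assumes the JSON body Alertmanager posts to webhook receivers (an `alerts` list whose entries carry `annotations` and `startsAt`) and Prometheus's `/api/v1/query_range` HTTP API. The Prometheus address, the `graph_query` annotation name, and the one-hour lookback are hypothetical choices for illustration. Rendering the graph and forwarding it to the notification system are left out.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical address


def parse_rfc3339(ts):
    """Parse an RFC 3339 timestamp, ignoring fractional seconds."""
    ts = re.sub(r"\.\d+", "", ts)        # drop (possibly nanosecond) fraction
    ts = re.sub(r"[zZ]$", "+00:00", ts)  # fromisoformat needs a numeric offset
    return datetime.fromisoformat(ts)


def range_query_url(alert, query, lookback=timedelta(hours=1), step="60s"):
    """Build a query_range URL covering the hour before the alert fired."""
    end = parse_rfc3339(alert["startsAt"])
    start = end - lookback
    params = {
        "query": query,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
        "step": step,
    }
    return PROMETHEUS_URL + "/api/v1/query_range?" + urlencode(params)


def urls_for_notification(body):
    """Return one query URL per alert that carries a graph_query annotation."""
    payload = json.loads(body)
    urls = []
    for alert in payload.get("alerts", []):
        # "graph_query" is an assumed annotation name, not a built-in one.
        query = alert.get("annotations", {}).get("graph_query")
        if query:
            urls.append(range_query_url(alert, query))
    return urls
```

The webhook would then fetch each URL, plot the returned series with whatever library it prefers, and attach the result to the outgoing notification.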
HTTP endpoint accepting signed queries: The Prometheus project does not provide authn, authz, or encryption features as of today. This might change, but it is unlikely to happen any time soon.
@garo what do you think?
The main reason for not accepting this feature request is that the alerting rule is not relevant to understand the context.
Some alerts have >, on() joins... and when it is resolved you cannot know whether that was because of a missing metric or the threshold... the alerting query is meaningless.
I understand @roidelapluie your reasoning that there are many cases where the alerting rule is not easily usable for this case.
Could the alerting rule have a field (label / annotation) where the user could describe a relevant query, which could then be rendered out and sent as an attachment? This approach would require the ability to specify multiple queries which would then be embedded in the same image.
Remember that the goal is not to replicate a full-featured visualisation backend such as Grafana, but just to give the on-call engineer a first look at the context.
@mxinden How about Alertmanager being the component that does the actual image rendering, rather than Prometheus? It is true that adding image rendering components isn't trivial. (I'm not sure how the current Prometheus UI handles image rendering. Is it done fully client-side with JavaScript?)
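As a rough illustration of the annotation idea, the sketch below pulls every query a rule author attached to an alert so they could be drawn into one image. The `graph_query` prefix is a hypothetical naming convention, not something Prometheus or Alertmanager defines; the alert dictionary mirrors the `annotations` map found in Alertmanager notifications.

```python
GRAPH_PREFIX = "graph_query"  # hypothetical annotation naming convention


def graph_queries(alert):
    """Collect all graph queries from an alert's annotations.

    Keys are sorted as strings, so graph_query_10 sorts before
    graph_query_2; zero-pad the suffix if the order matters.
    """
    annotations = alert.get("annotations", {})
    keys = sorted(k for k in annotations if k.startswith(GRAPH_PREFIX))
    return [annotations[k] for k in keys]


# Example: an alert whose rule defines two context queries.
example_alert = {
    "labels": {"alertname": "HighErrorRate"},
    "annotations": {
        "summary": "Too many 5xx responses",
        "graph_query_1": "rate(http_requests_total{code=~\"5..\"}[5m])",
        "graph_query_2": "rate(http_requests_total[5m])",
    },
}
queries = graph_queries(example_alert)
```

Keeping the queries in annotations means the alerting expression itself never has to be graphable, which addresses the objection about joins and thresholds.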
Another option would be to hand the image rendering off to an external service (such as Grafana, which does have this kind of API) and then just let Alertmanager combine all this together. In this case Alertmanager would call the external rendering service at a user-configurable URL (using the templating language to construct the appropriate URL to send to the rendering service), download the created image as an attachment, and then somehow expose it to the public internet, for example at a randomly generated static URL.
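A sketch of what "construct the URL from a template" could look like, written in Python rather than Alertmanager's template language. The URL shape follows Grafana's panel render links (`/render/d-solo/<uid>/<slug>` with `panelId`, `from`, `to`, `width`, `height`) as I understand them, so check it against your Grafana version. The host name, the slug, and the `grafana_dashboard_uid` and `grafana_panel_id` annotation names are assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode

# Hypothetical host; the path layout is assumed from Grafana render links.
RENDER_TEMPLATE = "https://grafana.example/render/d-solo/{uid}/{slug}?{params}"


def render_url(annotations, fired_at, lookback=timedelta(hours=1),
               slug="alert-context"):
    """Fill the render URL template for the hour before the alert fired."""
    to_ms = int(fired_at.timestamp() * 1000)  # Grafana takes epoch millis
    from_ms = int((fired_at - lookback).timestamp() * 1000)
    params = urlencode({
        "panelId": annotations["grafana_panel_id"],  # assumed annotation
        "from": from_ms,
        "to": to_ms,
        "width": 800,
        "height": 400,
    })
    return RENDER_TEMPLATE.format(
        uid=quote(annotations["grafana_dashboard_uid"], safe=""),  # assumed
        slug=quote(slug, safe=""),
        params=params,
    )
```

Downloading the rendered image and publishing it somewhere reachable by the notification service would still be separate steps.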
Could the alerting rule have a field (label / annotation) where the user could describe a relevant query, which could then be rendered out and sent as an attachment? This approach would require the ability to specify multiple queries which would then be embedded in the same image.
This is something you could create yourself using a webhook. The notification goes to the webhook, which creates an image+url to reference the image.
Another option would be to hand the image rendering off to an external service (such as Grafana, which does have this kind of API) and then just let Alertmanager combine all this together. In this case Alertmanager would call the external rendering service at a user-configurable URL (using the templating language to construct the appropriate URL to send to the rendering service), download the created image as an attachment, and then somehow expose it to the public internet, for example at a randomly generated static URL.
If your webhook creates a url that you can "know beforehand", you could set it as an annotation on the notification.
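One way to get a URL that is known beforehand is to derive it only from label values, so an annotation template and the webhook can arrive at the same string independently. The sketch below shows the webhook side; the base URL and the choice of `alertname` and `instance` as path components are hypothetical.

```python
from urllib.parse import quote

IMAGE_BASE_URL = "https://alert-images.example"  # hypothetical image host


def image_url(labels, keys=("alertname", "instance")):
    """Build a deterministic image URL from selected alert labels."""
    # Percent-encode each value so characters like ':' or '/' in a label
    # cannot change the path structure.
    parts = [quote(labels.get(key, "unknown"), safe="") for key in keys]
    return IMAGE_BASE_URL + "/" + "/".join(parts) + ".png"


url = image_url({"alertname": "HighErrorRate", "instance": "web-1:9100"})
# url == "https://alert-images.example/HighErrorRate/web-1%3A9100.png"
```

The annotation on the alerting rule would have to produce exactly the same string, including the percent-encoding, so label values that need escaping deserve extra care.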
In general, I'm not in favor of adding this specific behavior natively to alertmanager.
Thanks for your inputs.
After looking at qvl/promplot and webhooks, I see that I could build the feature by adding some custom code and an external service or two, but not without adding more moving parts and making the alerting more error-prone.
I understand that adding this kind of feature is a bit awkward given how Prometheus and Alertmanager are currently built, so I'm closing this issue now.
I still believe that the user story "As a person receiving an alert I want to see an embedded graph image showing the history on the metric before it triggered the alert" is valid. If somebody can think of a better way to implement this story, please open a new ticket.
For the likes of Slack, I would suggest a chatbot that grabs predefined dashboards. Say you're monitoring your website and you get a lot of 502 responses; I could imagine a chatbot that responds with images. To illustrate very loosely...
bot: dude, there's lots of 502/503 responses from the website, and it's freaking me out
me: spiders?
bot: here's a graph showing breakdown of web crawlers and top user-agents
/me doesn't see anything obvious there...
me: ips?
bot: here's a graph showing the top ips over time
/me sees unusually high activity from a certain IP
me: ban ip x.x.x.x
bot: initiating Ansible playbook to ban IP x.x.x.x for the default of 6 hours and recording that @me requested this action at <timestamp>
/me goes back to watch movie in peace
That also shows that you could then roll in data from logs as well as metrics, and also initiate routine responses.
If that doesn't satisfy your needs (it is a bit of a stretch-goal), you could consider moving the alerting functionality to Grafana, which could have the advantage of one alerting system that could cover Prometheus, Elasticsearch, and others..... (I haven't investigated that, but I was pretty interested by a screenshot of Slack showing a Grafana graph.)
@garo @cameronkerrnz we built that chatbot (well, something similar anyway) and it can show:
...we essentially built a library of 50+ prebuilt webhooks that you can configure with YAML.
It's open source and I would love to hear some ideas for what we're missing or how we can make it easier. (Leave a comment here, message natan at robusta dot dev, or message me on our Slack.)
Both Slack and PagerDuty allow including one or more images with the generated alert. At least with PagerDuty, the image needs to be accessible via HTTPS.
Alertmanager should have a way to generate a publicly accessible rendered image of the alert query, so that the image can be attached to the alert. This way the person receiving the alert could easily see a visual explanation of the alert context.
The generated image could at least show the alert expression, separately for the left and right side of the expression.
After thinking about this a bit, there's one way this could be implemented:
These steps would make it possible to construct appropriate Alertmanager receiver route rules, which could generate the required image URLs with the mentioned templating functionality.