raintank / worldping-api

Worldping Backend Service

more context/information in alert notifications #15

Open woodsaj opened 8 years ago

woodsaj commented 8 years ago

Issue by Dieterbe Friday Jul 24, 2015 at 21:00 GMT Originally opened as https://github.com/raintank/grafana/issues/370


current litmus alerts are quite plain. they convey time, endpoint and state. can we add a lot of value with a small amount of work? a worthy thought experiment, especially as it also helps with standard grafana.

woodsaj commented 8 years ago

Comment by nopzor1200 Friday Jul 24, 2015 at 21:41 GMT


These are all really good ideas. I'd love to do some of this stuff. I guess I'd like to know how possible some of this stuff is, right now, with our current "batteries"? Could use your guidance in knowing what's more practical at this point.

This is related to #125. Can you help make these variables available? It's currently blocked, and it's a more actionable item that could use your help. It would allow us to show the time (important) as well as create a link to the dashboard (as you suggest).

PNG rendered image of monitor over time included in the alert email seems killer. I like the idea of different timeframes.

For Litmus I like the idea of being more specific. One of the biggest wins would be to detail the status of individual collectors in the alert query over time. For example, great I know nopzair.com is down from "more than 2 locations". But which 2 locations is it supposedly down for? For how long?

woodsaj commented 8 years ago

Comment by Dieterbe Monday Aug 03, 2015 at 08:36 GMT


> But which 2 locations is it supposedly down for?

this might be fairly easy via bosun expression.

> For how long?

for how long what? how long it has been down from each location? if you're getting a down notification then it most often means it just went down, although there could be some cases where it may have been down from one or two collectors for a while longer. retrieving this information is non-trivial though: you'd basically have to do extra queries and travel far enough back in time until the answer is in the data. probably better to just include a graph of the last X hours.
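a minimal sketch of the "travel back in time" idea, assuming the per-collector check results are already fetched as `(timestamp, ok)` samples sorted oldest first. the function names and the input shape are hypothetical, not something litmus or bosun provides:

```python
def down_since(samples):
    """Return the timestamp at which the current run of failures began.

    samples: list of (timestamp, ok) tuples, sorted oldest first (assumed shape).
    Returns None if there are no samples or the latest sample is ok.
    """
    start = None
    # Walk backwards from the newest sample until the first success.
    for timestamp, ok in reversed(samples):
        if ok:
            break
        start = timestamp
    return start


def failing_collectors(checks):
    """Map each currently failing collector to the start of its outage.

    checks: dict of collector name -> samples (hypothetical input).
    """
    result = {}
    for name, samples in checks.items():
        start = down_since(samples)
        if start is not None:
            result[name] = start
    return result
```

note the limitation: if every sample in the fetched window is a failure, the returned timestamp is only the edge of the window, and the real start lies further back. that is the "extra queries" cost mentioned above.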

@torkelo @woodsaj do you know if it's possible to make a snapshot-like feature that, instead of rendering a snapshot that is html+js, renders to plain html only (because email clients tend to block js)? we could then generate the html for emails like that. or maybe this is too much work. the alternative would be to write some code that spews out the html email and does the required "render as png" calls itself.
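a rough sketch of that alternative, using only the python standard library `email` package. it assumes the PNG has already been fetched from a "render as png" call and is passed in as bytes; `build_alert_email`, its parameters and the `alerts.invalid` domain are made up for illustration:

```python
from email.message import EmailMessage
from email.utils import make_msgid
from html import escape


def build_alert_email(endpoint, state, when, dashboard_url, png_bytes):
    """Build an alert email with a plain-text part and an HTML part
    that embeds the rendered graph inline (no javascript needed)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{state}] {endpoint}"

    # Plain-text fallback for clients that do not render HTML.
    msg.set_content(f"{endpoint} is {state} at {when}\nDetails: {dashboard_url}\n")

    # Content-ID used by the HTML part to reference the attached image.
    # A fixed placeholder domain avoids a hostname lookup.
    cid = make_msgid(domain="alerts.invalid")
    html = (
        f"<p><b>{escape(endpoint)}</b> is <b>{escape(state)}</b> at {escape(when)}</p>"
        f'<p><img src="cid:{cid[1:-1]}" alt="graph"></p>'
        f'<p><a href="{escape(dashboard_url)}">Open dashboard</a></p>'
    )
    msg.add_alternative(html, subtype="html")

    # Attach the PNG to the HTML part so the <img> tag can resolve it.
    msg.get_payload()[1].add_related(png_bytes, "image", "png", cid=cid)
    return msg
```

the message keeps a text-only alternative, so the same notification still degrades to "state plus a link" where images are blocked.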

woodsaj commented 8 years ago

Comment by woodsaj Monday Aug 03, 2015 at 13:22 GMT


No, you can't do html only. The graphing frontend is all javascript.

I find email to be a pretty limited medium for providing this type of data and would personally rather just get a link to a "Fault" dashboard that shows the details of the alert and relevant graphs that I can interact with. Just sending a link works as well with SMS, which many people still use, as it does with email.

It might be an idea to generate this 'fault' dashboard and push it to snapshot.raintank for persistence. All the alert details could be rendered in a text panel, and relevant graphs included.
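A rough sketch of what generating such a dashboard could look like: one text panel with the alert details, followed by one graph panel per relevant series. The key names below are illustrative only and are not checked against Grafana's dashboard schema; `build_fault_dashboard` and its inputs are hypothetical, and pushing the result to the snapshot service is left out.

```python
def build_fault_dashboard(alert, graph_targets):
    """Build a dashboard-like dict for a single alert.

    alert: dict of alert details; must contain an "endpoint" key (assumed).
    graph_targets: list of metric names to graph (assumed).
    """
    # Render all alert details as plain lines for the text panel.
    details = "\n".join(f"{key}: {alert[key]}" for key in sorted(alert))
    panels = [{"type": "text", "title": "Alert details", "content": details}]

    # One graph panel per relevant series.
    for target in graph_targets:
        panels.append(
            {"type": "graph", "title": target, "targets": [{"target": target}]}
        )

    return {"title": f"Fault: {alert['endpoint']}", "panels": panels}
```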

woodsaj commented 8 years ago

Comment by Dieterbe Thursday Aug 27, 2015 at 11:15 GMT


> The graphing frontend is all javascript.

could we not replace those graphs with PNGs through the "render as png" feature?

> would personally rather just get a link to a "Fault" dashboard that shows the details of the alert and relevant graphs that i can interact with.

I like this idea. and it's probably better. because even if we extended the notifications with more stuff, it would only overlap more with the dedicated full-featured "here's everything you need" dashboard that we would still need.

i know that a common #monitoringsucks theme is "we need more context in our alerts so that if we get a message in the middle of the night we know what's up", but I think it's fair for all that context to be 1 click away, especially if we can provide lots of insight/context.

however, i suspect most people will use a smartphone, which has limited screen space, so i still think there's a lot of merit to providing some context that fits on a smartphone screen and could come inside the email, and then referring to the fault dashboard for more insights.

woodsaj commented 8 years ago

Comment by torkelo Monday Aug 31, 2015 at 10:46 GMT


I think having a PNG in an alert email would be quite useful.