grafana / grafana

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
https://grafana.com
GNU Affero General Public License v3.0
65.23k stars 12.17k forks

Alerting support for queries using template variables #6557

Closed calind closed 3 years ago

calind commented 8 years ago

It would be pretty useful if Grafana supported alerting for queries that use template variables. The way I see it, it would work as follows:

  1. Generate queries for each template variable combination (discarding the All value of each template variable)
  2. When generating queries, use the frozen list if the template variable is set to never refresh; otherwise update the template variable list
  3. Allow filtering (through a regex or by providing a static value) for each template variable
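
The three steps above could be sketched roughly as follows. This is a minimal illustration, not Grafana's implementation: the function `expand_queries`, the `filters` argument, and the Graphite-style metric path are all hypothetical, and the variable lists are assumed to have been resolved already (frozen or refreshed, as in step 2).

```python
import itertools
import re
from string import Template


def expand_queries(query, variables, filters=None):
    """Yield (combination, rendered_query) for every combination of values.

    variables: dict of variable name -> list of concrete values
               (the "All" pseudo-value is assumed to be excluded already).
    filters:   optional dict of variable name -> regex; values that do not
               fully match the regex are skipped (step 3).
    """
    filters = filters or {}
    names = sorted(variables)
    allowed = [
        [v for v in variables[n]
         if n not in filters or re.fullmatch(filters[n], v)]
        for n in names
    ]
    # One query per element of the cartesian product of the value lists.
    for combo in itertools.product(*allowed):
        mapping = dict(zip(names, combo))
        yield mapping, Template(query).substitute(mapping)


# Hypothetical example: 3 hosts x 2 CPUs, keeping only the "web" hosts.
queries = list(expand_queries(
    "collectd.$host.cpu.$cpu.idle",
    {"host": ["web1", "web2", "db1"], "cpu": ["0", "1"]},
    filters={"host": r"web\d+"},
))
```

Each rendered query would then get its own alert state, which is what gives one alert per host rather than one alert for the whole panel.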

The current workaround is to use an invisible wildcard metric, but the problem I see with this approach is that it loses context.

oiooj commented 8 years ago

+1

bergquist commented 8 years ago

  1. What would be the difference compared to just using all?

antoinerrr commented 8 years ago

+1 It would be nice to be able to add alerting on servers with a short lifetime (AWS auto scaling). Auto-registering the servers in Grafana is easy with templating, but it's sad not to be able to put alerting on them.

calind commented 8 years ago

@bergquist it's impractical to use all when, for example, you have more than a dozen hosts.

If, for example, only a few of them are failing (let's say 5), it is very useful to receive an email for each failing alert. This also makes it much easier to integrate with other tools, which in general expect one alert per metric.

The current approach (using all) is pretty neat though when there are fewer instances or when you are alerting at service level (e.g. # of jobs in queue).

Deshke commented 8 years ago

What @calind said. I've got multiple $host variables which are working fine with InfluxDB but not with the alerts.

NotSoCleverLogin commented 8 years ago

+1 as well.

Just a thought, since you are able to query with a template variable, wouldn't you just be able to do the same query with the alerting metrics and maybe iterate through the results to see which meet the alert criteria?
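
The "iterate through the results" idea could look something like the sketch below. It assumes the query has already been run and returned one list of values per series; `firing_series`, the `reducer` argument, and the sample data are hypothetical and only illustrate evaluating the alert condition per series instead of per panel.

```python
def firing_series(series, threshold, reducer=max):
    """Return the names of the series whose reduced value exceeds threshold.

    series:  dict of series name -> list of numeric data points
    reducer: function collapsing the points to one number (max, min, ...)
    """
    firing = []
    for name, points in series.items():
        # Empty series are skipped rather than treated as firing.
        if points and reducer(points) > threshold:
            firing.append(name)
    return sorted(firing)


# Hypothetical result of one templated query, one series per host.
results = {
    "web1": [41.0, 97.5, 93.2],
    "web2": [12.3, 15.8, 11.1],
    "db1": [88.0, 91.4, 95.0],
}
alerts = firing_series(results, threshold=90)
```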

bergquist commented 8 years ago

@NotSoCleverLogin It would be possible. But would you want the behavior of the alert rule to change based on which template values are selected?

Using the all option for the template is the only way that makes sense to me.

mstaalesen commented 8 years ago

+1

I have a setup of X environments with the same components in each environment. We are currently using Prometheus to alert on e.g. CPU usage, disk usage, etc. There we specify an alert for a query, and when the alert is triggered it just states which environment the alert was triggered from.

If we would do this with the All variable, that would work to some extent. But, using @calind's example, the screenshot would be filled with the trend of all cpus from all of my environments, and not just the environment where I would want to be informed about said problem. The graph will (or can) be obscured with information from other environments. In some scenarios it could be interesting to compare cpu in other environments, but there are no guarantees that what is happening in a test environment is happening in our production environment, etc.

We are also looking into creating dashboards that can be used by operations, showing annotations for alerts in the "standard" overview dashboard. Given that we use 'env' template variables for these kinds of dashboards, it's not really possible for us to do that with how it is implemented right now. I would have to manually (at least to some extent) generate a "shadow" dashboard where the alerts are triggered (which makes me lose the annotations in the overview dashboard).

Another thing I think template variables can help you do is route the alerts (should you choose to implement such a feature) to different destinations (some to operations if in production, to QA/developers if in test environments, etc.).

StianOvrevage commented 8 years ago

+1 for supporting alerts on templated queries.

calind commented 8 years ago

@bergquist, some dashboards don't have an All option. For example system metrics by collectd (https://grafana.net/dashboards/24). Having an All option would certainly not be practical for, let's say, 10 or more servers. That's why there is a need to iterate through template variables.

StianOvrevage commented 8 years ago

Allowing use of All is a good and welcome start.

In Prometheus, queries need to be written in a different way to allow All:

some.metric{hostname=~"$Hostname"}

Notice the extra tilde there, allowing for regular expression searching (and the wildcard in All).

I have not benchmarked the possible performance impact of going from a straight query to a regex search query but at least for now it would apparently solve our problems.
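
To illustrate why the regex matcher is needed, here is a small sketch of how a multi-value selection might be turned into a selector. It is not Grafana's actual interpolation code: `render_selector` is a hypothetical helper, treating `None` as "All" (rendered as `.+`) is an assumption, and values are assumed to contain no regex metacharacters (real code would have to escape them).

```python
def render_selector(metric, label, values=None):
    """Build a selector that matches any of the given label values.

    values=None stands for "All" here (an assumption of this sketch) and is
    rendered as the wildcard pattern ".+".
    """
    # Several values become a regex alternation; with the plain "="
    # matcher the string "web1|web2" would be compared literally.
    pattern = ".+" if values is None else "|".join(values)
    return f'{metric}{{{label}=~"{pattern}"}}'


one_host = render_selector("some.metric", "hostname", ["web1"])
two_hosts = render_selector("some.metric", "hostname", ["web1", "web2"])
all_hosts = render_selector("some.metric", "hostname")
```

With a single value the regex form still matches exactly that host, so the same query text works for one value, several values, and All.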

max3163 commented 7 years ago

+1

jordandev commented 7 years ago

+1

steverweber commented 7 years ago

not sure how it should be implemented, just know it's needed..

Krylon360 commented 7 years ago

+1 We use Prometheus as the data source to monitor our Kubernetes infrastructure for both our on-prem K8S clusters and our AWS K8S clusters. All of our dashboards use template variables for the data source ($Environment), $Instance/Node, $Namespace, and $Pod. Because of the way Prometheus queries are structured, all of our queries contain template variables, which prevents the alert rules from being saved. I would love to see templated queries supported in alerting.

andrewawagner commented 7 years ago

+1

shervinkh commented 7 years ago

+1 We use templated dashboards for a multi-server environment, which is the logical way (and what many people do), so we can't use alerting with Grafana right now. The only options are to keep a separate non-templated dashboard or to set up alerting in Prometheus itself, which is not easy.

steverweber commented 7 years ago

Perhaps if there were an option or a simple way to save/export a dashboard with the template variables baked/pre-rendered into all the fields... this could be a good halfway point until another solution is found.

daraeburn commented 7 years ago

+1 for supporting alerts on templated queries. We currently use templating on all our dashboards so can't take advantage of this really cool feature.

tsn77130 commented 7 years ago

+1, we have a lot of templated dashboards and we can't use alerting for now. We have to duplicate dashboards to get alerts, and so we lose the power of templating.

drewboswell commented 7 years ago

+1, Almost all of our dashboards use template variables (and nested template variables).

We would like to be able to set alerts on repeat panels to get individual alerts per template-variable group if needed. Plus this means that the alerting is dynamic and not super manual as it is now.

DANGER: Variables in theory will be good to have, but we need to keep in mind that if some guy goes into your dashboard and changes the value and saves, the resulting alerting will be affected. Don't know if that's ok behaviour or not, will be complicated.

ebirukov commented 7 years ago

+1

erSitzt commented 7 years ago

When working with grafana it feels like templating is encouraged everywhere and it feels wrong to create an extra set of graphs not using variables just to use the alerting feature...

kanwangzjm commented 7 years ago

+1 for supporting alerts on templated queries. Also, we found that when we use a Chinese ruleName or a Chinese title, we receive an abnormal email when the rule is triggered. For example, we expected “个股分时线接口请求时间(getTimeTrend) alert” but received "个股分时线接口请求时间(getTimeTrend) alert"; maybe the charset is not correct.

AlexMaksimkin commented 7 years ago

+1 to implement templated vars in alerts

fingul commented 7 years ago

+1

drew-royster commented 7 years ago

+1 would be a great addition

tj13 commented 7 years ago

+1

bog-dance commented 7 years ago

+1 to implement templated vars in alerts

bbae-dev commented 7 years ago

+1

bbae-dev commented 7 years ago

+1 looking forward to it

clhlc commented 7 years ago

+1

staslev commented 7 years ago

+1

actionjax commented 7 years ago

+1

jesseorr commented 7 years ago

+1

thetechnick commented 7 years ago

Please stop writing +1! Everybody that has subscribed to this issue will get an email...

There is a GitHub feature made just to get rid of those +1 comments: https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments

StianOvrevage commented 7 years ago

@thetechnick There is a link in the e-mail where you can mute the thread and stop receiving e-mails. I understand that you might just want to be notified when the feature is complete, but I also like to see the issue bumped so that it hopefully gets worked on sooner :)

tomekit commented 7 years ago

Great progress on alerting overall. For the template variables in alerting, I am missing it as well. +1 :D

On top of that, there might be a bug in the way Grafana detects whether a metric used in a query uses template variables.

When you have a series which uses template variables indirectly, Grafana does not stop you from adding that series as an alert. The alert obviously does not work correctly.

See #K (it uses #D, which uses #A, and #A uses a template variable): grafana

I could still select it: grafana2
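
The indirect case described above amounts to following the `#X` references transitively. A minimal sketch of such a check, assuming queries are plain strings that refer to each other as `#A`, `#D`, and so on; the function name, the regexes, and the example targets are hypothetical and not taken from Grafana's code.

```python
import re

REF = re.compile(r"#([A-Z])")   # reference to another query, e.g. #A
VAR = re.compile(r"\$\w+")      # template variable, e.g. $host


def uses_template_variable(ref_id, targets, _seen=None):
    """Return True if the query uses a template variable, even indirectly."""
    seen = _seen if _seen is not None else set()
    if ref_id in seen:          # guard against reference cycles
        return False
    seen.add(ref_id)
    expr = targets[ref_id]
    if VAR.search(expr):
        return True
    # Follow every reference to another known query.
    return any(
        uses_template_variable(ref, targets, seen)
        for ref in REF.findall(expr)
        if ref in targets
    )


# Hypothetical targets mirroring the case above: #K -> #D -> #A -> $host.
targets = {
    "A": "servers.$host.requests",
    "B": "servers.static.errors",
    "D": "scale(#A, 0.5)",
    "K": "sumSeries(#D, #B)",
}
k_is_templated = uses_template_variable("K", targets)
b_is_templated = uses_template_variable("B", targets)
```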

marketadvorackova commented 7 years ago

Templates everywhere, which means alerting nowhere. I'm not sure how alerting has been implemented, but for a simple graph the query gets "translated", with template variables substituted by their values, before a call is made to the data source, right? So why not in this case? In any case, as said before, with almost all of my queries using template variables, alerting is completely off for me. Please, could you implement it so that we don't have to move alerting outside Grafana? Thanks a lot!

erSitzt commented 7 years ago

I think we should recognise that alerting with templating is not trivial, and I think the ALL option is the way to go, because we don't want our alerts changing when someone is using the dashboard. But Grafana would still have to create new alerts when the template query returns new results... which happens quite often as we scale our apps. This leads to more problems if you are using InfluxDB: many of us are using tags/tag values, I guess, and there is no time filter for them... so Grafana would create alerts for every service that ever existed on any host...

HyperDevil commented 7 years ago

+1

yellowmegaman commented 7 years ago

Just being able to specify the data source in alerting would be OK for me. It won't break any logic, and I can specify at least production and staging environments to watch.

marketadvorackova commented 7 years ago

ALL is an option, sure. More flexible would be to recognise the template variables in the query and let the user set their values in the alert condition configuration. The best, but complicated I guess, would be to have multiple alerts (the same way there are multiple queries), so that a different alert could be set up for different template variable values in the query. This would enable the administrator to set up different alert conditions for different hosts, for example.

pdf commented 7 years ago

Multiple alerting profiles would be great, but for an initial pass, just providing the same template selectors as are available on the dashboard in the alerting panel would solve a lot of problems.

I also think there should be a toggle for each variable to aggregate results for that variable into a single notification; this would probably only be enabled for template vars that have multi-select enabled. This provides a simple but effective method to control the verbosity of notifications: you may want to notify only once for multiple related metrics, but notify for each host where any metric is failing. Or you may want to notify only once for a failing metric no matter how many hosts are affected.
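
The per-variable aggregation toggle could be sketched as a grouping step over the failing combinations. This is only an illustration of the idea; `group_notifications`, the `aggregate` set, and the `host`/`metric` variable names are hypothetical.

```python
from collections import defaultdict


def group_notifications(failing, aggregate):
    """Group failing combinations into notifications.

    failing:   list of dicts, each mapping variable name -> value
    aggregate: set of variable names whose values are collapsed into a
               single notification (the "toggle" is on for them)
    Returns a dict: key of non-aggregated (name, value) pairs -> members.
    """
    groups = defaultdict(list)
    for labels in failing:
        key = tuple(sorted(
            (name, value) for name, value in labels.items()
            if name not in aggregate
        ))
        groups[key].append(labels)
    return dict(groups)


failing = [
    {"host": "web1", "metric": "cpu"},
    {"host": "web1", "metric": "disk"},
    {"host": "web2", "metric": "cpu"},
]
per_host = group_notifications(failing, aggregate={"metric"})
per_metric = group_notifications(failing, aggregate={"host"})
single = group_notifications(failing, aggregate={"host", "metric"})
```

Here `per_host` yields one notification per failing host, `per_metric` one per failing metric regardless of how many hosts are affected, and `single` collapses everything into one notification.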

siteshbehera commented 7 years ago

Do we have a target milestone for this bug?

tomekit commented 7 years ago

I had some issues with alerting on complicated queries and on queries with template variables. I found an easy workaround, which may not be pretty, but it works for my use case. It just extracts the query after you have built it, so there are no template variables or #ROW references left. This may be obvious to you, and it's no rocket science, but for me it was a life changer.

What I do is prepare a query: image

then extract it using the Chrome dev tools (copy target parameter value): image

Put it in another row (switch to toggle edit mode first): image

Set up the alerting: image

Voila !

bergquist commented 7 years ago

@siteshbehera This is not a bug. It's a feature request.

But no. We don't have a milestone for this currently.

lastsky commented 7 years ago

artificial intelligence grafana plugin should be included in commit for this feature.

djerihovs commented 7 years ago

Waiting for templates in Alerts too +1

Faradax commented 7 years ago

I'm also very much in favor of what calind provided as a possible implementation in the opening post. It seems to fit neatly into how many (me included) use templated dashboards: you have one dashboard, but switch/limit some variables to manually look at specific things. I think the example of the "server" variable might be the most fitting one. There, the template variable (without an all value) would become something not unlike a "tab" in my dashboard: I can switch between them to see different sets of data. It's then easy to assume that, when setting up an alert, the alert would exist for each possible "tab" separately.