sammit20 opened 2 years ago
This is a wider effort that we are tracking internally, and it has existed for some time. This isn't purely a Terraform thing: the same goes for the .yaml provisioning, and for the API itself.
Really, it's just that Grafana's representation of an alert rule is very large. This is the result of a trade-off: the representation grows in size because of its flexibility, since a rule can query any arbitrary datasource.
A single model has difficulty covering all cases, because not every datasource is built around a query string. Consider the CloudWatch and Stackdriver datasources: there isn't a single query field, but rather the result of a number of drop-downs.
What we are looking into is how we can offer very simple, targeted rule definitions for some common datasource types. Users who need the flexibility could then fall back to the generic struct we have now. However, this effort spans a few different systems, including Terraform, so it's not quite there yet.
We face the same issue now that we are starting to use Terraform to manage our alerts: we have multiple datasources (GCP/AWS/BigQuery/Prometheus) and more than 150 alerts.
Writing each alert directly with the grafana_rule_group resource was impractical (nearly 200 lines per alert), so we created multiple modules to simplify alert creation: one module per datasource query model, one module for the Grafana expression, and one module per datasource (which aggregates the other modules).
It was difficult to write, and there is a lot of complexity (because of the multiple modules), but at least usage is simple and reduced to the strict minimum.
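To illustrate the "one module per query model" idea, here is a minimal sketch of what such a module could look like, assuming a Prometheus datasource. The module path `modules/prometheus_query`, its variable names, and the `data` output shape are hypothetical choices for illustration; the model keys (`refId`, `expr`, `intervalMs`, `maxDataPoints`) are the usual ones for a Prometheus query, but it is worth checking them against a rule exported from your own Grafana instance.

```hcl
# modules/prometheus_query/main.tf (hypothetical module path)
# Wraps the Prometheus query model so callers only pass what changes per alert.

variable "ref_id" {
  description = "Reference ID of this query inside the rule"
  type        = string
  default     = "A"
}

variable "datasource_uid" {
  description = "UID of the Prometheus datasource in Grafana"
  type        = string
}

variable "expr" {
  description = "PromQL expression evaluated by the alert rule"
  type        = string
}

variable "lookback_seconds" {
  description = "How far back the query looks (relative_time_range.from)"
  type        = number
  default     = 600
}

# Everything a grafana_rule_group "data" block needs for this query,
# bundled as one object so the caller can splat it into the block.
output "data" {
  value = {
    ref_id         = var.ref_id
    datasource_uid = var.datasource_uid
    from           = var.lookback_seconds
    to             = 0
    model = jsonencode({
      refId         = var.ref_id
      expr          = var.expr
      intervalMs    = 1000
      maxDataPoints = 43200
    })
  }
}
```

A caller would then write something like `module "high_error_rate" { source = "./modules/prometheus_query" ... }` and read `module.high_error_rate.data` when filling in a `data` block, so the 200 lines of boilerplate live in one place.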
Maybe the first step to help Grafana users with Terraform would be to document the query model of each datasource. It is currently a pain to discover the model and understand it, because none of it is documented in Grafana.
@Eraac Would you please share an example of how to use grafana_rule_group with a CloudWatch datasource? Thanks!
Sure @obounaim, the model for the CloudWatch query looks like this; it can be used inside the model attribute: https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/rule_group#model
Thanks @Eraac, it seems to be working. However, the condition of type "expression" seems to be missing. I tried to find the JSON syntax for it, but I was not able to find it in the Grafana documentation.
@obounaim Indeed, here is the module we made for handling the expression model:
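For the JSON syntax of an expression, a minimal sketch follows. It mirrors the shape of the classic-condition example in the grafana_rule_group provider documentation, as we understand it: the rule fires when the last value returned by query "A" is greater than a threshold. The `threshold` variable is a hypothetical name, and the expression datasource UID is an assumption to verify: recent Grafana versions use `__expr__`, while older provider examples used `-100`.

```hcl
variable "threshold" {
  description = "Fire when the last value of query A exceeds this"
  type        = number
  default     = 3
}

locals {
  # UID of Grafana's built-in expression "datasource".
  # Assumption: "__expr__" on recent Grafana; older examples used "-100".
  expression_datasource_uid = "__expr__"

  # Classic condition: reduce query A to its last value, compare with > threshold.
  classic_condition_model = jsonencode({
    refId = "B"
    type  = "classic_conditions"
    datasource = {
      type = "__expr__"
      uid  = local.expression_datasource_uid
    }
    conditions = [
      {
        type      = "query"
        query     = { params = ["A"] }
        reducer   = { type = "last", params = [] }
        evaluator = { type = "gt", params = [var.threshold] }
        operator  = { type = "and" }
      }
    ]
  })
}

output "classic_condition_model" {
  value = local.classic_condition_model
}
```

The rendered string goes into the `model` argument of a `data` block with `ref_id = "B"` and `datasource_uid` set to the expression UID, and the rule itself then points at it with `condition = "B"`.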
Thanks @Eraac, it works great. One more question, which may be out of the scope of this issue:
Is there a way to create the "rule" argument in the "grafana_rule_group" resource automatically, using a loop like for_each? I am aware that the for_each meta-argument applies to resources; are you aware of something similar that can be used for an argument?
Example:

```hcl
resource "grafana_rule_group" "my_alert_rule" {
  name             = "My Rule Group"
  folder_uid       = grafana_folder.rule_folder.uid
  interval_seconds = 240
  org_id           = 1
  for_each         = toset(["rule1", "rule2", "rule3", "rule4"])

  rule {
    name = each.key
    # ... remaining rule arguments (condition, data blocks, etc.)
  }
}
```
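Terraform's answer for repeating a nested block (as opposed to a whole resource) is a `dynamic` block: `for_each` goes inside `dynamic "rule"`, and each element is available as `rule.value` within `content`. A minimal sketch, assuming a Prometheus datasource whose UID is passed in through a hypothetical `prometheus_uid` variable; the PromQL query, the threshold, and the `__expr__` expression UID are placeholders to adapt to your setup.

```hcl
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Hypothetical: UID of an existing Prometheus datasource in your Grafana.
variable "prometheus_uid" {
  type = string
}

resource "grafana_folder" "rule_folder" {
  title = "My Rule Folder"
}

resource "grafana_rule_group" "my_alert_rule" {
  name             = "My Rule Group"
  folder_uid       = grafana_folder.rule_folder.uid
  interval_seconds = 240
  org_id           = 1

  # One "rule" block is generated per element of the set.
  # Inside content, rule.value is the current element (for a set, key == value).
  dynamic "rule" {
    for_each = toset(["rule1", "rule2", "rule3", "rule4"])

    content {
      name      = rule.value
      condition = "B"

      # Query A: placeholder PromQL, replace with the real query.
      data {
        ref_id         = "A"
        datasource_uid = var.prometheus_uid
        relative_time_range {
          from = 600
          to   = 0
        }
        model = jsonencode({
          refId = "A"
          expr  = "up"
        })
      }

      # Expression B: fire when the last value of A is below 1.
      data {
        ref_id         = "B"
        datasource_uid = "__expr__"
        relative_time_range {
          from = 0
          to   = 0
        }
        model = jsonencode({
          refId      = "B"
          type       = "classic_conditions"
          datasource = { type = "__expr__", uid = "__expr__" }
          conditions = [{
            type      = "query"
            query     = { params = ["A"] }
            reducer   = { type = "last", params = [] }
            evaluator = { type = "lt", params = [1] }
            operator  = { type = "and" }
          }]
        })
      }
    }
  }
}
```

Unlike a resource-level `for_each`, which would create four separate rule groups all named "My Rule Group", this keeps a single group containing four rules.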
I understand the trade-offs mentioned by @alexweav. It is really hard to maintain Terraform-native definitions for every single supported data source, especially since each one may change at its own pace as it evolves.
In practice, though, you do not need tons of supported data sources; you only use a few. It seems entirely possible to create a parameters.tf file with a summary of what makes each alert rule unique, and to template it into the proper format (including JSON) at the final stage. The structure can be what @sammit20 suggested, or simpler or more complex depending on your needs. You write this once and then reuse it for every alert rule. It is impossible, though, to create an ideal solution for everyone; each user has to build it on their own, based on what makes sense for them.
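One way to follow this approach, sketched under assumptions: a hypothetical `alerts` variable (the parameters.tf) holds only the per-alert fields, and a `for` expression templates each entry into the JSON query and condition models. The field names `expr`, `threshold`, `pending_period` and `severity` are illustrative choices, not anything Grafana prescribes, and the model keys assume a Prometheus datasource plus a classic-condition expression.

```hcl
# parameters.tf (hypothetical): only what makes each alert unique.
variable "alerts" {
  type = map(object({
    expr           = string
    threshold      = number
    pending_period = string
    severity       = string
  }))
  default = {
    high_error_rate = {
      expr           = "sum(rate(http_requests_total{code=~\"5..\"}[5m]))"
      threshold      = 5
      pending_period = "5m"
      severity       = "critical"
    }
    high_load = {
      expr           = "avg(node_load1)"
      threshold      = 4
      pending_period = "10m"
      severity       = "warning"
    }
  }
}

locals {
  # Template each parameter set into the JSON models Grafana expects.
  rendered_rules = {
    for name, a in var.alerts : name => {
      pending_period = a.pending_period
      labels         = { severity = a.severity }
      query_model = jsonencode({
        refId = "A"
        expr  = a.expr
      })
      condition_model = jsonencode({
        refId      = "B"
        type       = "classic_conditions"
        datasource = { type = "__expr__", uid = "__expr__" }
        conditions = [{
          type      = "query"
          query     = { params = ["A"] }
          reducer   = { type = "last", params = [] }
          evaluator = { type = "gt", params = [a.threshold] }
          operator  = { type = "and" }
        }]
      })
    }
  }
}

output "rendered_rules" {
  value = local.rendered_rules
}
```

The rendered map can then drive a `dynamic "rule"` block with `for_each = local.rendered_rules`, setting `name = rule.key`, `for = rule.value.pending_period`, `labels = rule.value.labels`, and placing the two models in the `model` arguments of the "A" and "B" `data` blocks. Adding an alert then means adding one entry to the variable.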
Hello Team,
It would be great to have easy-to-understand templates for creating alert rules, something similar to the YAML contents in https://registry.terraform.io/providers/inuits/cortex/latest/docs/resources/rules. Or maybe something like this:
That would reduce the overhead of understanding what an alert rule does from the manifest itself.
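Until something like that exists in the provider, Terraform's built-in `yamldecode` function can approximate it: keep compact YAML definitions and let Terraform expand them into the full rule structure. A minimal sketch; the YAML keys (`query`, `above`, `pending`, `severity`) are a hypothetical format, and the definitions are inlined with a heredoc only to keep the example self-contained.

```hcl
locals {
  # Compact, human-readable alert definitions kept in YAML.
  # In a real setup this would more likely be:
  #   yamldecode(file("${path.module}/alerts.yaml"))
  alert_definitions = yamldecode(<<-EOT
    high_cpu:
      query: avg(node_load1)
      above: 4
      pending: 10m
      severity: warning
    api_errors:
      query: 'sum(rate(http_requests_total{code=~"5.."}[5m]))'
      above: 5
      pending: 5m
      severity: critical
  EOT
  )
}

# Shows that the YAML was parsed into a Terraform object keyed by alert name.
output "alert_names" {
  value = keys(local.alert_definitions)
}
```

From there, `local.alert_definitions` can feed a `dynamic "rule"` block, templating `query` into the datasource model and `above` into the condition's evaluator, so the manifest stays readable while the verbose JSON is generated.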