grafana / terraform-provider-grafana

Terraform Grafana provider
https://www.terraform.io/docs/providers/grafana/
Mozilla Public License 2.0

[Bug]: Creating a rule_group with slash in the name #1837

Open bschaeffer opened 1 week ago

bschaeffer commented 1 week ago

Terraform Version

1.5.3

Terraform Grafana Provider Version

3.9

Grafana Version

Enterprise

Affected Resource(s)

Terraform Configuration Files

resource "grafana_rule_group" "capture_related_alerts" {
  for_each   = local.alerts
  name       = "Capture - ${each.value.cluster}/${each.value.alert}"
  folder_uid = grafana_folder.weco_alerts.uid

  interval_seconds = 60
  rule {
    name           = each.value.alert
    condition      = "B"
    no_data_state  = "NoData"
    exec_err_state = "Error"
    for            = each.value.for

    data {
      ref_id         = "A"
      datasource_uid = local.datasource.cloud

      relative_time_range {
        from = 600
        to   = 0
      }

      model = jsonencode({
        expr : each.value.expr,
        instant : true,
        range : false,
        refId : "A"
      })
    }
    data {
      ref_id         = "B"
      datasource_uid = "__expr__"
      relative_time_range {
        from = 0
        to   = 0
      }
      model = jsonencode({
        expression = "$A > 0"
        type       = "math"
        refId      = "B"
      })
    }
  }
}

Expected Behavior

Creates the rule group or returns what specifically is not found.

Actual Behavior

│ Error: [PUT /v1/provisioning/folder/{FolderUID}/rule-groups/{Group}] PutAlertRuleGroup (status 404): {}
│
│   with grafana_rule_group.capture_related_alerts["myalert"],
│   on weco_capture_alerts.tf line 6, in resource "grafana_rule_group" "capture_related_alerts":
│    6: resource "grafana_rule_group" "capture_related_alerts" {

Steps to Reproduce

No response

Important Factoids

No response

References

No response

bschaeffer commented 1 week ago

Support dug up this line from the Grafana Enterprise logs, which makes it seem like my alert definition is causing a deadlock on the backend, but I don't know what I am doing that would cause this:

2024-10-09 01:03:47.652 logger=context traceID=258c3a16eb940595e7c6c3d203d0427c userId=461 orgId=1 uname=<redacted-grafana-svc-account> t=2024-10-09T01:03:47.652026058Z level=error msg= error="failed to store provisioning status: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction" remote_addr=66.56.52.18 traceID=258c3a16eb940595e7c6c3d203d0427c

bschaeffer commented 1 week ago

Okay, we figured it out, but this is definitely something the Terraform provider should guard against. Apparently you cannot have a / in the alert rule group name, because the name is put directly into the URL path (the {Group} segment). If you do, you get this confusing error:

│ Error: [PUT /v1/provisioning/folder/{FolderUID}/rule-groups/{Group}] PutAlertRuleGroup (status 404): {}

It would be helpful to get a 400 or a 422 with the validation error instead.
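
Until something like that exists, a guard on the configuration side can at least produce a readable error. The following is only a sketch: it assumes Terraform 1.2 or later (for precondition blocks), and the local rule_group_names is a hypothetical helper that is not part of my original configuration. With the name pattern from my config, the check fails on purpose for every entry, because the name always contains a slash.

```hcl
locals {
  # Hypothetical helper: the rule group name for each entry in local.alerts,
  # built the same way as in the original resource.
  rule_group_names = {
    for key, alert in local.alerts :
    key => "Capture - ${alert.cluster}/${alert.alert}"
  }
}

resource "grafana_rule_group" "capture_related_alerts" {
  for_each   = local.alerts
  name       = local.rule_group_names[each.key]
  folder_uid = grafana_folder.weco_alerts.uid

  interval_seconds = 60

  lifecycle {
    # Reject names containing "/" before the provider calls the API,
    # so the failure is a clear message rather than a bare 404.
    precondition {
      condition     = length(regexall("/", local.rule_group_names[each.key])) == 0
      error_message = "Rule group name \"${local.rule_group_names[each.key]}\" contains a slash, which ends up in the URL path."
    }
  }

  # rule blocks unchanged from the original configuration
}
```

This does not fix anything in the provider; it only moves the failure from a 404 during apply to an explicit message from Terraform itself.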

It took me forever to figure out that it's actually an invalid input. This is probably not an API bug, but it is probably on the Terraform provider to validate the name.
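
In the meantime, one possible workaround is to strip the slash from the name before it reaches the provider. This is a minimal sketch, not a provider feature: raw_group_name and safe_group_name are hypothetical names, and the value "Capture - prod/myalert" is only an illustrative stand-in for what my interpolation produces.

```hcl
locals {
  # Hypothetical example value; in the original configuration this would be
  # "Capture - ${each.value.cluster}/${each.value.alert}".
  raw_group_name = "Capture - prod/myalert"

  # A single "/" is matched literally by replace(), not as a regex.
  # The result here is "Capture - prod-myalert".
  safe_group_name = replace(local.raw_group_name, "/", "-")
}

output "safe_group_name" {
  value = local.safe_group_name
}
```

In the actual resource, the same replace() call would wrap the interpolated string assigned to name. Any separator that is safe in a URL path segment would work in place of the hyphen.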