grafana / grafana

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
https://grafana.com
GNU Affero General Public License v3.0

ValidateRuleGroupInterval fires on exported/imported rule group that seemed valid #83154

Closed: zlugo closed this issue 8 months ago

zlugo commented 9 months ago

What happened?

I wanted to test provisioning alerting resources and created a simple, seemingly valid test alert through the web UI. I exported the rule group it's in using the Export all alert rules in provisioning file format endpoint of the Alerting Provisioning HTTP API. When I try to import this rule group again using the Update the interval or alert rules of a rule group endpoint, it fails with {"message":"invalid alert rule: interval (0s) should be non-zero and divided exactly by scheduler interval: 10","traceID":""}. In my exported rule group JSON, I can only see an "interval":"5m" (at group level) and some "intervalMs":1000 entries in the rules section, so I'm not sure where this is coming from.

What did you expect to happen?

I would expect that I can export an alert rule group/alert rule that I just created without validation errors in the web UI, and import it without getting different validation errors. Maybe the validation rule is being applied too eagerly?

Of course, it would not be a problem if something actually were invalid and this were also shown in the UI. But I'm worried that I can create seemingly valid rules, export them in JSON format for provisioning, and then have new validation errors turn up.

Did this work before?

I have not tested this with any previous version yet, but the relevant validation logic seems to be ~2 years old.

How do we reproduce it?

  1. Create a new Grafana alert rule in its own evaluation group and export the group through the endpoint listed above. Here is my example command + output with some redactions: curl -H "Authorization: Bearer <redacted>" <grafana_url>/api/v1/provisioning/folder/fbba05f6-b1ab-4215-acf3-299812cf5852/rule-groups/<group_name>/export/?format=json

{"apiVersion":1,"groups":[{"orgId":1,"name":"<group_name>","folder":"<dashboard folder>","interval":"5m","rules":[{"uid":"e1b88847-5bc7-4bea-a5dc-166933d803f9","title":"test_2","condition":"C","data":[{"refId":"A","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"PC96415006F908B67","model":{"disableTextWrap":false,"editorMode":"builder","expr":"component_up{monitored_instance=\"<instance to be monitored>\"}","fullMetaSearch":false,"includeNullMetadata":true,"instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A","useBackend":false}},{"refId":"B","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[],"type":"gt"},"operator":{"type":"and"},"query":{"params":["B"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"reducer":"last","refId":"B","type":"reduce"}},{"refId":"C","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[1],"type":"lt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"B","intervalMs":1000,"maxDataPoints":43200,"refId":"C","type":"threshold"}}],"noDataState":"NoData","execErrState":"Error","for":"5m","annotations":{},"labels":{},"isPaused":false}]}]}

  2. Try importing this rule group through the endpoint listed above. Below is my example command (with some redactions) and the response: curl -X PUT -H "Content-Type: application/json" -H "Authorization: Bearer <redacted>" -d '{"apiVersion":1,"groups":[{"orgId":1,"name":"<group_name>","folder":"<dashboard folder>","interval":"5m","rules":[{"uid":"e1b88847-5bc7-4bea-a5dc-166933d803f9","title":"test_2","condition":"C","data":[{"refId":"A","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"PC96415006F908B67","model":{"disableTextWrap":false,"editorMode":"builder","expr":"component_up{monitored_instance=\"<instance to be monitored>\"}","fullMetaSearch":false,"includeNullMetadata":true,"instant":true,"intervalMs":1000,"legendFormat":"__auto","maxDataPoints":43200,"range":false,"refId":"A","useBackend":false}},{"refId":"B","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[],"type":"gt"},"operator":{"type":"and"},"query":{"params":["B"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"A","intervalMs":1000,"maxDataPoints":43200,"reducer":"last","refId":"B","type":"reduce"}},{"refId":"C","relativeTimeRange":{"from":600,"to":0},"datasourceUid":"__expr__","model":{"conditions":[{"evaluator":{"params":[1],"type":"lt"},"operator":{"type":"and"},"query":{"params":["C"]},"reducer":{"params":[],"type":"last"},"type":"query"}],"datasource":{"type":"__expr__","uid":"__expr__"},"expression":"B","intervalMs":1000,"maxDataPoints":43200,"refId":"C","type":"threshold"}}],"noDataState":"NoData","execErrState":"Error","for":"5m","annotations":{},"labels":{},"isPaused":false}]}]}' <grafana_url>/api/v1/provisioning/folder/fbba05f6-b1ab-4215-acf3-299812cf5852/rule-groups/<group_name>

{"message":"invalid alert rule: interval (0s) should be non-zero and divided exactly by scheduler interval: 10","traceID":""}

Is the bug inside a dashboard panel?

No response

Environment (with versions)?

Grafana: 10.3.1 (docker.io/grafana/grafana:10.3.1)

Grafana platform?

Kubernetes

Datasource(s)?

No response

zlugo commented 9 months ago

The same issue seems to exist in Grafana 9.5.5 as well (same error message).

tonypowa commented 8 months ago

hi @zlugo

thank you for this issue

I reproduced the issue, which resulted in a 400 Bad Request:

{
    "message": "invalid alert rule: interval (0s) should be non-zero and divided exactly by scheduler interval: 10",
    "traceID": ""
}

I can see the imported alert rule interval is 1m (non-zero), as required by the error message.

Forwarding issue to Alerting squad

KaplanSarah commented 8 months ago

The export endpoint exports the group in file provisioning format. If you want the correct payload for the provisioning API, you need to use the GET API instead. We realize this is not obvious and hope to fix it in the future. https://grafana.com/docs/grafana/latest/developers/http_api/alerting_provisioning/#route-get-alert-rule-group-export
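For illustration, a minimal sketch of the two calls, using a placeholder token, folder UID, and group name; the /export variant returns the file provisioning format shown earlier in this issue, while the plain GET returns the API representation that can be sent back to the same path with PUT:

    # file provisioning format (meant for provisioning files, not as a body for the PUT endpoint)
    curl -H "Authorization: Bearer <token>" \
      "<grafana_url>/api/v1/provisioning/folder/<folderUid>/rule-groups/<group_name>/export?format=json"

    # API format (usable as the body of a PUT to the same path, without /export)
    curl -H "Authorization: Bearer <token>" \
      "<grafana_url>/api/v1/provisioning/folder/<folderUid>/rule-groups/<group_name>"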

YouShallNotCrash commented 2 weeks ago

Hi, as far as I can tell the issue persists (v11.3) :( Exporting the "proper way", with /api/v1/provisioning/folder/:folderUid/rule-groups/:group/export, and importing its output results in the same error as mentioned in the original issue.

tonypowa commented 2 weeks ago

It does work for me, @YouShallNotCrash.

NOTE: remember it is not a PUT request but a GET request:

/api/v1/provisioning/folder/:folderUid/rule-groups/:group/export

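For completeness, a hedged sketch of the corresponding import step, assuming group.json holds the body returned by the rule-group GET (without /export) and that the target folder already exists; the token, URL, and identifiers are placeholders:

    # update/recreate the rule group from the API-format payload
    curl -X PUT \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <token>" \
      -d @group.json \
      "<grafana_url>/api/v1/provisioning/folder/<folderUid>/rule-groups/<group_name>"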

YouShallNotCrash commented 1 week ago

Hi @tonypowa, thanks for your feedback! :) I'd be grateful, though, if you could help me understand why I am stumbling here... Are you saying that I should import/provision these rules via PUT?

I need to move a few dozen (Grafana-managed) alert rules from one Grafana instance to another. They're all in one folder and grouped under one evaluation group. I can easily export them using the GUI or an API call, but importing (provisioning) them is an issue.

What I've tried:

1. Export via the GUI and paste it into the call body

2. Export via API

Exporting a single alert:

Any thoughts...?