ysde / grafana-backup-tool

A Python-based application to backup Grafana settings by using the Grafana API
MIT License

grafana-backup doesn't restore alerts #197

Open LibiKorol opened 1 year ago

LibiKorol commented 1 year ago

We use grafana-backup-tool version 1.2.4 to back up our Grafana. When we tried to restore the data on another server, all settings were restored successfully except the alerts. None of the alert rules, channels, or notification tools were restored, and the alerts page in the Grafana UI was empty.

hermanocabra commented 1 year ago

Grafana released 9.4, where you can export alert rules through the API. I hope they upgrade this backup tool to support these.

mt3593 commented 1 year ago

Could you confirm with the latest version that this works for you @LibiKorol?

LibiKorol commented 1 year ago

Hi, I used the latest version, 1.3.1, with Grafana 9.5.2, and all alerts were restored properly. However, the contact points and notification policies weren't restored.

mt3593 commented 1 year ago

I use this tool to sync across a few clusters, so for contact points (Slack in my case) and notification policies I tend to set these up per environment, as they can be different. If we do add this capability, it would be good to have the option of omitting certain types from the backup/restore.

derekmceachern commented 1 year ago

The restore of rules is not working for me. Using version 1.3.3 with Grafana 9.5.1.

I dug into this: create_alert_rule.py calls get_alert_rule. If that returns 404 it does a create_alert_rule; otherwise it does an update_alert_rule.
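To make that branch concrete, here is a minimal sketch of the decision as described above. It is not the tool's actual code: `choose_restore_action` is a hypothetical name, and the only assumption is that the status code of the lookup alone drives the choice.

```python
def choose_restore_action(status_code: int) -> str:
    """Pick the restore path from the status code of the rule lookup.

    Hypothetical helper: 404 means the rule is absent, so create it;
    every other status is treated as "rule exists", so update it.
    """
    if status_code == 404:
        return "create"
    return "update"


# Any non-404 status, including an error such as 500, lands on the update path.
for code in (404, 200, 500):
    print(code, "->", choose_restore_action(code))
```

With this shape, a lookup that fails with anything other than 404 is indistinguishable from a rule that already exists.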

In my case it is returning 500 not 404. The Grafana documentation indicates it should return a 404 so I don't know why I would be getting a 500.

Getting alert rule
[DEBUG] resp status: 500
[DEBUG] resp body: {'message': 'could not find alert rule', 'traceID': ''}
|--Got a code: 500

On the update rule code I had to add the x-disable-provenance header to get it to work.
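For anyone wanting to try the same change, a rough sketch of an update request carrying that header might look like the following. This is an illustration, not the tool's code: `build_update_request`, the bearer token, and the example URL and UID are all made up, and I am assuming the update is a PUT to the provisioning alert-rules path with the header value `true`. The request is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(grafana_url: str, uid: str, rule: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that updates one alert rule.

    Hypothetical helper. Assumes the rule is updated via
    /api/v1/provisioning/alert-rules/<uid> and that the value "true"
    is what the X-Disable-Provenance header expects.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "X-Disable-Provenance": "true",
    }
    url = f"{grafana_url.rstrip('/')}/api/v1/provisioning/alert-rules/{uid}"
    body = json.dumps(rule).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Example values only; nothing is sent over the network.
req = build_update_request("http://localhost:3000", "example-uid", {"title": "example"}, "example-token")
print(req.get_method(), req.full_url)
```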

I'm not sure why I am seeing this behavior. I'm wondering if it is because my Grafana instance is running in a Kubernetes cluster and I'm accessing it through a LoadBalancer, though I'm struggling to understand why that would matter.

ken-cogniac commented 11 months ago

@derekmceachern I got the same result as yours and I'm running it in Kubernetes.

[DEBUG] resp status: 500
[DEBUG] resp body: {'message': 'could not find alert rule', 'traceID': ''}

mt3593 commented 11 months ago

Smells like a Grafana bug; it should be a 404. Has anyone raised this on the Grafana board yet? Or is this only in Kubernetes?

derekmceachern commented 11 months ago

@mt3593, I finally had a chance to look into this some more after your comment, and it turns out this has been fixed.

It took me a while to find the associated pull request but here it is: https://github.com/grafana/grafana/pull/67331

According to the labels it was backported into v9.4.x and v9.5.x and was merged into the 10.0.0 version.

I upgraded to 10.0.1 in our Kubernetes environment and I'm able to confirm that this is fixed. Here is the log snippet from my logs.

restoring alert_rule: /tmp/tmpezl_xm9x/_OUTPUT_/alert_rules/202311282012/b6e4b89e-5bdd-4ac1-ac19-0659420c4e67.alert_rule
===========================================================================
Getting alert rule
[DEBUG] resp status: 404
[DEBUG] resp body: None
|--Got a code: 404
Alert rule does not exist, creating

So, I would suggest that this item can be closed.

tarang-turboml commented 1 month ago

Hi everyone, I am still getting an error when restoring alerts (tried multiple environments; this is a fresh grafana:latest container with just 1 alert rule for testing):

restoring alert_rule: /tmp/tmpr9ms7hv9/_OUTPUT_/alert_rules/202409301300/adzglchghr18gf.alert_rule
[DEBUG] resp status: 404
[DEBUG] resp body:
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/grafana-backup", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/grafana_backup/cli.py", line 55, in main
    restore(args, settings)
  File "/usr/lib/python3.11/site-packages/grafana_backup/restore.py", line 107, in main
    restore_components(args, settings, restore_functions, tmpdir)
  File "/usr/lib/python3.11/site-packages/grafana_backup/restore.py", line 145, in restore_components
    restore_functions[ext](args, settings, file_path)
  File "/usr/lib/python3.11/site-packages/grafana_backup/create_alert_rule.py", line 32, in main
    get_response= get_alert_rule(uid, grafana_url, http_get_headers, verify_ssl, client_cert, debug)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/grafana_backup/dashboardApi.py", line 207, in get_alert_rule
    return send_grafana_get(url, http_get_headers, verify_ssl, client_cert, debug)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/grafana_backup/dashboardApi.py", line 515, in send_grafana_get
    return (r.status_code, r.json())
                           ^^^^^^^^
  File "/usr/lib/python3.11/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It looks like Grafana is returning an empty response on that endpoint (tested with curl). There is nothing in the Grafana logs other than the request itself:

grafana-1 | logger=context userId=2 orgId=1 uname=sa-1-backup t=2024-09-30T13:19:19.365337832Z level=info msg="Request Completed" method=GET path=/api/v1/provisioning/alert-rules/adzglchghr18gf status=404 remote_addr=redacted time_ms=9 duration=9.322269ms size=0 referer= handler=/api/v1/provisioning/alert-rules/:UID status_source=server

Anyway, this seems like a Grafana issue. I just wanted to ask if anyone else is facing this, or if I am doing something wrong.
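As a possible client-side workaround, the response body could be decoded defensively so that an empty 404 body does not raise. This is only a sketch of the idea, not a patch to the tool: `parse_body` is a hypothetical helper that works on the raw response text, and it assumes an empty or non-JSON body can safely be treated as "no body".

```python
import json
from typing import Any, Optional


def parse_body(text: Optional[str]) -> Any:
    """Decode a response body as JSON, tolerating empty or non-JSON text.

    Hypothetical helper: returns None instead of raising when the
    server sends no body (for example size=0 on a 404).
    """
    if text is None or not text.strip():
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# An empty body yields None; a JSON body is decoded as usual.
print(parse_body(""))
print(parse_body('{"message": "could not find alert rule"}'))
```

A caller could then return the status code together with `parse_body(...)` of the response text, so a 404 with no body would still reach the create path.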