kitos9112 opened 1 week ago
According to this line:
Oct 24 10:54:18 grafana.internal.net grafana[90001]: logger=context userId=2 orgId=1 uname=admin t=2024-10-24T10:54:18.998141Z level=error msg="ruleUID is required to query annotations" error="ruleUID is required to query annotations" remote_addr=10.124.50.1 traceID=
you seem to use an annotation backend: https://github.com/grafana/grafana/blob/acb051b3141da5ff668a370e6c2989ee056f16ce/pkg/services/ngalert/state/historian/annotation.go#L112-L116
What version of Grafana do you use?
I managed to replicate it in both v11.2.2 and v11.3.0
Unfortunately, I can't reproduce the problem locally; I run Grafana and Loki from a docker-compose file.
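A sketch of such a setup (not the actual compose file; the image tags and environment overrides below are assumptions) would be:

```yaml
# Sketch only -- a minimal local Grafana + Loki pair for reproduction attempts.
# Image tags, ports, and the env-var overrides are assumptions.
services:
  loki:
    image: grafana/loki:3.1.1
    ports:
      - "3100:3100"

  grafana:
    image: grafana/grafana:11.3.0
    ports:
      - "3000:3000"
    environment:
      # GF_<SECTION>_<KEY> variables override the matching grafana.ini settings.
      GF_FEATURE_TOGGLES_ENABLE: "alertStateHistoryLokiSecondary,alertStateHistoryLokiPrimary,alertStateHistoryLokiOnly"
      GF_UNIFIED_ALERTING_STATE_HISTORY_ENABLED: "true"
      GF_UNIFIED_ALERTING_STATE_HISTORY_BACKEND: "loki"
      GF_UNIFIED_ALERTING_STATE_HISTORY_LOKI_REMOTE_URL: "http://loki:3100"
    depends_on:
      - loki
```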
> Wait for some time
I am not sure that time is the factor, unless something in your environment changes the Grafana configuration, because Grafana does not have any fallback mechanism that would, for example, switch to the annotation backend when Loki is not available.
That said, I do not want to rule this factor out. How long do you wait before the issue starts occurring?
To troubleshoot it further, can you check your logs for two messages?
`Forcing Annotation backend due to state history feature toggles`
and
`Coercing Loki to a secondary backend due to state history feature toggles`
Try to enable debug logs and, when the issue starts happening, check for messages that contain `logger=ngalert.state.historian`. Every message should have the context value `backend`. If it runs in Loki mode, the value will be `loki`; otherwise, it will be `annotations`.
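If a global debug level is too noisy, the `[log]` section of grafana.ini also supports per-logger filters; a sketch targeting just that logger:

```ini
# Sketch: enable debug output for the state historian only,
# instead of switching the whole instance to debug.
[log]
level = info
filters = ngalert.state.historian:debug
```

Since Grafana runs under systemd here, the matching lines can then be pulled from the journal with something like `journalctl -u grafana-server | grep 'ngalert.state.historian'` (the unit name may differ between packages).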
@yuri-tceretian
It's a difficult one to reproduce because I'm somehow unable to determine how long we need to keep both daemons running for.
Also, when this occurs I sometimes see gaps (blackouts) in the alert state history, which hints to me that Grafana also stops writing to Loki. After restarting Grafana, it works again.
I'll enable debug mode on both daemons and report back my findings.
What happened?
I have enabled the following feature flags/toggles in my Grafana instance to enhance the alert state history capabilities.
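For Loki-backed alert state history, the documented combination of toggles and settings in grafana.ini is along these lines (a sketch of the documented options, not necessarily the exact set enabled on this instance):

```ini
# Sketch of the documented toggles/settings for Loki-backed alert state history.
# The combination actually enabled on this instance may differ.
[feature_toggles]
alertStateHistoryLokiSecondary = true
alertStateHistoryLokiPrimary = true
alertStateHistoryLokiOnly = true

[unified_alerting.state_history]
enabled = true
backend = loki
loki_remote_url = http://127.0.0.1:3100
```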
This is my Loki configuration, which runs alongside Grafana on the same host; I use systemd as the process/service supervisor.
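A minimal single-binary Loki configuration for a colocated setup like this would look roughly as follows (filesystem storage and a TSDB index; the paths, schema date, and port below are assumptions rather than the values from this host):

```yaml
# Rough single-binary Loki sketch (filesystem storage, TSDB index).
# Paths, schema date, and ports are assumptions.
auth_enabled: false

server:
  http_listen_port: 3100

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
```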
From time to time, after both the local Grafana server and the Loki instance have been running for a while (maybe a few days), I find that I cannot retrieve the state history.
On the GUI:
[screenshot]
What did you expect to happen?
Grafana should be able to query all alert state history changes from Loki without issues.
Did this work before?
No. I might have some misconfiguration on one end but I cannot pinpoint it.
How do we reproduce it?
Is the bug inside a dashboard panel?
No response
Environment (with versions)?
Grafana: v11.2.2 and v11.3.0 (both affected)
OS:
Browser:
Grafana platform?
A package manager (APT, YUM, BREW, etc.)
Datasource(s)?
No response