Open svenmueller opened 1 year ago
Hi @svenmueller, thanks for reporting this. Apologies for the late reply. I was on vacation and off the grid.
Just like you, I was immediately surprised by the 510 + 469 == 979 coincidence upon seeing the screenshot for the first time. Any chance you could enable debug mode so we get more details about what's going on under the hood? For example, by temporarily setting the DEBUG environment variable to 1?
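In case it helps, here is a minimal sketch of one way to set DEBUG to 1 for a single run without touching the rest of the environment. The helper name `run_with_debug` is hypothetical, and the command shown in the comment (including the `slo.yaml` and `config.yaml` file names) is an assumption about your setup, so swap in whatever invocation you normally use.

```python
import os
import subprocess


def run_with_debug(command):
    """Run `command` with DEBUG=1 added to a copy of the current environment.

    `run_with_debug` is a hypothetical helper, not part of slo-generator.
    """
    # Copy the environment so the change only affects the child process.
    env = dict(os.environ, DEBUG="1")
    # Capture stdout/stderr so the debug output can be pasted into the issue.
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=False
    )


# Assumed invocation; adjust the command and file names to your deployment:
# result = run_with_debug(
#     ["slo-generator", "compute", "-f", "slo.yaml", "-c", "config.yaml"]
# )
# print(result.stdout, result.stderr)
```

Because the variable is only set for the child process, there is nothing to unset afterwards.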
SLO Generator Version
v2.3.4
Python Version
3.9
What happened?
When using the Google Cloud Monitoring backend, we sometimes (every other hour) see wrong SLI and error budget burn rate metrics being calculated for a short time (they cannot be correct, since there are no "bad" events). After a few minutes, the calculated metrics return to the expected, correct values. We see this happen for different sliding windows, such as 1h, 12h, 7d or 28d. For example, there can be a sudden peak in the error budget burn rate for one sliding window (e.g. "28 days") while the other sliding windows are unaffected and show correct values.
Example SLO configuration
What did you expect?
Correct SLI/error budget rate values when there are only "good" events.
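To make the expectation concrete, here is a minimal sketch of the usual good/bad ratio arithmetic. The function names and the 0.99 target are illustrative assumptions, and this is not slo-generator's actual implementation. It only shows that a window with zero "bad" events should give an SLI of 1 and a burn rate of 0.

```python
def sli(good: int, bad: int) -> float:
    """Fraction of good events in a window (hypothetical helper)."""
    total = good + bad
    if total == 0:
        raise ValueError("no events in the window")
    return good / total


def burn_rate(sli_value: float, target: float) -> float:
    """Consumed error budget relative to the allowed error budget."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # Error budget actually used, divided by the budget the target allows.
    return (1 - sli_value) / (1 - target)


# Only "good" events in the window: SLI is 1.0, so no budget is burned.
only_good = sli(good=1000, bad=0)
print(only_good, burn_rate(only_good, target=0.99))
```

Any non-zero burn rate for such a window therefore suggests that the event counts fed into the calculation were off during those few minutes.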
Screenshots
Relevant log output
Quite noteworthy: