Open w-reichert opened 3 years ago
Hey @w-reichert!
Thanks for bringing this up!
I'm planning some changes in Sloth that may affect the dashboards, so when I tackle those it will be a good time to revisit this.
Best,
I'm using Sloth SLOs and the Grafana dashboard too. It is pretty easy to use and has been working great so far!
I also have a feature request for the dashboard. I usually watch the "Month error budget burn chart" panel for monitoring, but I can't tell at a glance whether the current burn rate is good. I would suggest showing the graph in different colors or drawing an additional line for a burn rate of 1. I'm trying the latter solution, which looks like:
Anyway, thanks for providing this product!
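For reference, a burn rate of 1 means the error budget is consumed at exactly the pace that exhausts it at the end of the SLO period. Below is a minimal Python sketch of the reference line suggested above. The 30-day period and the function names are assumptions made for illustration; none of this is taken from Sloth or the dashboard.

```python
def burn_rate(error_ratio: float, objective: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    `objective` is a fraction, e.g. 0.999 for a 99.9% SLO.
    """
    allowed = 1.0 - objective
    if allowed <= 0.0:
        raise ValueError("objective must be below 1.0")
    return error_ratio / allowed


def ideal_budget_remaining(elapsed_days: float, period_days: float = 30.0) -> float:
    """Fraction of the budget left when burning at a constant rate of 1."""
    if period_days <= 0.0:
        raise ValueError("period_days must be positive")
    # Clamp the elapsed time to the period so the line stays within [0, 1].
    elapsed = min(max(elapsed_days, 0.0), period_days)
    return 1.0 - elapsed / period_days


# Points of the reference line, one per day of an assumed 30-day period.
reference_line = [ideal_budget_remaining(day) for day in range(31)]
```

A measured budget curve that sits above this line is burning slower than rate 1, and one below it is burning faster.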
Xabier, thanks for the quick response. When you have a new version of Sloth and/or the dashboard we would love to test it and provide feedback.
Regards, Wolfgang
@slok Thank you for your great contributions to the SRE world. I see v0.9.0 has been released. Did you include the above request in this release?
Not yet, I'll need a bit more time.
Hi @w-reichert!
I've reviewed what you said about the colors, and I did that on purpose. The error budget you have is meant to be consumed, so the perfect error budget left would be 0%. Below that means you didn't achieve the reliability you were supposed to have, and above that means you didn't consume enough (few experiments, too slow shipping features...).
Anyhow, I would happily change that if people prefer that kind of semaphore coloring as you approach 0% error budget left. Regarding the negative part, you are right: I didn't do that, so that people are aware of how much they fail.
@itkq Check #216
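To make the reading of "error budget left" described above concrete, here is a small Python sketch. The function name and the sample numbers are hypothetical, and the formula is the usual definition of remaining budget, not code taken from Sloth.

```python
def remaining_error_budget(error_ratio: float, objective: float) -> float:
    """Fraction of the error budget still unspent over the SLO period.

    1.0 means nothing was consumed, 0.0 means the budget was spent exactly,
    and a negative value means the SLO was missed.
    """
    allowed = 1.0 - objective
    if allowed <= 0.0:
        raise ValueError("objective must be below 1.0")
    return 1.0 - error_ratio / allowed


# Hypothetical 99.9% objective with three different error ratios for the period.
for ratio in (0.0, 0.0005, 0.002):
    left = remaining_error_budget(ratio, objective=0.999)
    print(f"error ratio {ratio:.4%} -> budget left {left:.0%}")
```

With these inputs the budget left is full when there are no errors, about half at a 0.05% error ratio, and about -100% at 0.2%. The negative figure shows by how much the target was missed, which is the information that would be lost if negative values were cut off.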
Hi Xabier @slok, thanks for looking into my recommendations.
Actually, the issue we saw started with a red NaN value. Obviously this happens if a service has not been running long enough to collect 30-day metrics. Hence my suggestion to begin with "color": "grey" for "value": null. Then "red" may follow for a high negative value.
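A rough Python sketch of how such a step list would resolve to a color. It assumes Grafana-style absolute thresholds, where the step with "value": null is the base and the last step whose bound is not above the value wins. Only the grey base step comes from the suggestion above. The other bounds and colors are illustrative placeholders, and treating NaN like missing data is an assumption of this sketch.

```python
import math
from typing import Optional

# Step list in the shape of a dashboard "thresholds.steps" array.
STEPS = [
    {"color": "grey", "value": None},       # base step, used when there is no data
    {"color": "red", "value": -1000000.0},  # placeholder "high negative" bound
    {"color": "green", "value": 0.0},       # placeholder bound and color
]


def color_for(value: Optional[float], steps=STEPS) -> str:
    """Return the color of the last step whose bound is <= value."""
    color = steps[0]["color"]
    # Missing or NaN values never pass a numeric bound, so they keep the base color.
    if value is None or math.isnan(value):
        return color
    for step in steps[1:]:
        if value >= step["value"]:
            color = step["color"]
    return color


print(color_for(float("nan")))  # service too young to have 30-day data
print(color_for(-50.0))         # budget overspent
print(color_for(35.0))          # budget left
```

Under these assumptions a NaN resolves to grey instead of red, while any real negative value still shows up as red.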
Hi Xabier, first of all, many thanks for the Sloth SLOs sample dashboard (https://grafana.com/grafana/dashboards/14348)! We have been using it for a while. :-)
I noticed that the color coding and ranges for "Remaining error budget (month)" are not correct. It starts in red if there are no values, it is yellow if there are no errors, and it is green if the budget is below 40%. Furthermore, I suppose negative values should be cut off, since empty is empty. My suggestions:
and likewise for "A rolling window of the total period (30d) error budget remaining."
Furthermore, cutting off the negative budget (also occurs twice).
Thanks and regards Wolfgang