slok / sloth

🦥 Easy and simple Prometheus SLO (service level objectives) generator
https://sloth.dev
Apache License 2.0

Create K8s events on SLO generation errors #251

Open xairos opened 2 years ago

xairos commented 2 years ago

We're running multi-tenant clusters, where tenants get their own namespaces (using HNC). Tenants can create SLOs using Sloth, but we don't give them access to the Sloth controller itself - and thus, they cannot see its logs.

It seems beneficial to expose SLO validation/generation errors as Events in the respective namespaces, which would allow users to view errors for their own SLOs.

For example, we could expose the following log line:

WARN[22070] item requeued due to processing error: could not generate SLOs: invalid SLO group: Key: 'SLOGroup.SLOs[0].SLI.Events.ErrorQuery' Error:Field validation for 'ErrorQuery' failed on the 'template_vars' tag

as an event (I'm unsure if this would require a change to kooper).

I'd be happy to take a crack at contributing this enhancement if it makes sense to you!

slok commented 2 years ago

Hey @xairos!

Hmmm, this is something I didn't implement at first because it adds complexity and I wasn't sure people would benefit from it (given that the Sloth logs and the status on the CRs themselves are already available). However, the scenario you describe pretty much validates the use case...

So I think that would be a good addition.

If you want to give it a try, go ahead (lately I'm short on time :sob:). I don't think it should be complex, and I'll give you some hints:

As always, @xairos, many thanks for your contributions and improvements to Sloth in every aspect!