Open siepkes opened 3 years ago
Yes, that's exactly the plan. I have a few backup jobs that need to run every day, and I want to wire up a Prometheus Alertmanager threshold based on the hours since the last successful completion of each of those jobs. I'm soak testing the actual data generation part now with a couple of real jobs, and I'll be wiring up some endpoints for inspecting state very soon.
It's a little early for PRs just at the moment -- I expect to make at least a few disruptive cleanups. I'm also using this as an exercise in OpenAPI-based client code generation in concert with dropshot.
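To make the "hours since last successful completion" idea concrete, here is a minimal Python sketch of how such a value could be rendered in the Prometheus text exposition format. It is not taken from this project: the metric name `job_hours_since_last_success`, the `job` label, and both function names are assumptions chosen for illustration.

```python
from typing import Mapping


def hours_since(last_success_unix: float, now_unix: float) -> float:
    """Hours elapsed since the last success, clamped at zero."""
    return max(0.0, now_unix - last_success_unix) / 3600.0


def render_metrics(last_success: Mapping[str, float], now_unix: float) -> str:
    """Render one gauge sample per job in Prometheus text exposition format.

    `last_success` maps a job name to the Unix timestamp of its most recent
    successful run. The metric name is hypothetical.
    """
    lines = [
        "# HELP job_hours_since_last_success Hours since the job last completed successfully.",
        "# TYPE job_hours_since_last_success gauge",
    ]
    for job in sorted(last_success):
        hours = hours_since(last_success[job], now_unix)
        lines.append(f'job_hours_since_last_success{{job="{job}"}} {hours:.2f}')
    return "\n".join(lines) + "\n"
```

An alert rule could then compare this gauge against a per-job threshold, for example firing when a daily backup has not succeeded for more than 26 hours.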
This looks cool and useful!
I skimmed through the code and thought it might be useful to be able to configure interval deadlines on the server, within which a (cron) job should report to the server (like https://healthchecks.io/). Failures to report could then be exported via a Prometheus endpoint. That way all the alerting complexity can be handled externally.
Is that a direction you're going in? Would you be open to a PR for such a thing?
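As a rough sketch of the deadline idea, assuming each job has a configured reporting interval and the server records when it last heard from the job, the check and its export could look something like this in Python. The `Deadline` type, the `job_deadline_missed` metric name, and the choice to treat a job that has never reported as missed are all assumptions made for illustration, not anything the project defines.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Deadline:
    """Hypothetical per-job deadline configuration and state."""
    interval_s: float                         # job must report at least this often
    last_report_unix: Optional[float] = None  # None until the first report arrives


def missed(deadline: Deadline, now_unix: float) -> bool:
    """True if the job has not reported within its configured interval."""
    if deadline.last_report_unix is None:
        # Assumption: a job that has never reported counts as missed.
        return True
    return now_unix - deadline.last_report_unix > deadline.interval_s


def render_missed(deadlines: Mapping[str, Deadline], now_unix: float) -> str:
    """Render a 0/1 gauge per job in Prometheus text exposition format."""
    lines = [
        "# HELP job_deadline_missed 1 if the job missed its reporting deadline.",
        "# TYPE job_deadline_missed gauge",
    ]
    for job in sorted(deadlines):
        value = 1 if missed(deadlines[job], now_unix) else 0
        lines.append(f'job_deadline_missed{{job="{job}"}} {value}')
    return "\n".join(lines) + "\n"
```

Exporting a plain 0/1 gauge keeps the server simple: how long a job may stay overdue before someone is paged is then decided entirely in the external alerting rules.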