jclulow / keeper

report execution of cron jobs through a mechanism other than mail

Exporting failures to report within deadline via Prometheus #1

Open siepkes opened 3 years ago

siepkes commented 3 years ago

This looks cool and useful!

I skimmed through the code and thought it might be useful to be able to configure, on the server, a deadline interval for each (cron) job within which the job should report to the server (like https://healthchecks.io/). Failures to report in time could then be exported via a Prometheus endpoint. That way all the alerting complexity can be handled externally.

Is that a direction you're going? Would you be open to a PR for such a thing?
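To make the deadline idea concrete, here is a minimal sketch in Rust of how a server might flag jobs that missed their reporting deadline and render the result in the Prometheus text exposition format. Everything in it is an assumption for illustration: the `JobStatus` struct, the `keeper_job_overdue` metric name, and the `job_name` label are hypothetical and do not come from keeper's code.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-job record; not keeper's actual data model.
struct JobStatus {
    name: String,
    /// Maximum allowed gap between two reports from the job.
    deadline: Duration,
    /// When the job last reported, if it ever has.
    last_report: Option<SystemTime>,
}

/// A job is overdue if it has never reported, or if its last report
/// is older than its configured deadline.
fn is_overdue(job: &JobStatus, now: SystemTime) -> bool {
    match job.last_report {
        None => true,
        Some(t) => match now.duration_since(t) {
            Ok(elapsed) => elapsed > job.deadline,
            // The last report is in the future (clock skew): not overdue.
            Err(_) => false,
        },
    }
}

/// Render one gauge sample per job in the Prometheus text format.
/// Assumes job names contain no characters that need label escaping.
fn render_metrics(jobs: &[JobStatus], now: SystemTime) -> String {
    let mut out = String::new();
    out.push_str("# HELP keeper_job_overdue 1 if the job missed its reporting deadline.\n");
    out.push_str("# TYPE keeper_job_overdue gauge\n");
    for job in jobs {
        let value = if is_overdue(job, now) { 1 } else { 0 };
        out.push_str(&format!(
            "keeper_job_overdue{{job_name=\"{}\"}} {}\n",
            job.name, value
        ));
    }
    out
}
```

Serving the string returned by `render_metrics` from an HTTP endpoint would let Prometheus scrape it, so the alerting rules themselves can live outside the server.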

jclulow commented 3 years ago

Yes that's exactly the plan. I have a few backup jobs that need to run every day, and I want to wire up a Prometheus alert manager threshold based on hours since last successful completion of each of those jobs. I'm soak testing the actual data generation part now with a couple of real jobs, and I'll be wiring up some endpoints for inspecting state very soon.

It's a little early for PRs just at the moment -- I expect to make at least a few disruptive cleanups. I'm also using this as an exercise in OpenAPI-based client code generation in concert with dropshot.
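As a rough illustration of the "hours since last successful completion" threshold, the sketch below computes that value and also renders the last-success time as a gauge holding a Unix timestamp. The function names, the metric name `keeper_job_last_success_timestamp_seconds`, and the `job_name` label are assumptions made for this example, not keeper's actual interface.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hours elapsed between the last successful completion and `now`.
/// Returns 0.0 if `last_success` lies after `now` (e.g. clock skew).
fn hours_since_success(last_success: SystemTime, now: SystemTime) -> f64 {
    now.duration_since(last_success)
        .unwrap_or(Duration::ZERO)
        .as_secs_f64()
        / 3600.0
}

/// Render the last-success time as a gauge sample in the Prometheus
/// text format, in whole seconds since the Unix epoch.
/// Assumes the job name needs no label escaping.
fn last_success_gauge(job_name: &str, last_success: SystemTime) -> String {
    let secs = last_success
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    format!(
        "keeper_job_last_success_timestamp_seconds{{job_name=\"{}\"}} {}\n",
        job_name, secs
    )
}
```

Exporting the raw timestamp rather than a precomputed age is a common Prometheus pattern: an alert rule can then compare something like `time() - keeper_job_last_success_timestamp_seconds` against a threshold in seconds, so the threshold can change without touching the exporter.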