mesos / chronos

Fault tolerant job scheduler for Mesos which handles dependencies and ISO8601 based schedules
http://mesos.github.io/chronos/
Apache License 2.0

Monitoring Solution for Chronos #832

Open macadminrohit opened 7 years ago

macadminrohit commented 7 years ago

What is the best way to monitor Chronos jobs that have failed, halted, or been running for too long?

srikanth-viswanathan commented 7 years ago

It appears Chronos supports a callback URL that is notified of job failures: https://github.com/mesos/chronos/pull/518/files

I am not sure about long-running jobs, however. Perhaps you could build your own tool that queries the Mesos HTTP APIs and checks for Chronos tasks that have been running longer than a certain time?
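
A rough sketch of that idea in Python, not a tested monitoring tool. It assumes the Mesos master serves its state as JSON at `/master/state`, that each framework entry has a `name` and a `tasks` list, and that each task carries a `statuses` list with `state` and `timestamp` (epoch seconds) fields. It also assumes the Chronos framework is registered under a name starting with "chronos". The function names and the threshold are made up for illustration; check all of this against your own cluster's output before relying on it.

```python
import json
import time
from urllib.request import urlopen


def fetch_master_state(master_url):
    """Fetch the master's state JSON. The /master/state path is an assumption."""
    with urlopen(master_url.rstrip("/") + "/master/state") as resp:
        return json.load(resp)


def long_running_tasks(state, max_seconds, framework_prefix="chronos", now=None):
    """Return (task_id, age_in_seconds) for running tasks older than max_seconds.

    Only frameworks whose name starts with framework_prefix (case-insensitive)
    are inspected; that naming is an assumption about the Chronos setup.
    """
    now = time.time() if now is None else now
    overdue = []
    for framework in state.get("frameworks", []):
        if not framework.get("name", "").lower().startswith(framework_prefix):
            continue
        for task in framework.get("tasks", []):
            if task.get("state") != "TASK_RUNNING":
                continue
            # Use the earliest TASK_RUNNING status update as the start time.
            started = [
                s["timestamp"]
                for s in task.get("statuses", [])
                if s.get("state") == "TASK_RUNNING" and "timestamp" in s
            ]
            if not started:
                continue
            age = now - min(started)
            if age > max_seconds:
                overdue.append((task.get("id"), age))
    return overdue
```

You could run something like `long_running_tasks(fetch_master_state("http://your-master:5050"), max_seconds=3600)` from cron and alert on a non-empty result; the host, port, and one-hour threshold here are placeholders.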