Open RimBlock opened 3 years ago
For monitoring and health checking: why not run a dedicated monitoring and health-checking tool? If you don't want to build your own, there is chiamon. Also, Swar plot manager exposes a limited set of metrics via Prometheus.
Btw, I like the idea of having a "Job stall check" (I implemented this in my own plot manager, which I may open source in the future =p; it is still under development).
Yep, I am working on an ELK stack for that in my own setup. I was thinking more of other users. Thanks for the chiamon link.
Firstly, thanks to Swar and the other contributors for this great system.
There are a number of items that people would like implemented that have not made it in yet.
i.e.
One quick way to help people with this may be to add the ability to run a script on each "view" cycle. The frequency would be controlled by the view refresh timer.
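To make the idea concrete, here is a rough sketch of what a per-cycle hook could look like. This is not how Swar plot manager works today: `run_view_hook`, `view_loop`, and their parameters are hypothetical names made up for illustration. The timeout is there so a slow or hung script cannot block the view refresh.

```python
import subprocess
import sys
import time


def run_view_hook(command, timeout_seconds=10):
    """Run a user-supplied command once (hypothetical hook, not an existing API).

    Returns (exit_code, stdout). exit_code is None if the command timed out
    or could not be started.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, "hook timed out"
    except OSError as exc:
        return None, f"hook could not be started: {exc}"
    return result.returncode, result.stdout.strip()


def view_loop(command, refresh_seconds, cycles):
    """Stand-in for the manager's view loop: call the hook once per refresh."""
    for _ in range(cycles):
        # The manager would redraw its view here (assumed, not shown).
        exit_code, output = run_view_hook(command)
        print(f"hook exit={exit_code} output={output!r}")
        time.sleep(refresh_seconds)


if __name__ == "__main__":
    # Example: a trivial "script" that just prints a marker.
    view_loop([sys.executable, "-c", "print('check ok')"], refresh_seconds=1, cycles=2)
```

The hook only reports the exit code and output. What to do with a failing check (log it, alert, pause plotting) would be left to the user's script or to a config option.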
Advantages
Examples of scripts for constant checks.
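As one illustration, a "job stall check" script could be as simple as the sketch below. It assumes each plotting job writes to a log file and treats a log that has not been modified for a while as a sign of a stall. That is only a heuristic, and `is_stalled`, the log path, and the threshold are hypothetical, not part of the plot manager.

```python
import os
import time


def is_stalled(log_path, max_idle_seconds, now=None):
    """Return True if log_path was last modified more than max_idle_seconds ago.

    `now` can be passed in (seconds since the epoch) to make the check testable.
    """
    if now is None:
        now = time.time()
    idle_seconds = now - os.path.getmtime(log_path)
    return idle_seconds > max_idle_seconds
```

A script run on each view cycle could call this for every active job log and exit with a non-zero code if any job looks stalled.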