We recently ran into a problem where the Klados backend went down, and we only noticed because Hilmar tried to access it (#264). We should set up a scheduled health check to make sure that the backend is up and able to respond to requests. Because of the way the backend is configured, it isn't enough to check that the web server at https://reasoner.phyloref.org/reason is up: we need to submit a small job for it to process and confirm that it reasons correctly.
I've used Pingdom for this kind of check before, but Hilmar suggested GitHub Actions with a schedule trigger, perhaps running the check hourly and e-mailing us an alert when Klados goes down and again when it comes back up.
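As a starting point, here is a rough sketch of what the check script could look like. Only the URL comes from this issue. `SAMPLE_JOB`, the JSON request encoding, and the `looks_correct` validation are hypothetical placeholders, and would need to be replaced with a real small job and a comparison against its known-good result.

```python
import json
import urllib.request

# Endpoint named in this issue.
REASONER_URL = "https://reasoner.phyloref.org/reason"

# HYPOTHETICAL placeholder: swap in a small job the backend is known to handle.
SAMPLE_JOB = {"placeholder": "replace with a small known-good job"}


def post_json(url, payload, timeout=60):
    """POST payload as JSON and return (status_code, body_text).

    ASSUMPTION: the backend accepts a JSON request body; adjust the encoding
    and headers to whatever the backend actually expects.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def looks_correct(body):
    """HYPOTHETICAL validation: only checks that the body is a JSON object.

    A real check should compare the response with the expected reasoning
    result for SAMPLE_JOB.
    """
    try:
        return isinstance(json.loads(body), dict)
    except ValueError:
        return False


def check_backend(send=post_json, url=REASONER_URL, job=SAMPLE_JOB,
                  validate=looks_correct):
    """Submit one job and return (ok, message)."""
    try:
        status, body = send(url, job)
    except Exception as error:  # connection errors, timeouts, HTTP error codes
        return False, f"request failed: {error}"
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    if not validate(body):
        return False, "backend responded, but the result was not as expected"
    return True, "backend is up and reasoning correctly"
```

A scheduled job could call `check_backend()`, print the message, and exit with a non-zero status when `ok` is false, so that the run is marked as failed. Passing the sender in as the `send` argument means the logic can be tested without hitting the live backend. Note that this only detects the down state on each run; alerting when Klados comes back up would need some way to remember the previous result.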