Today we discovered that the Play app of lobid-resources on quaoar1 and weywot2 could not be launched and had been silently unmonitored, resulting in downtime for lobid-resources and dependent apps such as nwbib.
One issue is that while the restart.sh script already removes the RUNNING_PID file, the monit_restart.sh script does not. Because monit uses monit_restart.sh, Play refuses to restart the web app whenever the RUNNING_PID file still exists, which sometimes happens even when the app has crashed. So the solution is to
[x] remove the RUNNING_PID file in monit_restart.sh as well
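
A minimal sketch of what such a cleanup step could look like. The actual contents of monit_restart.sh are not shown in this issue, so the function name `remove_stale_pid` and the path in the usage note are assumptions for illustration only. Unlike an unconditional `rm`, this variant keeps the file when the recorded process is still alive:

```shell
#!/bin/sh
# Hypothetical helper, not the actual monit_restart.sh.
# Removes a Play RUNNING_PID file unless the process it names is still alive.
remove_stale_pid() {
    pid_file="$1"

    # Nothing to clean up if the file does not exist.
    [ -f "$pid_file" ] || return 0

    pid=$(cat "$pid_file")

    # kill -0 sends no signal; it only checks whether the process exists
    # and can be signalled by the current user.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running, keeping $pid_file"
        return 1
    fi

    rm -f "$pid_file"
    echo "removed stale $pid_file"
}
```

It could be called before the app is started again, for example `remove_stale_pid /path/to/app/RUNNING_PID` (path is a placeholder). Note that `kill -0` also fails for processes owned by another user, so the script would need to run as the same user as the Play app.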
It would also be nice to
[x] inform via email when monit unmonitors a process
(We don't need to be informed when monit restarts a process: that happens once a month (via crontab) for almost all web apps and is not a problem in itself, because Apache's high-availability proxy redirects to the spare server. Getting an email for every restart would be too noisy.)