tail -f -n 200 /var/vcap/monit/monit.log
#> [UTC Aug 1 10:41:31] info : 'k3s-server' start: /var/vcap/jobs/k3s-server/bin/ctl
#> [UTC Aug 1 10:41:41] info : 'k3s-server' process is running with pid 366216
#> [UTC Aug 1 10:42:41] error : 'k3s-server' process is not running
#> [UTC Aug 1 10:42:41] info : 'k3s-server' trying to restart
#> [UTC Aug 1 10:42:41] info : 'k3s-server' start: /var/vcap/jobs/k3s-server/bin/ctl
#> [UTC Aug 1 10:42:52] info : 'k3s-server' process is running with pid 366278
#> [UTC Aug 1 10:43:42] error : 'k3s-server' process is not running
#> [UTC Aug 1 10:43:42] info : 'k3s-server' trying to restart
#> [UTC Aug 1 10:43:42] info : 'k3s-server' start: /var/vcap/jobs/k3s-server/bin/ctl
#> [UTC Aug 1 10:43:52] info : 'k3s-server' process is running with pid 366344
#> [UTC Aug 1 10:44:12] error : 'k3s-server' process is not running
#> [UTC Aug 1 10:44:12] info : 'k3s-server' trying to restart
Monit provides a service timeout mechanism for situations where a service simply refuses to start or respond over a longer period.
The timeout mechanism is based on the number of service restarts and the number of poll-cycles. For example, if a service had x restarts within y poll-cycles (where x <= y) then Monit will perform an action (for example, unmonitor the service). If a timeout occurs, Monit will send an alert message if you have registered interest for this event.
The syntax for the timeout statement is as follows (keywords are in capital):
IF <number> RESTART <number> CYCLE(S) THEN <action>
Here is an example where Monit will unmonitor the service if it was restarted 2 times within 3 cycles:
if 2 restarts within 3 cycles then unmonitor
To have Monit check the service again after monitoring was disabled, run 'monit monitor <servicename>' from the command line.
Example for setting custom exec on timeout:
if 5 restarts within 5 cycles then exec "/foo/bar"
Expected behavior
As an operator, in order to avoid crash loops that go unnoticed and mask the root cause of an error (such as https://github.com/orange-cloudfoundry/paas-templates/issues/2398), I need k3s-wrapper-boshrelease to back off when entering a crash loop.
Observed behavior
Possible fix
Use Monit's support for slow process start
https://web.archive.org/web/20110816041503/https://mmonit.com/monit/documentation/monit.html
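A minimal sketch of what the k3s-server monit stanza could look like with this fix, combining a longer start timeout with the restart limit quoted above. The pidfile path, the start/stop arguments passed to ctl, the 120-second timeout, and the 3-in-5 threshold are all assumptions for illustration; they are not taken from the release and would need to be checked against the actual job template and the Monit version shipped with BOSH.

```
# Hypothetical monit stanza for the k3s-server job (illustrative values only).
check process k3s-server
  # Assumed pidfile location; the real path is defined by the job's ctl script.
  with pidfile /var/vcap/sys/run/k3s-server/k3s-server.pid
  # Give a slow-starting server more time before monit treats the start as failed.
  # The "start"/"stop" arguments and the 120 seconds are assumptions.
  start program "/var/vcap/jobs/k3s-server/bin/ctl start" with timeout 120 seconds
  stop program "/var/vcap/jobs/k3s-server/bin/ctl stop"
  # Back off instead of crash-looping: stop monitoring after repeated restarts.
  if 3 restarts within 5 cycles then unmonitor
  group vcap
```

Once the restart limit triggers, the process stays unmonitored until an operator runs 'monit monitor k3s-server', which makes the crash loop visible instead of letting it repeat silently.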