openvstorage / integrationtests

Open vStorage automated integration tests.

Do we need a longer retry period for service restarts? #407

Open openvstorage-ci opened 7 years ago

openvstorage-ci commented 7 years ago

From @saelbrec on December 8, 2016 14:54

What is the time window during which the current upstart/systemd configuration will keep retrying to restart a service? I am referring here to the alba proxy issue in https://github.com/openvstorage/alba/issues/384, but it probably applies to other services as well.

Copied from original issue: openvstorage/framework#1256

openvstorage-ci commented 7 years ago

From @khenderick on December 19, 2016 13:21

For most systemd services, a failed service has to wait 100ms before systemd will start it again. Some services are configured to wait 3 seconds, others 5.

However, we use systemd's default rate limit, which should be 5 starts per 10 seconds. Systemd should keep trying to restart the service forever, but no more than 5 times every 10 seconds. So if, for example, a service with a 100ms restart delay fails immediately each time, it will be restarted 5 times within the first 0.5 seconds, and then it will have to wait the remaining 9.5 seconds of the window before it is started again.
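
To make the arithmetic above concrete, here is a minimal Python sketch of the restart schedule as described in this comment. It is only a simplified fixed-window model: the function name `restart_times_ms` and its parameters are made up for illustration, the service is assumed to fail at t=0 and immediately after every start, and it does not reproduce systemd's own rate-limiting code. Real systemd behaviour once the limit is hit may differ, so the systemd documentation remains the reference.

```python
def restart_times_ms(restart_ms=100, burst=5, interval_ms=10_000, restarts=12):
    """Return restart times (in ms) for a service that fails instantly.

    Hypothetical model of the behaviour described in the comment:
    at most `burst` restarts per `interval_ms` window, with a delay of
    `restart_ms` between a failure and the next restart attempt.
    """
    times = []
    t = 0             # the service first fails at t=0
    window_start = 0  # start of the current rate-limit window
    in_window = 0     # restarts already done in the current window

    for _ in range(restarts):
        t += restart_ms  # earliest moment the next restart may happen

        if t >= window_start + interval_ms:
            # Enough time has passed that a new window began on its own.
            window_start += interval_ms * ((t - window_start) // interval_ms)
            in_window = 0

        if in_window == burst:
            # Budget for this window is used up: wait for the next window.
            window_start += interval_ms
            t = window_start
            in_window = 0

        times.append(t)
        in_window += 1

    return times
```

With the defaults, the first six restarts land at 100, 200, 300, 400, 500 and 10000 ms, which is the 9.5 second pause mentioned above. With `restart_ms=3000` the limit is never reached in this model, because only three or four restarts fit into any 10 second window.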

openvstorage-ci commented 7 years ago

From @khenderick on January 18, 2017 13:51

@wimpers, as you closed the ticket, does it mean that all configurations are OK?

openvstorage-ci commented 7 years ago

From @wimpers on January 18, 2017 14:1

@khenderick systemd restarts forever, x times per 10 sec. It was OK for @saelbrec.

openvstorage-ci commented 7 years ago

From @khenderick on January 18, 2017 14:13

Ah, in that case it's OK.

I was under the impression that some services would be restarted too often, or that for some services it was desired that restarting stopped after a given amount of time, as was the case with Upstart. That would prevent flooding the logs with the same error over and over again, and would allow, for example, the HC to correctly mark the service as not running or failed. A service that keeps crashing (for whatever reason) will typically be in "running" a lot of the time in between crashes and restarts.

openvstorage-ci commented 7 years ago

From @pploegaert on January 19, 2017 9:16

@wimpers: we were going to validate this as part of QA too