Open nh2 opened 5 years ago
CC @arianvp
I forgot to say: The `Job for a.service canceled` happens most of the time, but not all the time, so there seems to be some race in it.
Putting in some more details as asked on IRC:
`a.service` and `b.service` don't take long to stop (they react in < 1s to a SIGTERM, and exit 1 when it occurs).

This is effectively preventing switching configuration / restarting reliably.
Some context: https://github.com/NixOS/nixpkgs/issues/49415#issuecomment-435611241
`cdn-rsync.timer` / `cdn-rsync.service` pulls in `consul.service`, and effectively prevents `consul.service` from being restarted.
The `systemd.timer` man page currently does not describe important aspects of timer semantics, which can lead to unexpected, unintended, or wrong behaviour and can be very frustrating.

I'll first show an example problem, and then ask some concrete questions that I couldn't answer from the current documentation.
Example problem
When I `systemctl stop a.service`, I often get the output `Job for a.service canceled.`
After some investigation and lots of great help from boucman on IRC, we tracked it down to a timer `b.timer` + `b.service`, where `b.service` is `Requires=`/`After=` on `a.service`.

The timer is set to `OnUnitActiveSec=1m`; the started service itself can take longer than a minute to complete its task. We observe that:

- `systemctl stop a.service` stops `a`
- `a` is then immediately restarted, because the `b.timer` immediately triggers, and the dependency pulls in `a`
- this results in the `Job for a.service canceled` message.

When investigating whether this is as expected, the problem here is that nobody seems to really understand when timers fire, and that the documentation leaves these things open. Thus:
Things to be documented
- When a service triggered by an `OnUnitActiveSec=` timer is stopped, should the timer fire immediately, or should it fire after some time (so that the difference between the previous start time and the new start time is divisible by 1m)?
  - The docs say "`OnUnitActiveSec=` defines a timer relative to when the unit the timer is activating was last activated" but don't describe how this interacts with stopping a service.
  - Example: assume `OnUnitActiveSec=1m`. I start the service at time t=0s. I `systemctl stop` the service at t=90s. Should the service next be started (a) immediately, or (b) at t=120s?
  - Assume `AccuracySec=1s` or something similarly small so that this doesn't create extra confusion.
- Is it expected that the timer can fire during `systemctl stop a.service`, so that the result can only be `Job for a.service canceled`? If yes, how should this be explained in the documentation?

After these questions are cleared, we'd like to document these semantics in the `systemd.timer` man page.
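
For anyone trying to reproduce this, here is a minimal sketch of the three units from the example problem. Only `Requires=`/`After=`, `OnUnitActiveSec=1m` and `AccuracySec=1s` are taken from the description above; the file paths, descriptions, and `ExecStart=` commands are placeholder assumptions, not the real units.

```ini
# /etc/systemd/system/a.service (path and contents are assumptions)
[Unit]
Description=Stand-in for the long-running service a

[Service]
# Placeholder command; the real service is not shown in this issue.
ExecStart=/usr/bin/sleep infinity

# /etc/systemd/system/b.service (path and contents are assumptions)
[Unit]
Description=Stand-in for the periodic job b
Requires=a.service
After=a.service

[Service]
# Placeholder that runs longer than the 1m timer interval.
ExecStart=/usr/bin/sleep 90

# /etc/systemd/system/b.timer (path is an assumption)
[Timer]
OnUnitActiveSec=1m
AccuracySec=1s
```

With only `OnUnitActiveSec=` set, `b.service` presumably has to be started once by hand (or through an extra trigger such as `OnBootSec=`) before the timer has a reference point. After that, running `systemctl stop a.service` while `b.service` is active should be the step that exercises the behaviour in question.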