failure to stop is not propagated to systemctl

systemd / systemd

The systemd System and Service Manager

GNU General Public License v2.0


Submission type

Bug report

systemd version the issue has been seen with

v238

Used distribution

Debian

Downstream bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=792045

It seems that a failure on stop is not propagated to systemctl. A minimal test case:

[Unit]
Description=Test

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/false

# systemctl start test.service
# systemctl stop test.service
# echo $?
0
# systemctl status test
● test.service - Test
   Loaded: loaded (/etc/systemd/system/test.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2018-03-28 12:42:15 CEST; 9s ago
  Process: 4204 ExecStop=/bin/false (code=exited, status=1/FAILURE)
  Process: 4194 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 4194 (code=exited, status=0/SUCCESS)

Mär 28 12:41:52 pluto systemd[1]: Starting Test...
Mär 28 12:41:52 pluto systemd[1]: Started Test.
Mär 28 12:42:15 pluto systemd[1]: Stopping Test...
Mär 28 12:42:15 pluto systemd[1]: test.service: Control process exited, code=exited status=1
Mär 28 12:42:15 pluto systemd[1]: test.service:

This is by design: the failure is only logged, not propagated. Normally, when an operation fails, the assumption is that the state afterwards is the same as before (or at best half-way to the goal). That is not the case for systemd services, however: if ExecStop= fails, our fallback logic applies, i.e. we kill all left-over processes on our own.

Propagation would also mean that shutdown transactions fail if any of the ExecStop= commands fail, and that's not desirable either.

Or to say this differently: stopping something must be reliable, and so we make it so. Under no circumstances should we permit getting rid of something to fail. This doesn't relieve us from logging about this (and we do), but it means that a failing ExecStop= must be reacted upon right away, and fallback logic must be used to achieve the goal.
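For callers that do need to notice a failed ExecStop=, the failure remains visible in the unit state after the stop, as the status output in the report shows (Result: exit-code). A minimal sketch of checking for it, assuming a POSIX shell and a systemctl that supports `show --value`; the helper names `result_to_status` and `stop_and_check` are hypothetical and not part of systemd:

```shell
#!/bin/sh

# Hypothetical helper: map a service's Result property to an exit status.
# "success" means the last run, including ExecStop=, finished cleanly;
# anything else (e.g. "exit-code", or an empty value) is treated as failure.
result_to_status() {
    case "$1" in
        success) return 0 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical wrapper: stop a unit, then report whether the stop was clean.
# The stop itself is attempted regardless; only the returned status differs.
stop_and_check() {
    unit=$1
    systemctl stop "$unit"
    result=$(systemctl show -p Result --value "$unit")
    result_to_status "$result"
}
```

With the test unit from the report, `stop_and_check test.service` would be expected to return non-zero, since the unit ends up with Result: exit-code. `systemctl is-failed test.service` is another way to query the same state.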

Starting things and stopping things are very different in this regard: we should only assume that a service started properly when everything went smoothly. But for stopping stuff, the goal matters, not the way there.

Hence, I don't think there's anything to fix here... This really works as intended.

(A long time ago, systemd allowed stop jobs to fail due to issues like this, btw. But we changed that because of the thinking explained above.)
