troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License

unexpected service restart on initctl reload, notify-related? #382

Closed az143 closed 8 months ago

az143 commented 8 months ago

here's the scenario that the attached test script reproduces:

there are 3 services: A, which has a pid file but no notify; B, which depends on A and uses notify:systemd; and C, which depends on B via <service/B/ready> and also uses notify:systemd. for the test there is a fourth service, D, which is the same as C but depends on B via <pid/B>.

what is unexpected: after normal startup (and without having touched any config files whatsoever), whenever i run initctl reload then service C gets restarted. on the other hand, service D is not restarted, nor are services A or B.

this is with finit 4.5-rc5.

i've run this test a few times with debug enabled, but am not sure what exactly is going on here. here is just the service definition from the test script:

service log:stdout name:A serv -np -i A
service log:stdout notify:systemd <pid/A> name:B serv -np -i B -N 0 -- B needs A
service log:stdout notify:systemd <service/B/ready> name:C serv -np -i C -N 0 -- C needs B(service)
service log:stdout notify:systemd <pid/B> name:D serv -np -i D -N 0 -- D needs B(pid)
task <service/C/ready,service/D/ready> name:allup /sbin/initctl cond set allup -- Everything is up

debug shows that as soon as B's condition is in flux, C gets terminated.

2023-10-27T06:02:41 [DBG]: sm_step():Stopping services not allowed after reconf ...
2023-10-27T06:02:41 [DBG]: cond_reload():
2023-10-27T06:02:41 [DBG]: service_step():                mdev(  16):  running  enabled/clean   cond:on  
2023-10-27T06:02:41 [DBG]: service_step():            testserv(   0):  waiting  enabled/clean   cond:off 
2023-10-27T06:02:41 [DBG]: service_step():                   A(  33):  running  enabled/clean   cond:on  
2023-10-27T06:02:41 [DBG]: service_step():                   B(  34):  running  enabled/clean   cond:flux
2023-10-27T06:02:41 [DBG]: cond_clear():service/B/
2023-10-27T06:02:41 [DBG]: cond_clear_noupdate():service/B/
2023-10-27T06:02:41 [DBG]: cond_set_path():/run/finit/cond/service/B/ <= 0
2023-10-27T06:02:41 [DBG]: cond_update():service/B/ready
2023-10-27T06:02:41 [DBG]: cond_update():service/B/ready: match <service/B/ready> C needs B(service)(serv)
2023-10-27T06:02:41 [DBG]: service_step():                   C(  38):  running  enabled/clean   cond:off 
2023-10-27T06:02:41 [DBG]: service_stop():Sending SIGTERM to pid:38 name:serv
2023-10-27T06:02:41 [NOT]: Stopping C[38], sending SIGTERM ...

but a few lines later, D is only paused, not terminated:

2023-10-27T06:02:41 [DBG]: service_step():                   C(  38): -> stopping
2023-10-27T06:02:41 [DBG]: service_step():                   C(  38): stopping  enabled/clean   cond:off 
2023-10-27T06:02:41 [DBG]: cond_update():service/B/running
2023-10-27T06:02:41 [DBG]: service_step():                   B(  34): ->   paused
2023-10-27T06:02:41 [DBG]: service_step():                   B(  34):   paused  enabled/clean   cond:flux
2023-10-27T06:02:41 [DBG]: service_step():                   C(  38): stopping  enabled/clean   cond:off 
2023-10-27T06:02:41 [DBG]: service_step():                   D(  35):  running  enabled/clean   cond:flux
2023-10-27T06:02:41 [DBG]: cond_clear():service/D/
2023-10-27T06:02:41 [DBG]: cond_clear_noupdate():service/D/
2023-10-27T06:02:41 [DBG]: cond_set_path():/run/finit/cond/service/D/ <= 0
2023-10-27T06:02:41 [DBG]: cond_update():service/D/ready
2023-10-27T06:02:41 [DBG]: cond_update():service/D/ready: match <service/C/ready,service/D/ready> Everything is up(/sbin/initctl)
2023-10-27T06:02:41 [DBG]: service_step():               allup(   0):     done  enabled/clean   cond:off 
2023-10-27T06:02:41 [DBG]: cond_update():service/D/running
2023-10-27T06:02:41 [DBG]: service_step():                   D(  35): ->   paused

even more unexpected: if service B is changed to NOT depend on any other service, i.e.

service log:stdout notify:systemd name:B serv -np -i B -N 0 -- B needs nothing

then C and D are not restarted on initctl reload. (i haven't put that variation in the test script, however.)

unexpected-restart-test.txt

troglobit commented 8 months ago

Hmm, service/B/ready shouldn't be cleared on reload(flux). I'll have to get back to you on that one.

The reason for C and D not being restarted is that you have declared them as supporting SIGHUP. The special character ! at the beginning of <> declares that a service does not support SIGHUP; such a service is stopped and started again instead, which I assume is what you expected here.
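
For illustration, the `!` prefix described above could be applied to the reporter's definition of service C roughly as follows. This is a sketch only: the line is adapted from the test script's C entry, the trailing description text is made up for the example, and it has not been run against finit 4.5-rc5.

```
# Hypothetical variant of service C from the test script.
# The leading ! inside <> declares that C does not support SIGHUP,
# so C is stopped and started again instead of being sent SIGHUP.
service log:stdout notify:systemd <!service/B/ready> name:C serv -np -i C -N 0 -- C needs B(service), no SIGHUP
```

Without the `!`, as in the original test script, the service is treated as supporting SIGHUP.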

az143 commented 8 months ago

On Sun, 29 Oct 2023 10:15:09 -0700, Joachim Wiberg writes:

> Hmm, service/B/ready shouldn't be cleared on reload(flux). I'll have to get back to you on that one.

anything that i can do to help with this?

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ IBM - "Internally Blackened Machines" -- Bob Vaughan about PSU failures

troglobit commented 8 months ago

> anything that i can do to help with this?

I'm using a stripped down version of your test to reproduce. Pretty sure I've located the culprit, but will take a while to close it due to research into why we have this behavior and also $DAYJOB.

az143 commented 8 months ago

On Sun, 29 Oct 2023 17:15:14 -0700, Joachim Wiberg writes:

> Closed #382 as completed via 84adec4006d5a5888fabcd3a96a0160562919e54.

wow, that was super-quick - thank you!

regards, az

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ "Never underestimate the power of stupid people who share one opinion." -- Kurt Tucholsky