troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License

task [06] incorrectly runs on startup #352

Closed cornerfix closed 1 year ago

cornerfix commented 1 year ago


I added the following line to my sshd startup file, expecting killall to be run on reboot and halt:

task [06] killall -TERM sshd

When I omit 0 (leaving only [6]), the command executes on reboot as expected. However, with [06] the command correctly executes on reboot and halt, but it also executes during the boot process.

During boot, Finit prints the message "[ OK ] killall" on the boot screen.

sshd processes need to be killed on reboot and halt, otherwise connected clients freeze.
sysvinit, OpenRC, and others send the TERM signal to all sshd PIDs on reboot and shutdown.

troglobit commented 1 year ago

I'll have to get back to you on the issue of the task incorrectly running at startup. I'm more curious what your sshd service line looks like, because Finit also sends TERM to sshd on reboot and shutdown by default.

In https://github.com/troglobit/finit-skel/blob/main/skel/etc/finit.d/available/sshd.conf I have specified:

task [S] /usr/bin/ssh-genhostkeys --
service [2345789] <usr/ssh-hostkeys> env:-/etc/default/sshd /usr/sbin/sshd -D $SSHD_OPTS -- OpenSSH daemon

Here, ssh-genhostkeys is a small task that runs before sshd is allowed to start. I've reserved runlevel 1 for single-user mode (no networking), and runlevels 0 and 6 are reserved for poweroff and reboot, respectively. This means that just by moving to runlevel 1 we can verify that Finit actually stops the SSH service, which it does. So you should definitely not need your task (above).
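To make the runlevel semantics above concrete, here is a minimal C sketch. It assumes that a spec like [2345789] is stored as a bitmask with one bit per level 0-9 plus a separate bit for S (bootstrap). The names `parse_runlevels`, `RL_S`, and `should_run` are hypothetical and are not taken from Finit's source, whose real representation may differ.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical layout: bits 0-9 for runlevels 0-9, bit 10 for S (bootstrap). */
#define RL_S (1u << 10)

/* Parse a spec such as "[2345789]", "[06]" or "[S]" into a bitmask.
 * Returns 0 on malformed input (missing brackets, unknown characters). */
static unsigned parse_runlevels(const char *spec)
{
    unsigned mask = 0;

    if (!spec || *spec != '[')
        return 0;

    for (spec++; *spec && *spec != ']'; spec++) {
        if (isdigit((unsigned char)*spec))
            mask |= 1u << (*spec - '0');
        else if (*spec == 'S' || *spec == 's')
            mask |= RL_S;
        else
            return 0;
    }

    return *spec == ']' ? mask : 0;
}

/* level: 0-9 for a numbered runlevel, or -1 for bootstrap (S). */
static bool should_run(unsigned mask, int level)
{
    if (level < 0)
        return (mask & RL_S) != 0;
    return (mask & (1u << level)) != 0;
}
```

With this layout, a service declared [2345789] is stopped on a switch to runlevel 1 because `should_run(mask, 1)` is false. Because S has its own bit, a task declared [06] does not match bootstrap.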

What version of Finit are you using?

cornerfix commented 1 year ago

I am using the 4.3 tar.gz from the "Releases" section.

Stopping sshd is a little more complicated than that.

Finit correctly stops the main sshd process.

However, sshd spawns a new sshd process for every connected SSH session.

Spawned instances run independently of the main sshd process, so stopping the main sshd process does not stop them. This is intentional, e.g. so that you can change options in /etc/ssh/sshd_config and restart sshd without disconnecting the session used to edit the configuration file.

Other init systems do the following on reboot and halt: (1) stop the main sshd instance, then (2) stop all spawned sshd instances (which serve connected sessions).

If step (2) is skipped, connected SSH clients freeze (they will not receive a TCP packet indicating that the connection is closed).

You can see this by connecting with ssh to an sshd started by Finit. On poweroff, the ssh client freezes indefinitely. On reboot, the ssh client freezes temporarily and drops back to the shell after the machine reboots and starts sshd again.

With other init systems, the ssh client immediately drops back to the shell on both reboot and poweroff.

You can see how OpenRC does this in the stop() section of /etc/init.d/sshd, by executing 'kill -TERM $(pgrep sshd)'.
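The OpenRC approach of signalling every process with a given name can be sketched in C by scanning /proc. This is a Linux-specific sketch. It matches the exact name in /proc/<pid>/comm, which the kernel truncates to 15 characters. It is similar in spirit to `pgrep`, but uses exact matching instead of a regular expression. The helpers `find_pids_by_name` and `signal_by_name` are hypothetical and are not part of Finit or OpenRC.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Collect up to max PIDs whose /proc/<pid>/comm equals name exactly.
 * Returns the number of PIDs stored in out[]. Linux-specific. */
static size_t find_pids_by_name(const char *name, pid_t *out, size_t max)
{
    DIR *dir = opendir("/proc");
    struct dirent *de;
    size_t n = 0;

    if (!dir)
        return 0;

    while (n < max && (de = readdir(dir)) != NULL) {
        char path[300], comm[64];
        FILE *fp;

        /* Only the numeric entries in /proc are processes. */
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;

        snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;           /* process exited while we were scanning */

        if (fgets(comm, sizeof(comm), fp)) {
            comm[strcspn(comm, "\n")] = '\0';
            if (strcmp(comm, name) == 0)
                out[n++] = (pid_t)strtol(de->d_name, NULL, 10);
        }
        fclose(fp);
    }

    closedir(dir);
    return n;
}

/* Send sig to every process named name, except ourselves.
 * Returns how many processes were successfully signalled. */
static size_t signal_by_name(const char *name, int sig)
{
    pid_t pids[1024];
    size_t i, n, sent = 0;

    n = find_pids_by_name(name, pids, sizeof(pids) / sizeof(pids[0]));
    for (i = 0; i < n; i++) {
        if (pids[i] == getpid())
            continue;
        if (kill(pids[i], sig) == 0)
            sent++;
    }
    return sent;
}
```

A shutdown hook could then call `signal_by_name("sshd", SIGTERM)` after the main daemon has been stopped. This would reach the per-session children that a process-group signal might miss.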

I like finit very much and would like to use it on hundreds of servers.

However, we will need to take care of correctly stopping PostgreSQL and application servers. So I have a question: what is the best way to execute commands on reboot and halt? Is there a better way than 'task [06]'?

troglobit commented 1 year ago

I see, did not know this. Thank you for elaborating on it! I'm guessing sshd then not only forks these to the background but also changes their process group, because Finit sends the TERM/KILL signal to all processes in the same process group.
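The guess above can be illustrated with a small C sketch. It does not verify what OpenSSH actually does. It only assumes, hypothetically, that a per-connection child calls setsid(). It then shows that such a child ends up in its own process group, so a signal sent to the parent's group with `kill(-pgid, SIGTERM)` would not reach it. The helper `child_pgid_after_setsid` is hypothetical and is not taken from sshd or Finit.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical illustration, not sshd or Finit code.
 * Fork a child that calls setsid(), as a daemon's per-connection child
 * might, and return the process group the child ends up in (-1 on error). */
static pid_t child_pgid_after_setsid(void)
{
    int fds[2];
    pid_t pid, pgid = -1;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        pid_t mine;

        close(fds[0]);
        /* A new session also creates a new process group (pgid == own pid). */
        mine = setsid() == -1 ? -1 : getpgrp();
        if (write(fds[1], &mine, sizeof(mine)) != (ssize_t)sizeof(mine))
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    if (read(fds[0], &pgid, sizeof(pgid)) != (ssize_t)sizeof(pgid))
        pgid = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);   /* reap the child */

    return pgid;
}
```

If the returned group differs from the caller's `getpgrp()`, a group-wide TERM/KILL aimed at the caller's group leaves that child running. Catching such processes requires a broader sweep at shutdown.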

There are lots of ways to run stuff at reboot/halt, but for service specific things like this it's better to group all related commands in a dedicated .conf file using run/task statements. I'll poke around a bit and get back to you on the "runs at startup" issue.

troglobit commented 1 year ago

I've looked into this now; it seems to be a very old bug/design mistake. Runlevel S is translated into 0 because an array is used as the data structure for tracking runlevels. That's why your shutdown task also runs at bootstrap.
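The collision described above can be reproduced in isolation. The C sketch below is hypothetical and is not Finit's actual data structure or its fix. It tracks runlevels in a small array indexed by level. When `s_is_zero` is set, it folds S onto index 0, as the bug does, so a task declared for [06] also matches bootstrap. Giving S a dedicated slot removes the overlap.

```c
#include <stdbool.h>
#include <string.h>

#define NLEVELS      11   /* slots 0-9 plus one extra slot */
#define SLOT_S_FIXED 10   /* dedicated slot for S in the fixed variant */

/* Mark the runlevels named in spec (e.g. "06" or "S") in levels[].
 * If s_is_zero is true, S shares slot 0, reproducing the bug. */
static void mark_levels(bool levels[NLEVELS], const char *spec, bool s_is_zero)
{
    memset(levels, 0, NLEVELS * sizeof(bool));
    for (; *spec; spec++) {
        if (*spec >= '0' && *spec <= '9')
            levels[*spec - '0'] = true;
        else if (*spec == 'S')
            levels[s_is_zero ? 0 : SLOT_S_FIXED] = true;
    }
}

/* Does a task with this spec run during bootstrap (runlevel S)? */
static bool runs_at_bootstrap(const char *spec, bool s_is_zero)
{
    bool levels[NLEVELS];

    mark_levels(levels, spec, s_is_zero);
    return levels[s_is_zero ? 0 : SLOT_S_FIXED];
}
```

With the buggy mapping, `runs_at_bootstrap("06", true)` is true while `runs_at_bootstrap("6", true)` is false. This matches the report that only the [06] variant ran during boot.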

I need to look into this in more detail to see if I can redesign it in a way that's still backwards compatible. Can't give you any timeline though, sorry.

troglobit commented 1 year ago

Fixed the issue of S and 0 being the same runlevel yesterday. Then I spent a while testing reboot/poweroff, and all my SSH login sessions are properly closed, so I don't see the issue you mentioned. Finit has sig.c:do_shutdown(), which properly sends TERM to all remaining processes at reboot/poweroff:

https://github.com/troglobit/finit/blob/4d472273e1eb72d17547d38ee735a219140e9c89/src/sig.c#L289-L318

I'm a bit curious as to why that is not sufficient in your case. But then I don't know how you've set up Finit; maybe on top of an existing SysV init install using existing start/stop scripts?

cornerfix commented 1 year ago

Thanks for fixing the issue, that's great :)

I am afraid I found 2 more small bugs. Will report them soon.

No, my Finit installation is not on top of SysV init. It's on Alpine Linux and replaces OpenRC (via the symlink /sbin/init -> finit).

If you want, I can easily publish this VM on an external IP address and give you access so you can have a look.

This may also help with the other 2 bugs (they are reproducible on this same VM).


P.S. I just sent the VM login details to your @gmail address.

troglobit commented 1 year ago

OK, I usually don't do free consulting since expectations are high and my spare time very limited. But I can have a look later tonight.

Meanwhile, it'd be good if you could tell me a bit more about the installation. For instance, have you also replaced the BusyBox reboot/shutdown/halt programs with the Finit equivalents? What did your configure line look like, and which version (git hash) are you using? Is it still v4.3?

troglobit commented 1 year ago

OK I've logged in. Looks like Finit is not properly installed. Did you see this blog post?

There is also a mention of this in the README, linking to this HowTo in the source tree: