troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License

Bug, <pid/SERVICE> condition not triggered without second reload #196

Closed: hongkongkiwi closed this issue 2 years ago

hongkongkiwi commented 3 years ago

I'm running finit on my embedded device (cgroups v1 support only, kernel 4.9.1). I've made a small change so that finit no longer sets cpu.weight for the init/system/user groups, as I don't have that feature.

I'm having two issues:

Here's some debugging. In the initctl status screen, finit knows the apps are running:

PID  IDENT              STATUS   RUNLEVELS    DESCRIPTION
286  syslogd            running  [S12345-789] System log Daemon
291  earlyoom           running  [--2345-789] Early OOM Daemon
292  chronyd            running  [--2345-789] Chrony Time Daemon
307  mosquitto          running  [--2345-789] Local MQTT Broker Daemon
706  vnstatd            running  [--2345-789] VNStat Daemon
0    myapp              ready    [--2345-789] MyApp

ps shows the following:

  286 root     syslogd -F -P /var/run/syslogd.pid -4 -k -r 200k:5 -n -s -s
  291 root     earlyoom
  292 root     chronyd -n -4 -f /etc/chrony.conf
  307 mosquitt mosquitto -c /etc/mosquitto/mosquitto.conf
  706 nobody   vnstatd -n --sync --user nobody --group nobody --config /etc/vnstat.conf

initctl cond dump initially shows no pid conditions at all, even though initctl status lists the services as running in the screen above:

PID  IDENT              STATUS  CONDITION
........ snip .........
1    init               on      <net/lo/exist>
0    static             on      <usr/my_custom_cond>
1    static             on      <hook/svc/up>
........ snip .........

initctl cond shows that myapp is still waiting on <pid/mosquitto>, even though mosquitto is running as shown above:

PID  IDENT      STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
307  mosquitto  on      <+net/lo/exist>
0    myapp      off     <-pid/mosquitto>
........ snip .........

Now when I run initctl reload && initctl cond dump, the apps running as non-root show up (in this case vnstatd and mosquitto), but chronyd, earlyoom and syslogd do not:

PID   IDENT              STATUS  CONDITION
....... snip ........
1     init               on      <net/lo/exist>
0     static             on      <usr/net_wlan0_ap>
706   vnstatd            on      <pid/vnstatd>
307   mosquitto          on      <pid/mosquitto>
800   myapp              on      <+pid/mosquitto>
1     static             on      <hook/svc/up>
....... snip ........

finit.conf service definitions are like this:

service [2345789] <net/lo/exist> env:/etc/conf.d/mosquitto mosquitto $MOSQUITTO_OPTIONS -- Local MQTT Broker Daemon
service [2345789] pid:earlyoom.pid env:/etc/conf.d/earlyoom earlyoom $EARLYOOM_OPTIONS -- Early OOM Daemon
service [2345789] env:/etc/conf.d/chronyd chronyd -n $CHRONYD_OPTIONS -- Chrony Time Daemon
service [2345789] <net/wlan0/exist,net/wwan0/exist> env:/etc/conf.d/vnstatd vnstatd -n $VNSTATD_OPTIONS --config "$VNSTATD_CONFIG_FILE" -- VNStat Daemon
service [2345789] <pid/mosquitto> env:/etc/conf.d/myapp myapp -- MyApp

Any ideas what might be causing this?

troglobit commented 3 years ago

Finit relies on services creating, and touching, their PID files on startup and reload (SIGHUP). It monitors for PID files in /var/run, which on most systems today is a symlink to /run. Some services create sub-directories here, which is supported, but Finit only looks for *.pid and sub-dir/pid in these directories. If a service uses another pattern for its PID file, Finit does not catch it and the PID condition is never asserted.
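
To make the naming rule concrete, here is a minimal sketch that reads the two patterns named above literally. The helper name looks_like_pidfile is hypothetical and this is not Finit's actual matching code, which may well accept more than this; it only shows which paths under /run would fit "*.pid" or "sub-dir/pid".

```c
#include <fnmatch.h>

/*
 * Hypothetical helper, NOT taken from Finit: returns 1 if `path`
 * (given relative to /run) fits "*.pid" or "sub-dir/pid" when those
 * two patterns are read literally, and 0 otherwise.
 */
int looks_like_pidfile(const char *path)
{
	/* FNM_PATHNAME keeps '*' from matching across a '/' */
	if (fnmatch("*.pid", path, FNM_PATHNAME) == 0)
		return 1;	/* e.g. syslogd.pid */
	if (fnmatch("*/pid", path, FNM_PATHNAME) == 0)
		return 1;	/* e.g. mosquitto/pid */

	return 0;
}
```

Under this literal reading, a service that writes something like /run/mosquitto.lock would never produce a PID condition, which is the failure mode described above.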

I can't really see why it starts working for you after a reload, at least not without setting up a similar case to try and reproduce it.

I know it's no comfort to you, but we run all our applications as root and have none of these issues.

troglobit commented 2 years ago

Ah, just hit me. Finit relies heavily on inotify to detect changes in .conf, add/remove of .pid files, etc. Is that missing from your kernel perhaps?
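
One way to check whether inotify events for PID files reach userspace at all is a small standalone watcher. The sketch below is an assumption-laden debugging aid, not part of Finit: the function names mask_to_str and watch_dir are made up for illustration, and the event set is simply what creating or touching a file would normally generate, not necessarily the set Finit subscribes to.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Render the event bits we care about as text, e.g. "CREATE ATTRIB ". */
void mask_to_str(uint32_t mask, char *buf, size_t len)
{
	static const struct { uint32_t bit; const char *name; } names[] = {
		{ IN_CREATE, "CREATE " }, { IN_ATTRIB, "ATTRIB " },
		{ IN_CLOSE_WRITE, "CLOSE_WRITE " }, { IN_DELETE, "DELETE " },
		{ IN_MOVED_TO, "MOVED_TO " },
	};

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (mask & names[i].bit)
			strncat(buf, names[i].name, len - strlen(buf) - 1);
}

/*
 * Watch `dir` and print one line per event. Returns -1 if the watch
 * cannot be set up, 0 once read() stops delivering events.
 */
int watch_dir(const char *dir)
{
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0) {
		perror("inotify_init1");
		return -1;
	}
	if (inotify_add_watch(fd, dir, IN_CREATE | IN_ATTRIB | IN_CLOSE_WRITE |
			      IN_DELETE | IN_MOVED_TO) < 0) {
		perror("inotify_add_watch");
		close(fd);
		return -1;
	}

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));	/* blocks until events arrive */

		if (n <= 0)
			break;
		for (char *p = buf; p < buf + n; ) {
			const struct inotify_event *ev = (const struct inotify_event *)p;
			char what[64];

			mask_to_str(ev->mask, what, sizeof(what));
			printf("%s%s\n", what, ev->len ? ev->name : "");
			p += sizeof(*ev) + ev->len;
		}
	}
	close(fd);
	return 0;
}
```

Calling watch_dir("/run") from a small main() and then running touch /run/mosquitto.pid in another shell should print a line for that file if the kernel delivers the events. If nothing shows up, the problem is below Finit.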

hongkongkiwi commented 2 years ago

I do have INOTIFY_USER selected in my kernel, just checked, is there any other kernel setting I need to check? How would I debug this?

Just pulled the latest version (0222feeda19bb9576186423c03a05337ef390f88), seems I'm still having the same issue.

Here's a small sample file:

service [S12345] /usr/bin/earlyoom -- EarlyOOM daemon
service [S12345] pid:syslogd.pid /sbin/syslogd -D -S -n -f /etc/syslog.conf -O /tmp/messages -- System log daemon
service [S12345] <pid/syslogd> /sbin/klogd -n -c 1 -- Kernel log daemon

initctl status

PID   IDENT           STATUS   RUNLEVELS    DESCRIPTION
725   tty:S0          running  [-12345----] Getty on /dev/ttyS0
724   watchdog:finit  running  [-123456789] Finit watchdog daemon
721   dbus-daemon     running  [S12345-789] D-Bus message bus daemon
1150  earlyoom        running  [S12345----] EarlyOOM daemon
1151  syslogd         running  [S12345----] System log daemon
0     klogd           ready    [S12345----] Kernel log daemon

initctl status syslogd

     Status : running
   Identity : syslogd
Description : System log daemon
     Origin : /etc/finit.d/early-system.conf
Environment :
Condition(s):
    Command : /sbin/syslogd -D -S -n -f /etc/syslog.conf -O /tmp/messages
   PID file : /run/syslogd.pid
        PID : 1151
       User : root
      Group : root
     Uptime : 31 sec
   Restarts : 0 (0/10)
  Runlevels : [S12345----]

initctl cond

PID   IDENT           STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
0     klogd           off     <-pid/syslogd>

cat /run/syslogd.pid

1151

So the PID file is there, and it's even created by finit itself, but the <pid/syslogd> condition never seems to pick it up! :/

troglobit commented 2 years ago

Ah, this latest report is quite different from the original you posted, right? In the original it was the PID condition for mosquitto that wasn't asserted; here it's the Finit-generated PID condition for syslogd (BusyBox, I assume). That could be a problem in Finit, so I'll have to set up a test and verify it myself. I'll put that on the TODO for the next release.

Did you ever solve the original problem with mosquitto? Was it that it didn't touch its PID file on reload, perhaps?

hongkongkiwi commented 2 years ago

OK, I've just done more step-by-step testing and I've narrowed down why syslogd & mosquitto behave differently.

Let's take the simple config line below for mosquitto. This doesn't work on my system; it has the same issue as syslogd. I can see that a PID file is created at /run/mosquitto.pid, but finit never picks it up (no matter how many times I run initctl reload). I've also tried just touch /run/mosquitto.pid, but no luck.

# not showing up in initctl cond dump
service [S12345] mosquitto -c /etc/mosquitto.conf -- Mosquitto daemon

HOWEVER, if I add the precondition I was using before, <net/lo/exist>, things work just fine, but only after the second reload.

# showing up in initctl cond dump after second initctl reload
service [S12345] <net/lo/exist> mosquitto -c /etc/mosquitto.conf -- Mosquitto daemon

Here's a pastebin and you can see it in action: https://pastebin.com/tgZ8ckwh

To double check this was the issue, I could replicate the same behaviour with busybox syslogd (although in that case, I made a quick wrapper script to write out the pid just to make sure it wasn't something else).

I should note that in either case, the service runs fine, it's just that finit doesn't register the pid condition in the first case. Strangely though, finit knows it's running when using initctl status mosquitto.

Forget what I said before about root vs user, that doesn't seem to make any difference so I was mistaken there.

hongkongkiwi commented 2 years ago

Do you think this could be the issue?

#define IN_MASK_CREATE 0x10000000 /* since Linux 4.18 */

I am running kernel 4.9.0 so this IN_MASK_CREATE feature is not in my version.

Perhaps I can patch in this feature if it's the problem? :)
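
For anyone wanting to check their own kernel, the following is a minimal probe sketch and not the patch discussed in this thread. It relies on the documented behaviour of IN_MASK_CREATE: on Linux 4.18 and later, adding a watch for a path that is already watched fails with EEXIST. The assumption here, untested on 4.9, is that older kernels silently ignore the unknown bit and simply return the existing watch descriptor. The function name probe_mask_create is made up for illustration.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/inotify.h>

#ifndef IN_MASK_CREATE
#define IN_MASK_CREATE 0x10000000	/* since Linux 4.18 */
#endif

/*
 * Hypothetical probe: watch `path` twice with IN_MASK_CREATE set.
 * Returns  1 if the second add fails with EEXIST (flag is honored),
 *          0 if the second add succeeds (flag appears to be ignored),
 *         -1 on any other error, e.g. inotify unavailable or bad path.
 */
int probe_mask_create(const char *path)
{
	int fd, wd, rc;

	fd = inotify_init1(IN_CLOEXEC);
	if (fd < 0)
		return -1;

	/* First add: should succeed whether or not the flag is known */
	wd = inotify_add_watch(fd, path, IN_CREATE | IN_MASK_CREATE);
	if (wd < 0) {
		close(fd);
		return -1;
	}

	/* Second add on the same path is where the behaviour differs */
	wd = inotify_add_watch(fd, path, IN_CREATE | IN_MASK_CREATE);
	if (wd < 0 && errno == EEXIST)
		rc = 1;
	else if (wd >= 0)
		rc = 0;
	else
		rc = -1;

	close(fd);
	return rc;
}
```

A result of 0 from probe_mask_create("/run") would suggest that any code depending on the EEXIST error needs a fallback on that kernel.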

troglobit commented 2 years ago

Aha! Yeah very likely that's the issue! Modern Finit (3.0 and later iirc) is really made for "modern" kernels, we've run Linux 4.19 since what feels like forever, so this assumption is definitely related to that.

How do you propose to patch it, patch the kernel?

hongkongkiwi commented 2 years ago

Awesome! I've solved the issue with a patch. It turns out it was pretty simple, here's my pastebin for it. You could include it with a note for older kernels. I'm very excited! finit is exactly what I've been looking for, thank you for your hard work.

troglobit commented 2 years ago

@hongkongkiwi great work! I've added it to a contrib/patches subdirectory of the tree, referencing this issue for future devs, thanks!

Glad to hear you like it! It's been a bit of a labor of love for me, so it's amazing to hear other ppl enjoying it as well <3

I'll try to catch up with the rest of the issues and outstanding things during the weekend, the idea is to get another release out soonish.