troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License
622 stars 61 forks source link

<service/ready> condition not reasserted after crash. #343

Closed JackNewman12 closed 1 year ago

JackNewman12 commented 1 year ago

If a service crashes the ready condition does not get set. Causing other daemons that rely on it to be paused.

# Start Service 
<service/running> is set
<service/ready> is set

# Crash the service in the background using kill command or similar
<service/running> is cleared
<service/ready> is cleared 

service_retry() will eventually print "Successfully restarted crashing service"
svc_set_state() clears all the conditions but then only sets "running". 

Which results in:
<service/running> is set
<service/ready> is cleared

This also occurs for systemd ready flags (the use case that I care about):

<service/running> is set
<service/ready> is set

# Service crashes. 
<service/running> is cleared
<service/ready> is cleared 

# Daemon asserts READY=1
<service/ready> gets set

# Service restart timeout triggers 
svc_set_state() clears all the conditions and only reapplies <service/running>

The fix isn't obvious to me on this one. I feel like the service is going to need to keep track of its ready state somewhere, or perhaps if svc->notify then don't clear the ready on a svc_set_state(RUNNING).

Edit: Wild thought. You could add an additional SVC_STATE too. `Halted->Waiting->Preparing (only valid for notif services) -> Running"

JackNewman12 commented 1 year ago

I think there is also an issue with just a initctl reload. The <ready> conditions get set to flux but my daemon never received a sighup. If I manually send a HUP to my daemon or do a initctl reload mydaemon it will resend the READY=1 flag and the <ready> cond becomes set again.

troglobit commented 1 year ago

Interesting, I'll have to create a test to reproduce this. We already have three tests, but it would seem neither of them catch this.

Could you please add some information on how you set the service up, just to rule out user-error.

JackNewman12 commented 1 year ago

Messed around my MyLinux tonight to hopefully help reproduce this without having to deal with censoring logs. Running master finit.

Minimal code to reproduce:

# cat services.conf 
service notify:systemd /usr/bin/main
service <service/main/ready> /usr/bin/second

# cat main.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void notify(void) {
    const char *sock = getenv("NOTIFY_SOCKET");
    int do_notify;

    if (sock)
        do_notify = atoi(sock);

    if (do_notify)
        write(do_notify, "READY=1\n", 8);
}

void sig_handler(int signum) {
    notify();
}

void main() {
    signal(SIGHUP, sig_handler);
    notify();
    while (1) {
        sleep(10);
    }
}

# cat second.c 
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void main() {
    while (1) {
        sleep(10);
    }
}

Some logs that show off the behaviour.

# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
85   main     running  [--234-----] main
96   second   running  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
96   second   on      <+service/main/ready>
# 
# 
# initctl reload
# 
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
85   main     running  [--234-----] main
96   second   paused   [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
96   second   flux    <~service/main/ready>
# 
# kill -HUP 85
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
85   main     running  [--234-----] main
96   second   running  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
96   second   on      <+service/main/ready>

Now this is a little tricker to show over text. But if you spam initctl cond you can see ready gets set, then clears after the 2 or 5 second timeout.

# killall main
# killall main
# killall main

#### Condition goes down correctly.
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
0    main     restart  [--234-----] main
0    second   waiting  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
0    second   off     <-service/main/ready>

##### Condition gets set correctly
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
132  main     running  [--234-----] main
133  second   running  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
133  second   on      <+service/main/ready>

### After the X000ms timeout, the condition gets cleared
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
132  main     running  [--234-----] main
0    second   waiting  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
0    second   off     <-service/main/ready>

### If the ready notification gets retriggered it continues to work as normal
# kill -HUP 132
# initctl ; initctl cond
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION                                                                                         
132  main     running  [--234-----] main
138  second   running  [--234-----] second
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)                                                                                   
138  second   on      <+service/main/ready>
troglobit commented 1 year ago

Amazing, thank you so much! <3

troglobit commented 1 year ago

I have an initial patch, but it breaks four other test cases, so I'll have to spend a bit more time on it. Will probably take a few days to finalize, just fyi.

troglobit commented 1 year ago

I've pushed the revised patch to a separate branch, and will test it manually in my own setups before merging it to master. If you like, you can try it out as well. The tests now pass, including the updated notify.sh, so I'm fairly confident it'll work in a real-world setup as well.

JackNewman12 commented 1 year ago

Some testing on my machine. Almost perfect!

# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
59   main    running  [--234-----] main
60   second  running  [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
60   second  on      <+service/main/ready>

Working: Restarting a service

# initctl restart main
Jan 13 08:14:59 finit[1]: Stopping main[59], sending SIGTERM ...
Jan 13 08:14:59 finit[1]: Stopping second[60], sending SIGTERM ...
# Jan 13 08:14:59 finit[1]: Starting main[67]
Jan 13 08:14:59 finit[1]: Starting second[68]
# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
67   main    running  [--234-----] main
68   second  running  [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
68   second  on      <+service/main/ready>

Working: reloading a single service

# initctl reload main
Jan 13 08:15:18 finit[1]: Restarting main[67], sending SIGHUP ...
# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
67   main    running  [--234-----] main
68   second  running  [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
68   second  on      <+service/main/ready>

Not Working: reloading finit I think the error here is clearing the ready condition since I don't think HUP'ing finit should HUP all services.

# initctl reload
# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
67   main    running  [--234-----] main
68   second  paused   [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
68   second  flux    <~service/main/ready>
# kill -HUP 67
# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
67   main    running  [--234-----] main
68   second  running  [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
68   second  on      <+service/main/ready>

Working: Crashing the application

# killall main
Jan 13 08:15:56 finit[1]: Stopping second[68], sending SIGTERM ...
# Jan 13 08:15:56 finit[1]: Service main[67] died, restarting in 2000 msec (1/10)
Jan 13 08:15:56 finit[1]: Starting main[80]
Jan 13 08:15:56 finit[1]: Starting second[81]
# killall main
Jan 13 08:15:57 finit[1]: Stopping second[81], sending SIGTERM ...
Jan 13 08:15:59 finit[1]: Service main[80] died, restarting in 2000 msec (2/10)
Jan 13 08:15:59 finit[1]: Starting main[86]
Jan 13 08:15:59 finit[1]: Starting second[87]
Jan 13 08:16:01 finit[1]: Successfully restarted crashing service main.
# killall main
Jan 13 08:16:01 finit[1]: Stopping second[87], sending SIGTERM ...
# Jan 13 08:16:01 finit[1]: Service main[86] died, restarting in 2000 msec (3/10)
Jan 13 08:16:01 finit[1]: Starting main[89]
Jan 13 08:16:01 finit[1]: Starting second[90]
Jan 13 08:16:03 finit[1]: Successfully restarted crashing service main.
# initctl ; initctl cond;
PID  IDENT   STATUS   RUNLEVELS    DESCRIPTION
89   main    running  [--234-----] main
90   second  running  [--234-----] second
43   tty:S0  running  [-12345----] Getty on /dev/ttyS0
PID  IDENT   STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
90   second  on      <+service/main/ready>
troglobit commented 1 year ago

OK, thanks! I'll have a look at it first thing tomorrow.

Just a few things that may help explain a bit:

Now, readiness notification is new in this release, and from a quick glance it seems I've missed handling the reload case entirely. Because on reload the configuration "generation" is stepped, and conditions in the old generation will be in "flux". After reload all unmodified services should simply have their ready condition stepped automatically.

troglobit commented 1 year ago

Have reproduced this now. It is caused by 1) the service not creating a pid file, which my test services do, causing it to never be marked as started even, and 2) ready not being reasserted at all for systemd style services.

troglobit commented 1 year ago

Took a while to get this one right, but I believe Finit now covers all corner cases properly.

Couldn't reproduce with my test cases at first, had to do it manually. The root cause of that was initctl cond get service/main/ready returning on instead of flux, tricking my test. So more than one problem fixed thanks to this issue.

JackNewman12 commented 1 year ago

Looks good to me! I can't seem to break it 😁 I'll let you know if anything misbehaves next week. Thanks!

troglobit commented 1 year ago

Awesome! :sunglasses:

Thank you for taking the time to both report and follow-up on this, it is very appreciated!

JackNewman12 commented 1 year ago

Ahh I think I might have found another edgecase.

  1. stop/start daemon or crash it -> causes ready to be cleared (or it was never posted in the first place)
  2. daemon has NOT posted READY=1. In this case it is waiting on hardware.
  3. initctl reload -> daemon/ready gets asserted.

Replacing the initctl reload with initctl reload daemon does not have the same issue, and also seems to stop it happening in the future.

  1. stop/start daemon or crash it -> causes daemon/ready to be cleared (or it was never posted in the first place)
  2. daemon has NOT posted READY=1. In this case it is waiting on hardware.
  3. initctl reload daemon -> daemon/ready is NOT asserted
  4. initctl reload -> daemon/ready is NOT asserted.

Note that is particular daemon creates a PID file as it starts up, but won't READY=1 until the hardware it talks to is actually ready. I'll try and get a minimal code example for this tonight,

troglobit commented 1 year ago

Hmm, OK not really following, but very interested to see.

I tried to reproduce with your previous example, but on initctl reload it receives a HUP and does send READY=1, so likely not your edge case.

For me, both of the sequences generate the same behavior in fact.

JackNewman12 commented 1 year ago

Off the top of my head if you disable the notify() function, you should be able to reproduce the issue. I'll have a go at reproducing in a few hours.

troglobit commented 1 year ago

Aha, yeah that would be a neat edge case! However, still cannot reproduce, sorry.

JackNewman12 commented 1 year ago

Yeah looks like it is due to also having a pid file present.

cat main.c

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void createPidFile() {
    // Create the pid file to stop other copies running and let the system know we are running, but we may not be 'ready' yet.
    FILE *pidFile = fopen("/var/run/main.pid", "w");
    fprintf( pidFile, "%ld", ( long ) getpid() );
    fclose(pidFile);
}

void notify(void) {
    return; // Disabled - i.e. waiting for hardware, could take X minutes before hardware is ready and child services are allow to connect in a start doing work.  
    const char *sock = getenv("NOTIFY_SOCKET");
    int do_notify;

    if (sock)
        do_notify = atoi(sock);

    if (do_notify)
        write(do_notify, "READY=1\n", 8);
}

void sig_handler(int signum) {
    notify();
}

void main() {
    createPidFile();
    signal(SIGHUP, sig_handler);
    notify();
    while (1) {
        sleep(10);
    }
}
# initctl debug
finit[1]: Debug mode enabled
Jan 16 10:08:35 noname daemon.notice finit[1]: Debug mode enabled
#
# initctl
finit[1]: api_cb():get runlevel
Jan 16 10:08:38 noname daemon.debug finit[1]: api_cb():get runlevel
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION
41   main     running  [--234-----] main
0    second   waiting  [--234-----] second
43   tty:1    running  [-12345----] Getty on tty1
44   tty:1    running  [-12345----] Getty on ttyS0
39   syslogd  running  [S123456789] System log daemon
0    klogd    waiting  [S123456789] Kernel log daemon
# initctl cond
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
0    second   off     <-service/main/ready>
0    klogd    off     <-pid/syslogd>
#
#
#
#
# initctl reload
finit[1]: api_cb():reload
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 1
finit[1]: sm_step():state: reload/change, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit[1]: conf_reload():Set time  daylight: 0  timezone: 0  tzname: UTC
Jan 16 10:08:47 noname daemon.debug finit[1]: api_cb():reload
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 1
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: reload/change, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: conf_reload():Set time  daylight: 0  timezone: 0  tzname: UTC
finit[1]: parse_conf():*** Parsing /etc/finit.conf
finit[1]: parse_conf():*** Parsing /etc/finit.d/main.conf
finit[1]: service_register():Found existing svc for /usr/bin/main name main id  type 1
finit[1]: service_register():Service main runlevel 0x1c
finit[1]: service_register():Found existing svc for /usr/bin/second name second id  type 1
finit[1]: service_register():Service second runlevel 0x1c
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/main.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /usr/bin/main name main id  type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service main runlevel 0x1c
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /usr/bin/second name second id  type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service second runlevel 0x1c
finit[1]: devmon_add_cond():no match
finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/getty.conf
finit[1]: tty_parse_args():Registering built-in getty on TTY @console at 0 baud with term N/A
Jan 16 10:08:47 noname daemon.debug finit[1]: devmon_add_cond():no match
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/getty.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: tty_parse_args():Registering built-in getty on TTY @console at 0 baud with term N/A
finit[1]: tty_atcon():consoles: tty1 ttyS0
Jan 16 10:08:47 noname daemon.debug finit[1]: tty_atcon():consoles: tty1 ttyS0
finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id  type 8
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id  type 8
finit[1]: service_register():Service tty:1 runlevel 0x3e
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service tty:1 runlevel 0x3e
finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id 1 type 8
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id 1 type 8
finit[1]: service_register():Service tty:1 runlevel 0x3e
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service tty:1 runlevel 0x3e
finit[1]: /libexec/finit/getty (tty:1) asserts the same condition as /libexec/finit/getty (tty:1) => pid/tty:1
Jan 16 10:08:47 noname daemon.warn finit[1]: /libexec/finit/getty (tty:1) asserts the same condition as /libexec/finit/getty (tty:1) => pid/tty:1
finit[1]: tty_parse_args():Incomplete or non-existing TTY device given, cannot register.
finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/syslogd.conf
finit[1]: service_register():Found existing svc for syslogd name syslogd id  type 1
finit[1]: service_register():Service syslogd runlevel 0x3ff
finit[1]: service_register():Found existing svc for klogd name klogd id  type 1
Jan 16 10:08:47 noname daemon.err finit[1]: tty_parse_args():Incomplete or non-existing TTY device given, cannot register.
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/syslogd.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for syslogd name syslogd id  type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service syslogd runlevel 0x3ff
finit[1]: service_register():Service klogd runlevel 0x3ff
finit[1]: devmon_add_cond():no match
finit[1]: sm_step():Stopping services not allowed after reconf ...
finit[1]: cond_reload():
finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for klogd name klogd id  type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service klogd runlevel 0x3ff
Jan 16 10:08:47 noname daemon.debug finit[1]: devmon_add_cond():no match
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Stopping services not allowed after reconf ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_reload():
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
finit[1]: sm_step():state: reload/wait, runlevel: 2, newlevel: -1, teardown: 1, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
finit[1]: sm_step():Starting services after reconf ...
finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: reload/wait, runlevel: 2, newlevel: -1, teardown: 1, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Starting services after reconf ...
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
finit[1]: sm_step():Calling reconf hooks ...
finit[1]: plugin_run_hook():Calling netlink hook n:o 8 (arg: 0x55aa58d56630) ...
finit[1]: cond_reassert():net/
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/running => /net/lo/running
finit[1]: cond_set():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Calling reconf hooks ...
finit[1]: cond_set_noupdate():/net/lo/running
finit[1]: cond_set_path():/run/finit/cond//net/lo/running <= 2
finit[1]: cond_update():/net/lo/running
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/up => /net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: plugin_run_hook():Calling netlink hook n:o 8 (arg: 0x55aa58d56630) ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_reassert():net/
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/running => /net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/running
finit[1]: cond_set():/net/lo/up
finit[1]: cond_set_noupdate():/net/lo/up
finit[1]: cond_set_path():/run/finit/cond//net/lo/up <= 2
finit[1]: cond_update():/net/lo/up
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/exist => /net/lo/exist
finit[1]: cond_set():/net/lo/exist
finit[1]: cond_set_noupdate():/net/lo/exist
finit[1]: cond_set_path():/run/finit/cond//net/lo/exist <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/running <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/up => /net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/up <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/up
finit[1]: cond_update():/net/lo/exist
finit[1]: plugin_run_hook():Calling pidfile hook n:o 8 (arg: 0x55aa58d56630) ...
finit[1]: cond_set_path():/run/finit/cond/pid/main <= 2
finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/exist => /net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/exist <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/exist
finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
finit[1]: cond_set_oneshot():nop
finit[1]: sm_step():Update configuration generation of unmodified non-native services ...
finit[1]: cond_set():service/main/ready
finit[1]: cond_set_noupdate():service/main/ready
finit[1]: cond_set_path():/run/finit/cond/service/main/ready <= 2
finit[1]: cond_update():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: plugin_run_hook():Calling pidfile hook n:o 8 (arg: 0x55aa58d56630) ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/pid/main <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
finit[1]: cond_update():service/main/ready: match <service/main/ready> second(/usr/bin/second)
finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:on
finit[1]: cond_clear():service/second/
finit[1]: cond_clear_noupdate():service/second/
finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot():nop
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Update configuration generation of unmodified non-native services ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/main/ready <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/main/ready
finit[1]: service_step():              second(   0): -> starting
finit[1]: service_step():              second(   0): starting  enabled/clean   cond:on
finit[1]: Starting second[55]
finit[1]: cond_clear():service/second/
finit[1]: cond_clear_noupdate():service/second/
finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/main/ready: match <service/main/ready> second(/usr/bin/second)
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0):  waiting  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear_noupdate():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0): -> starting
finit[1]: cond_set_oneshot():service/second/running
finit[1]: cond_set_oneshot_noupdate():service/second/running => /run/finit/cond/service/second/running
finit[1]: cond_update():service/second/running
finit[1]: service_step():              second(  55): ->  running
finit[1]: service_step():              second(  55):  running  enabled/clean   cond:on
finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(   0): starting  enabled/clean   cond:on
Jan 16 10:08:47 noname <117> finit[1]: Starting second[55]
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear_noupdate():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot():service/second/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot_noupdate():service/second/running => /run/finit/cond/service/second/running
finit[1]: service_step():              second(  55):  running  enabled/clean   cond:on
finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
finit[1]: sm_step():Reconfiguration done
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/second/running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(  55): ->  running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(  55):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():                main(  41):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():              second(  55):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  43):  running  enabled/clean   cond:on
# Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               tty:1(  44):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():             syslogd(  39):  running  enabled/clean   cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step():               klogd(   0):  waiting  enabled/clean   cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Reconfiguration done
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0

#
# initctl
finit[1]: api_cb():get runlevel
Jan 16 10:08:50 noname daemon.debug finit[1]: api_cb():get runlevel
PID  IDENT    STATUS   RUNLEVELS    DESCRIPTION
41   main     running  [--234-----] main
55   second   running  [--234-----] second
43   tty:1    running  [-12345----] Getty on tty1
44   tty:1    running  [-12345----] Getty on ttyS0
39   syslogd  running  [S123456789] System log daemon
0    klogd    waiting  [S123456789] Kernel log daemon
# initctl cond
PID  IDENT    STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
55   second   on      <+service/main/ready>
0    klogd    off     <-pid/syslogd>
#
troglobit commented 1 year ago

Aaaargh, of course! I'll have a look at it later tonight ... awesome debugging + deduction there, thanks!

troglobit commented 1 year ago

Reproduced.

troglobit commented 1 year ago

There, fixed .. and finally I hope! Thank you so much for your patience and great help :heart: