naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0
153 stars 63 forks source link

signal handling changes for 1.0.4 - naemon exits on USR1 to the master process #138

Closed stromsoe closed 8 years ago

stromsoe commented 8 years ago

I updated to naemon 1.0.4 on an up to date Debian jessie using packages from http://labs.consol.de/repo/stable/debian. I'm seeing naemon die every morning after log rotation completes. I have verified that if I send SIGUSR1 to the master process, it goes through the shutdown process and exits. I tried to attach to the main process with strace, but that also caused it to go through the shutdown process.

The last entries in naemon.log after it exits are:

[1465237272] Auto-save of retention data completed successfully. [1465237280] EXTERNAL COMMAND: SCHEDULE_HOST_CHECK;guesthouse.sw51f3;1465237281 [1465237379] Error: Polling for input on 0x12cd150 failed: Interrupted system call [1465237381] Event broker module 'NERD' deinitialized successfully. [1465237381] livestatus: deinitializing [1465237381] Event broker module '/usr/lib/naemon/naemon-livestatus/livestatus.so' deinitialized successfully.

stromsoe commented 8 years ago

In event_poll_full():

    inputs = iobroker_poll(iobs, time_diff);
    if (inputs < 0) {
            if (errno == EAGAIN) {
                    /*
                    * errno is EINTR, which means it isn't a timed event, thus don't
                    * continue below
                    */
                    return 0;
            } else {
                    nm_log(NSLOG_RUNTIME_ERROR, "Error: Polling for input on %p failed: %s", iobs, iobroker_strerror(inputs));
                    return -1;
            }
    }

The comment references EINTR, which matches what errno is being set to in this particular case, but the test is against EAGAIN. Is the test wrong?

catharsis commented 8 years ago

@stromsoe Yes, definitely. It's a stupid error on my part, introduced in 21c9601390a2a029b7bc4c2e48416c7eaf1960ff. Good find, and thanks for reporting this.

catharsis commented 8 years ago

Fixed by 6616e39.

vdrandom commented 8 years ago

Any plans to release the fix? I know it's possible to work around with HUP, but USR1 is sent by default in the logrotate config supplied by the official debian package.

catharsis commented 8 years ago

@vdrandom Yes, I really think this motivates a new release, I talked to @sni last week about making that happen on friday. Other stuff got in the way, however, so it's been delayed. Hopefully we'll be able to do a release either this week or the next.