Closed sni closed 8 years ago
That is peculiar... How is the command submitted? Via the queryhandler or the command pipe? If the former, do you get any failure results back? If the latter, could you try with the former and report back what results you get?
For what it's worth, I'm not able to reproduce this on Sles 11SP4 (I know, not the same), running in a docker:
sh-3.2# tail -f /var/log/naemon/naemon.log &
[1] 11385
sh-3.2# echo "[0] RESTART_PROGRAM" > /var/lib/naemon/naemon.cmd
sh-3.2# [1464064529] EXTERNAL COMMAND: RESTART_PROGRAM;
[1464064529] Event broker module 'NERD' deinitialized successfully.
[1464064529] Warning: enable_environment_macros is deprecated and will be removed.
[1464064529] Naemon 1.0.4-source starting... (PID=11331)
[1464064529] Local time is Tue May 24 04:35:29 UTC 2016
[1464064529] LOG VERSION: 2.0
[1464064529] qh: Socket '/var/lib/naemon/naemon.qh' successfully initialized
[1464064529] nerd: Channel hostchecks registered successfully
[1464064529] nerd: Channel servicechecks registered successfully
[1464064529] nerd: Fully initialized and ready to rock!
[1464064529] wproc: Successfully registered manager as @wproc with query handler
[1464064529] wproc: Registry request: name=Core Worker 11387;pid=11387
[1464064529] wproc: Registry request: name=Core Worker 11388;pid=11388
[1464064529] wproc: Registry request: name=Core Worker 11390;pid=11390
[1464064529] wproc: Registry request: name=Core Worker 11389;pid=11389
[1464064529] wproc: Registry request: name=Core Worker 11391;pid=11391
[1464064529] wproc: Registry request: name=Core Worker 11392;pid=11392
The command is submitted by Thruk via Livestatus which ends in the queryhandler. I also tried the command file or the query handler directly. All with the same result.
qh> #command run [0] RESTART_PROCESS
200: OK
qh>
@sni Just to clarify, neither of them causes Naemon to actually restart?
exactly
I wish I was able to reproduce this. I'll have a look at the code to see if I can get any hints, but I really doubt that will get me anywhere... Maybe I'll find some log level that could be enabled that might provide a clue.
Have you tried any of the previous builds? It would be very useful to get at least a hint of when the problem was introduced... If we could pinpoint an exact commit by doing a git bisect
or something, that would be even better ;)
edit: Oh, also - log_external_commands
is on right? But you're not seeing a [1464090888] EXTERNAL COMMAND: RESTART_PROGRAM;
in your naemon.log?
for the record...
log_external_commands is enabled and i can see the [1464095320] EXTERNAL COMMAND: RESTART_PROGRAM;
in the logs
This also affects the SHUTDOWN_PROGRAM
command, which makes me believe that this (somewhat surprisingly) has something to do with the event scheduler. I'll keep looking.
This is fixed by the inclusion of f10ee7b653d7d81cac1231856de3643effaeda05 and 21c9601390a2a029b7bc4c2e48416c7eaf1960ff.
On some machines the RESTART_PROCESS external command just does nothing.
The debug log does not contain any useful hints:
It basically looks the same on a machine where it is working. The error is reproducable on these machines, altought they do not seem to have any relation. The failed test vms are Fedora 23 (32bit) Sles 11SP2 (64bit) and Ubuntu16.04 (32bit). All other test VMs are ok.