Open mppace opened 8 years ago
I have this exact same issue and came here to open a ticket. CentOS release 6.8 (Final) 2.6.32-642.1.1.el6.x86_64
Had the same issue on Debian GNU/Linux 8.5 (jessie).
It may have something to do with upgrading the config files from Nagios. I have been playing around with the config for a while and got the segfault to go away, but it is still not 100% working yet.
Try commenting out parts of your configuration to find which part causes the segfault; it happened to me too. I removed configuration piece by piece until I found the offending parts and fixed them.
Verify the notification commands configured for your contacts. More info: https://jira.op5.com/browse/MON-8843
TL;DR: use tail -f /var/lib/naemon.debug to find the offending parts.
Continuing my previous message above, also naemon/naemon#65
I upgraded from an old Nagios server and altered the config files, and they now start up with no errors or warnings. Naemon runs for some time (5-10 minutes) and seems to be working fine, and then just dies out of nowhere with Command file worker: Naemon main process is dead (No such process).
My guess is that something in the config files is causing a segfault, even though they load with no errors or warnings. I had previously received init warnings and segfaults caused by the config files, but those were resolved by changing the config files to match the newer Naemon style. It does not seem to be a memory issue, as htop shows plenty of available memory.
I eventually found the problem by using tail -f /var/lib/naemon.debug (it may be in a different location on other distros; I am on Debian 8). My normal logs would die with Command file worker: Naemon main process is dead (No such process), and my debug log showed that it had started **** BEGIN MACRO PROCESSING ***********, got part way and never finished.
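For anyone who would rather scan an existing debug log than watch it with tail -f, here is a minimal Python sketch of the same idea: find a BEGIN MACRO PROCESSING marker that is never followed by a matching end. The END marker text and the function names are my assumptions (only the BEGIN line appears in the log above), and the sketch assumes macro blocks are not nested, so treat it as a starting point rather than a finished tool.

```python
BEGIN = "BEGIN MACRO PROCESSING"
END = "END MACRO PROCESSING"  # assumption: counterpart of the BEGIN marker


def unfinished_macro_blocks(lines):
    """Return (start_line_number, lines_seen_after_it) for every BEGIN
    marker that is not closed by an END marker."""
    open_at = None   # line number of the currently open BEGIN, if any
    context = []     # debug lines seen since that BEGIN
    unfinished = []
    for number, line in enumerate(lines, start=1):
        if BEGIN in line:
            if open_at is not None:
                # A new BEGIN arrived before the previous block ended.
                unfinished.append((open_at, context))
            open_at, context = number, []
        elif END in line:
            open_at, context = None, []
        elif open_at is not None:
            context.append(line.rstrip("\n"))
    if open_at is not None:
        unfinished.append((open_at, context))
    return unfinished


def scan_file(path):
    """Print every unfinished block in the given debug log."""
    with open(path, errors="replace") as handle:
        for start, context in unfinished_macro_blocks(handle):
            print(f"macro processing started at line {start} and never finished")
            for line in context[-5:]:
                print("    " + line)
```

Calling scan_file("/var/lib/naemon.debug") (adjust the path for your distro) prints the last few debug lines of each unfinished block, which should show the command that was being expanded when the process died.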
The issue it seems was in my notification commands (see below if curious). Something triggered that command, which worked fine on the old Nagios3 server, it got part way through sending an email and then segfaulted (wtf???). So I just made a much simpler version of the command and it works fine now.
define command{
    command_name notify-by-email
    command_line /usr/bin/printf "%b" "Service $NOTIFICATIONTYPE$: $SERVICEDESC$ $SERVICESTATE$ on $HOSTNAME$ for $SERVICEDURATION$\n\nNotification Time: $LONGDATETIME$\nNotification Number: $NOTIFICATIONNUMBER$\n\nHostname: $HOSTNAME$\nAlias: $HOSTALIAS$\nIP Address: $HOSTADDRESS$\nService: $SERVICEDESC$\n\nService State: $SERVICESTATE$ (for $SERVICEDURATION$)\nHost State: $HOSTSTATE$ (for $HOSTDURATION$)\nService Check Output: $SERVICEOUTPUT$\nHost Check Output: $HOSTOUTPUT$\nServices In Warning State On This Host: $TOTALHOSTSERVICESWARNING$ of $TOTALHOSTSERVICES$\nServices In Critical State On This Host: $TOTALHOSTSERVICESCRITICAL$ of $TOTALHOSTSERVICES$\nServices In Unknown State On This Host: $TOTALHOSTSERVICESUNKNOWN$ of $TOTALHOSTSERVICES$\n\nHost Group States (0=Up, 1=Down, 2=Unreachable)\nCritical Server Host States: $HOSTSTATEID:Critical_Servers:$\n\nService Group States (0=OK, 1=Warning, 2=Critical, 3=Unknown)\nDNS Service States: $SERVICESTATEID:DNS:$\nLDAP Service States: $SERVICESTATEID:LDAP_Service:$\nMySQL Service States: $SERVICESTATEID:MYSQL:$\nMail Service States: $SERVICESTATEID:Mail:$\nNFS Service States: $SERVICESTATEID:NFS:$\nWeb Server Service States: $SERVICESTATEID:Web_Servers:$\n\nTotal Host Problems: $TOTALHOSTPROBLEMS$\nTotal Service Problems: $TOTALSERVICEPROBLEMS$\n-- Nagios Installation\nhttps://nagios.et.byu.edu" | /usr/bin/mail -s "Service $NOTIFICATIONTYPE$: $SERVICEDESC$ $SERVICESTATE$ on $HOSTNAME$ for $SERVICEDURATION$" $CONTACTEMAIL$
}
define command{
    command_name notify-by-email-short
    command_line /usr/bin/printf "%b" "Host: $HOSTNAME$ ($SERVICEDESC$)\n$SERVICEOUTPUT$\nStatus: $SERVICESTATE$\n$DATE$/$TIME$" | sed -e 's/br \//\n/g' | /usr/bin/mail $CONTACTEMAIL$
}
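If you want to narrow down which macro in a long command_line is the culprit, it may help to list the macros it uses before cutting it down. Below is a rough Python sketch that separates plain macros from the colon forms such as $SERVICESTATEID:DNS:$ (on-demand macros in Nagios terminology). The regular expression and the function name are mine and only approximate the macro syntax. I have not confirmed which macro actually triggered the crash, so this only helps with bisecting.

```python
import re

# Approximation of the macro syntax: $NAME$ or $NAME:argument...$
MACRO = re.compile(r"\$([A-Z][A-Z0-9_]*)(:[^$\s]*)?\$")


def list_macros(command_line):
    """Return (standard, on_demand) lists of the distinct macros used,
    in order of first appearance."""
    standard, on_demand = [], []
    for match in MACRO.finditer(command_line):
        # A macro carrying a ':argument' part goes into the on-demand list.
        bucket = on_demand if match.group(2) else standard
        if match.group(0) not in bucket:
            bucket.append(match.group(0))
    return standard, on_demand


if __name__ == "__main__":
    example = (
        '/usr/bin/printf "%b" "Host: $HOSTNAME$\\n'
        'DNS Service States: $SERVICESTATEID:DNS:$" | /usr/bin/mail $CONTACTEMAIL$'
    )
    standard, on_demand = list_macros(example)
    print("standard:", standard)
    print("on-demand:", on_demand)
```

One way to use it: remove the on-demand macros first, retest, then work through the standard ones in halves. The short command above uses none of the colon forms, which is at least consistent with starting there.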
OK, that looks like a pointer in the right direction. I'll take a look at my config and update further.
Is this still an issue? If so, could you provide the command which caused the segfault?
I'm using Naemon 1.0.5 on CentOS 7, and when I check the config I get a segfault. It looks like it has something to do with adding commands to a contact in the configuration, but the backtrace doesn't really help me pinpoint where the problem lies.
This is the backtrace output; any help would be greatly appreciated.
gdb
wproc: Registry request: name=Core Worker 4338;pid=4338
wproc: Registry request: name=Core Worker 4339;pid=4339
wproc: Registry request: name=Core Worker 4340;pid=4340
wproc: Registry request: name=Core Worker 4341;pid=4341
wproc: Registry request: name=Core Worker 4342;pid=4342
wproc: Registry request: name=Core Worker 4343;pid=4343
wproc: Registry request: name=Core Worker 4344;pid=4344
wproc: Registry request: name=Core Worker 4345;pid=4345
wproc: Registry request: name=Core Worker 4347;pid=4347
wproc: Registry request: name=Core Worker 4346;pid=4346
wproc: Registry request: name=Core Worker 4348;pid=4348
wproc: Registry request: name=Core Worker 4349;pid=4349

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff79528c5 in add_host_notification_command_to_contact () from /usr/lib64/naemon/libnaemon.so.0
(gdb) bt
#0  0x00007ffff79528c5 in add_host_notification_command_to_contact () from /usr/lib64/naemon/libnaemon.so.0
#1  0x00007ffff796edf9 in ?? () from /usr/lib64/naemon/libnaemon.so.0
#2  0x00007ffff79617ad in ?? () from /usr/lib64/naemon/libnaemon.so.0
#3  0x00007ffff7655544 in g_tree_foreach () from /lib64/libglib-2.0.so.0
#4  0x00007ffff796f707 in ?? () from /usr/lib64/naemon/libnaemon.so.0
#5  0x00007ffff797624f in xodtemplate_read_config_data () from /usr/lib64/naemon/libnaemon.so.0
#6  0x000000000040350f in main ()
(gdb)
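Since the top frame is add_host_notification_command_to_contact, one thing that might be worth ruling out is a contact whose host_notification_commands or service_notification_commands names a command that is not defined anywhere. That this is what triggers the crash is only a guess on my part; the backtrace does not confirm it. The Python sketch below is a deliberately simple check: the parser and function names are mine, it ignores template inheritance ("use") and include directives, and it expects each directive on its own line.

```python
import re


def parse_objects(text):
    """Yield (object_type, {directive: value}) for each define block."""
    current_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if current_type is None:
            match = re.match(r"define\s+(\w+)\s*\{?$", line)
            if match:
                current_type, fields = match.group(1), {}
        elif line == "}":
            yield current_type, fields
            current_type = None
        else:
            parts = line.split(None, 1)
            fields[parts[0]] = parts[1] if len(parts) > 1 else ""


def undefined_notification_commands(text):
    """Return (contact, directive, command) for each notification command
    a contact references that no 'define command' block declares."""
    commands, references = set(), []
    keys = ("host_notification_commands", "service_notification_commands")
    for kind, fields in parse_objects(text):
        if kind == "command" and "command_name" in fields:
            commands.add(fields["command_name"].split(";", 1)[0].strip())
        elif kind == "contact":
            name = fields.get("contact_name") or fields.get("name", "?")
            for key in keys:
                value = fields.get(key, "").split(";", 1)[0]
                for command in value.split(","):
                    command = command.strip().split("!", 1)[0]
                    if command:
                        references.append((name, key, command))
    return [ref for ref in references if ref[2] not in commands]
```

To try it, concatenate the text of your object config files into one string and pass it to undefined_notification_commands; an empty list means every referenced command was found in the files you supplied.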