ofalk closed this issue 9 years ago
What kind of NEB modules have you loaded?
Hi! Nothing except gearman. It's an OMD installation.
@ofalk From your description, it sounds like it's reproducible enough. Think you could run it through gdb and get us a backtrace?
A coredump would, of course, be even better - but I guess that'd be hard to justify if you've got sensitive data in your system.
I assume you have livestatus loaded too, since you have OMD? Then it's probably a known issue with external commands when livestatus is combined with other NEB modules.
This should be fixed now. Livestatus now uses the Queryhandler to submit commands. Please verify whether it still fails.
It still dies with SIGSEGV. Just tried with omd-1.01-nc.20140129-rh61-32.i386.rpm
The last few lines (out of strace):
[pid 26402] time(NULL) = 1391673218
[pid 26402] gettimeofday({1391673218, 730286}, NULL) = 0
[pid 26402] gettimeofday({1391673218, 730507}, NULL) = 0
[pid 26402] gettimeofday({1391673218, 730702}, NULL) = 0
[pid 26402] gettimeofday({1391673218, 730878}, NULL) = 0
[pid 26402] time(NULL) = 1391673218
[pid 26402] time(NULL) = 1391673218
[pid 26402] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 26402] time(NULL) = 1391673218
[pid 26402] write(5, "[1391673218] Caught SIGSEGV, shutting down...\n", 46) = 46
[pid 26402] gettimeofday({1391673218, 732218}, NULL) = 0
[pid 26402] sigreturn() = ? (mask now [])
[pid 26402] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 26402] write(13, "=0.729ms;;;; \n\tHOSTCHECKCOMMAND::check-host-alive!(null)\tHOSTSTATE::0\tHOSTSTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::XXX\tSERVICEDESC::Updates\tSERVICEPERFDATA::total_updates=0;0;0 security_updates=0;0;0\n\tSERVICECHECKCOMMAND::check_yumupdates\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::YYY\tSERVICEDESC::VMFS\tSERVICEPERFDATA::DS_VMFS1=1007552.00MB;; DS_VMFS2=1051093.00MB;; DS_VMFS3=114190.00MB;; DS_VMFS4=1052382.00MB;; DS_VMFS_XXX=413753.00MB;; DS_VMFS_XXX2=189374.00MB;; DS_VMFS5=95775.00MB;; VMFS_backup1=148530.00MB;; DS_VMFS6=103920.00MB;; VMFS_local_esx06=831280.00MB;; DS_VMFS7=602473.00MB;;\n\tSERVICECHECKCOMMAND::check_esx!-D $HOSTADDRESS$ -l vmfs\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::XXX\tSERVICEDESC::DskUsg/boot\tSERVICEPERFDATA::usg=79.69;90;95;0; usgABS=118534;133867.8;141304.9;0;\n\tSERVICECHECKCOMMAND::check_snmp_dskusg!/boot!90!95\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::ZZZ\tSERVICEDESC::MySQL-tmp-disk-tables\tSERVICEPERFDATA::pct_tmp_table_on_disk=99.82%;25;50 pct_tmp_table_on_disk_now=100.00%\n\tSERVICECHECKCOMMAND::check_mysql_health!--mode tmp-disk-tables\tSERVICESTATE::2\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::AAA\tSERVICEDESC::ISL-0\tSERVICEPERFDATA::stat_wtx=4195436;0;0;0;0 stat_wrx=7482972;0;0;0;0 stat_ftx=246781;0;0;0;0 stat_frx=493438;0;0;0;0 er_enc_in=0;0;0;0;0 er_crc=0;0;0;0;0 er_trunc=0;0;0;0;0 er_toolong=0;0;0;0;0 er_bad_eof=0;0;0;0;0 er_enc_out=0;0;0;0;0 er_c3_timeout=0;0;0;0;0\tSERVICECHECKCOMMAND::check_snmp_brocade_fcport!3\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::YYY\tSERVICEDESC::Runtime Listhost\tSERVICEPERFDATA::hostcount=3units;;\n\tSERVICECHECKCOMMAND::check_esx!-D $HOSTADDRESS$ -l runtime -s listhost\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::ZZZ\tSERVICEDESC::traps\tSERVICEPERFDATA::rta=6.633ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=19.808ms;;;; rtmin=1.501ms;;;; \n\tSERVICECHECKCOMMAND::check-host-alive\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::XXX\tSERVICEDESC::IOS Config\tSERVICEPERFDATA::size=1078B\n\tSERVICECHECKCOMMAND::check_cisco_config\tSERVICESTATE::0\tSERVICESTATETYPE::1\nDATATYPE::SERVICEPERFDATA\tTIMET::1391673218\tHOSTNAME::BBB\tSERVICEDESC::TCP stats\tSERVICEPERFDATA::'TCP stats'=61637c TCP-MIB::tcpPassiveOpens.0=693716c TCP-MIB::tcpInSegs.0=18214605c TCP-MIB::tcpOutSegs.0=20288240c TCP-MIB::tcpRetransSegs.0=138869c \n\tSERVICECHECKCOMMAND::tcp_stats\tSERVICESTATE::0\tSERVICESTATETYPE::1\n", 2737) = 2737
(partially obfuscated).
Backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0091b503 in strchr () from /lib/libc.so.6
(gdb) bt
perf_data=0x9eda27c, escape_newlines_please=1, newlines_are_escaped=0) at checks.c:3000
at checks.c:427
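The backtrace above places the crash inside strchr(), reached from a call in checks.c that takes a perf_data argument. The surviving frames do not show which pointer was invalid, and perf_data is non-NULL here, so a dangling or unterminated buffer is as plausible as a NULL one. The following is only a generic sketch of defensive strchr() use in C; safe_strchr and count_newlines are hypothetical names for illustration, not Naemon or mod_gearman functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Naemon): strchr() that tolerates NULL.
 * Passing NULL to strchr() is undefined behavior and typically segfaults. */
static const char *safe_strchr(const char *s, int c)
{
    if (s == NULL)
        return NULL;
    return strchr(s, c);
}

/* Hypothetical helper: count newline characters in a perfdata string
 * without assuming the pointer is non-NULL. */
static size_t count_newlines(const char *perf_data)
{
    size_t n = 0;
    const char *p = safe_strchr(perf_data, '\n');

    while (p != NULL) {
        n++;
        /* p points at a '\n' inside a valid string, so p + 1 is at worst
         * the terminating NUL and is safe to pass to strchr(). */
        p = strchr(p + 1, '\n');
    }
    return n;
}
```

A guard like this only catches NULL. It cannot detect a pointer to freed memory or a buffer that lacks a terminating NUL, which is why a full backtrace or a core dump is still needed to find the actual cause.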
Same issue, easily reproducible on Debian Wheezy using labs.consol.de repository.
Versions are:
ii gearman-job-server 0.33-2 amd64 Job server for the Gearman distributed job queue
ii libgearman7 0.33-1 amd64 Library providing Gearman client and worker functions
ii mod-gearman-module 1.4.14 amd64 Event broker module to distribute service checks.
ii mod-gearman-tools 1.4.14 amd64 Tools for mod-gearman
ii naemon 0.8.1-20140425 amd64 A host/service/network monitoring and management system
ii naemon-core 0.8.1-20140425 amd64 contains the Naemon core
ii naemon-livestatus 0.8.1-20140425 amd64 contains the Naemon livestatus eventbroker module
ii naemon-thruk 0.8.1-20140425 amd64 This package contains the thruk gui for Naemon
ii naemon-thruk-libs 0.8.1-20140425 amd64 This package contains the thruk gui for Naemon
ii naemon-thruk-reporting 0.8.1-20140425 amd64 This package contains the reporting addon for naemons thruk gui useful for
ii naemon-tools 0.8.1-20140425 amd64 contains tools for the Naemon core
Is it supposed to be fixed in this version?
I've always interpreted this as a mod_gearman issue, and thus ignored it. But shouldn't we close it, and point to sni/mod_gearman instead?
Hi!
Today I tried the latest (testing) version of omd-nc (new cores), which includes naemon. Unfortunately, after a minute or so, naemon terminates with a SIGSEGV:
[1390996174] Caught SIGSEGV, shutting down...
I guess this will be quite hard to reproduce for you, but I'm totally willing to support you in any way that doesn't (really) compromise my security!
Best, Oliver