naemon / naemon-livestatus

Naemon - Livestatus Eventbroker Module
GNU General Public License v2.0

Naemon sigsegv when using thruk logcache #54

Closed snarf6 closed 5 years ago

snarf6 commented 5 years ago

Hello,

I am seeing quite strange behavior on one of our Naemon instances.

The naemon daemon segfaults whenever the Thruk logcache update runs.

Here is the gdb backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e48700 (LWP 13796)]
strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007ffff67b9e6c in LogEntry::serviceStateToInt(char*) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#2  0x00007ffff67ba33f in LogEntry::handleNotificationEntry() () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#3  0x00007ffff67ba549 in LogEntry::LogEntry(unsigned int, char*) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#4  0x00007ffff67ba8a4 in Logfile::processLogLine(unsigned int, unsigned int) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#5  0x00007ffff67ba9e5 in Logfile::loadRange(_IO_FILE*, unsigned int, TableLog*, long, long, unsigned int) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#6  0x00007ffff67baae0 in Logfile::load(TableLog*, long, long, unsigned int) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#7  0x00007ffff67bad17 in Logfile::answerQueryReverse(Query*, TableLog*, long, long, unsigned int) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#8  0x00007ffff67bf444 in TableLog::answerQuery(Query*) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#9  0x00007ffff678b6af in Store::answerGetRequest(InputBuffer*, OutputBuffer*, char const*) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#10 0x00007ffff678b9fb in Store::answerRequest(InputBuffer*, OutputBuffer*) () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#11 0x00007ffff678ab29 in store_answer_request () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#12 0x00007ffff67c4073 in client_thread () from /usr/lib/naemon/naemon-livestatus/livestatus.so
#13 0x00007ffff7bc7064 in start_thread (arg=0x7ffff7e48700) at pthread_create.c:309
#14 0x00007ffff6e3e62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The crash seems to happen here: https://github.com/naemon/naemon-livestatus/blob/8e2721abbbfe5234590770faee630c64becb4f21/src/LogEntry.cc#L276
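
A crash in strlen() called from LogEntry::serviceStateToInt(char*) is consistent with the state field pointer being NULL, for example when a notification line has fewer fields than expected. That cause is not confirmed in this thread. The following is only a minimal sketch of a defensive parse, not the livestatus source: the function names are hypothetical, and it assumes the usual "contact;host;service;STATE;command;output" layout of a SERVICE NOTIFICATION line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from livestatus): map a state name to its number.
// Returns -1 for a missing or unrecognized field instead of dereferencing NULL.
int service_state_to_int_safe(const char *s)
{
    if (s == nullptr || *s == '\0')
        return -1;
    if (std::strcmp(s, "OK") == 0)       return 0;
    if (std::strcmp(s, "WARNING") == 0)  return 1;
    if (std::strcmp(s, "CRITICAL") == 0) return 2;
    if (std::strcmp(s, "UNKNOWN") == 0)  return 3;
    return -1;
}

// Split the text after "SERVICE NOTIFICATION: " on ';'.
std::vector<std::string> split_fields(const std::string &text)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = text.find(';', start);
        if (pos == std::string::npos) {
            fields.push_back(text.substr(start));
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}

// Assumed field layout: contact;host;service;STATE;command;output.
// A truncated line yields -1 rather than a crash.
int notification_state(const std::string &options)
{
    const std::size_t kStateIndex = 3;
    std::vector<std::string> fields = split_fields(options);
    const char *state =
        kStateIndex < fields.size() ? fields[kStateIndex].c_str() : nullptr;
    return service_state_to_int_safe(state);
}
```

With this sketch, notification_state("admin;web01;HTTP;CRITICAL;notify-by-email;timeout") yields 2, while a truncated line such as "admin;web01" yields -1.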

I've tried various versions of Naemon from stable and testing, and I see the same behavior.

A second Naemon instance, with the same versions but not the same generated config, does not crash.

I suspect something strange between the generated config (by Nconf), Thruk, livestatus and Naemon, but I don't really know how to investigate further.

snarf6 commented 5 years ago

I may have found something interesting. This instance was created in a hurry from an existing Naemon instance (the one that works well) by cloning the VM.

Configs were purged and redeployed by Puppet (for the system and global config) and Nconf (for the object config).

Some old logs remained in the archive directory, and the process no longer segfaults after removing them.

I may have triggered some unforeseen case when logcache extracts logs from a server having logs belonging to another