Closed riton closed 1 year ago
If you can reproduce the crash, could you create a backtrace with gdb? That would help, since i still cannot reproduce the issue.
Here is what I can get using gdb
:
root@naemon-host ~ $ rpm -qa '*naemon*'
naemon-devel-1.0.8-0.x86_64
naemon-livestatus-1.0.8-0.x86_64
naemon-core-1.0.8-0.x86_64
naemon-thruk-1.0.8-0.noarch
naemon-1.0.8-0.noarch
libnaemon-1.0.8-0.x86_64
naemon-core-dbg-1.0.8-0.x86_64
naemon@naemon-host$ gdb /usr/bin/naemon-dbg
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
[...]
Reading symbols from /usr/bin/naemon-dbg...Reading symbols from /usr/bin/naemon-dbg...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Missing separate debuginfos, use: debuginfo-install naemon-core-dbg-1.0.8-0.x86_64
(gdb) run /etc/naemon/naemon.cfg
Starting program: /usr/bin/naemon-dbg /etc/naemon/naemon.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[...]
[New Thread 0x7ffff7f79700 (LWP 8311)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7f57700 (LWP 8227)]
0x00007ffff7092faf in __strlen_sse42 () from /lib64/libc.so.6
(gdb) backtrace
#0 0x00007ffff7092faf in __strlen_sse42 () from /lib64/libc.so.6
#1 0x00007ffff5ac8f59 in Query::outputString(char const*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#2 0x00007ffff5ac8c33 in Query::printRow(void*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#3 0x00007ffff5acc602 in Query::finish() () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#4 0x00007ffff5acf407 in Store::answerGetRequest(InputBuffer*, OutputBuffer*, char const*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#5 0x00007ffff5acf72b in Store::answerRequest(InputBuffer*, OutputBuffer*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#6 0x00007ffff5aceab9 in store_answer_request () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#7 0x00007ffff5b06e93 in client_thread () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#8 0x00007ffff6adfe25 in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff704e34d in clone () from /lib64/libc.so.6
Here is a trace I got when two simultaneous queries were made to Livestatus (the query is the one I detailed in my previous post).
Another one which is slightly different
[...]
[New Thread 0x7ffff7f57700 (LWP 13133)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7f68700 (LWP 13112)]
0x00007ffff5ac5858 in OffsetStringColumn::getValue(void*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
(gdb) backtrace
#0 0x00007ffff5ac5858 in OffsetStringColumn::getValue(void*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#1 0x00007ffff5ad0b03 in StringColumn::compare(void*, void*, Query*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#2 0x00007ffff5b0a172 in RowSortedSet::compare(void*, void*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#3 0x00007ffff5b0a273 in RowSortedSet::extract() () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#4 0x00007ffff5b0a3c7 in RowSortedSet::insert(void*, int) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#5 0x00007ffff5acc228 in Query::processDataset(void*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#6 0x00007ffff5afd745 in Logfile::answerQueryReverse(Query*, TableLog*, long, long, unsigned int) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#7 0x00007ffff5b02308 in TableLog::answerQuery(Query*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#8 0x00007ffff5acf3ff in Store::answerGetRequest(InputBuffer*, OutputBuffer*, char const*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#9 0x00007ffff5acf72b in Store::answerRequest(InputBuffer*, OutputBuffer*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#10 0x00007ffff5aceab9 in store_answer_request () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#11 0x00007ffff5b06e93 in client_thread () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#12 0x00007ffff6adfe25 in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff704e34d in clone () from /lib64/libc.so.6
The exact same requests were sent to Livestatus to trigger the SIGSEGV
.
[Thinking this is the same as my issue] I am also getting crashes when looking back over periods of time for events. Naemon: ERROR: failed to connect - failed to connect to /var/cache/naemon/live: Connection refused.. (/var/cache/naemon/live)
In reference to my above, can someone tell me where the debug commands are for Ubuntu 16.04, and what I need to gather a not collected trace/debugs out of Naemon before. This is a critical system for our company so keen to get it fixed.
Should be better soon, i removed the sort header from log livestatus queries in Thruk: https://github.com/sni/Thruk/commit/d9d98ca1919b3178d4d292b06e389134817708f6
i will keep this issue open, since the real cause has not been solved yet.
This #29 fixed the same problem I was having here #28 with event log.
As far as I remembered debugging this, the LogEntry array gets corrupted after this function is called a second time. I did not know it would disable cache, but it is a work around for the segfault that stops naemon.
i think this has been fixed with #103 please let me know if it still happens.
We're experiencing crash with the latest version of naemon and thruk when using the
/thruk/cgi-bin/showlog.cgi
CGI by clicking on theEvent Log
button in Thruk interface.Our configuration holds
After investigations, the crash is caused by this LiveStatus query:
with a log files that has the following number of lines
When we issue the Livestatus query above, Livestatus outputs messages like
and very often (every time ?) naemon dies.
After applying the patch specified in #29 , it seems the
g_tree_foreach: assertion 'tree != NULL' failed
still appears but this time, naemon seem to stay alive. We can see theg_tree_foreach: assertion 'tree != NULL' failed
messages in Thruk interface like shown in the following screenshot:This issue is related to #28 but the proposed patch only fixes the naemon crash problem (which is a good thing :+1: ) but the problem is only mitigated here since the
g_tree_foreach: assertion 'tree != NULL' failed
always shows up.Maybe that this issue is also related to #39 (cause the Query also contains a
Sort
filter), maybe it's not.