naemon / naemon-livestatus

Naemon - Livestatus Eventbroker Module
GNU General Public License v2.0

naemon-livestatus crashes on CentOS 8 if "Sort: name asc" is used #70

Closed pbiering closed 4 years ago

pbiering commented 4 years ago

On CentOS 8 with packages installed from Consol Stable Repo:

rpm -qa | egrep '(naemon|thruk)' | sort
libnaemon-1.2.0-0.x86_64
libthruk-2.34-0.x86_64
naemon-1.2.0-0.noarch
naemon-core-1.2.0-0.x86_64
naemon-livestatus-1.2.0-0.x86_64
naemon-thruk-1.2.0-0.noarch
thruk-2.34-2.x86_64
thruk-base-2.34-2.x86_64
thruk-plugin-reporting-2.34-2.x86_64

Thruk causes naemon to crash. I manually reduced the query; the trigger is "Sort: name asc":

echo -e "GET hosts\nColumns: address alias\nLimit: 100\nSort: name asc\nOutputFormat: wrapped_json\nResponseHeader: fixed16" | unixcat /var/cache/naemon/live

Running naemon in the foreground shows this as the last line before the crash:

/usr/include/c++/8/bits/stl_vector.h:932: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = void*; _Alloc = std::allocator<void*>; std::vector<_Tp, _Alloc>::reference = void*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__builtin_expect(__n < this->size(), true)' failed.
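For context: the message above comes from the checked std::vector::operator[] in libstdc++, which is only compiled in when the macro _GLIBCXX_ASSERTIONS is defined. As far as I know the default EL8 RPM build flags define it, which would fit the observation that other builds do not abort; treat that as an assumption, not something confirmed in this thread. The sketch below mirrors the failing condition `__n < this->size()` on a std::vector<void*>; the helper names index_in_range and element_or_null are made up for illustration and do not exist in livestatus.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the condition quoted in the assertion message:
//   __builtin_expect(__n < this->size(), true)
// Hypothetical helper, not part of livestatus or libstdc++.
bool index_in_range(const std::vector<void*> &v, std::size_t n) {
    return n < v.size();
}

// Bounds-safe read: returns nullptr for an out-of-range index instead of
// invoking undefined behavior (or aborting in an assertion-enabled build).
void *element_or_null(const std::vector<void*> &v, std::size_t n) {
    return index_in_range(v, n) ? v[n] : nullptr;
}
```

Without the macro, an out-of-range v[n] is undefined behavior and may simply read stale memory without any visible error.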

On CentOS 7 the same query works fine.

pbiering commented 4 years ago

Stack traces extracted: Process 3608278 (naemon) of user 977 dumped core.

Stack trace of thread 3611195:
#0  0x00007f34f7d737df raise (libc.so.6)
#1  0x00007f34f7d5dc15 abort (libc.so.6)
#2  0x00007f34f5ea0ae8 _ZSt20__replacement_assertPKciS0_S0_ (livestatus.so)
#3  0x00007f34f5edd6a1 _ZN12RowSortedSet7extractEv (livestatus.so)
#4  0x00007f34f5ea0885 _ZN5Query6finishEv (livestatus.so)
#5  0x00007f34f5ea2ee5 _ZN5Store16answerGetRequestEP11InputBufferP12OutputBufferPKc (livestatus.so)
#6  0x00007f34f5ea31e3 _ZN5Store13answerRequestEP11InputBufferP12OutputBuffer (livestatus.so)
#7  0x00007f34f5ea258d store_answer_request (livestatus.so)
#8  0x00007f34f5ed9267 client_thread (livestatus.so)
#9  0x00007f34f74f72de start_thread (libpthread.so.0)
#10 0x00007f34f7e382a3 __clone (libc.so.6)
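The livestatus.so frames above are mangled C++ symbols. A minimal sketch for turning them back into readable names, assuming a GCC or clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is only illustrative.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <cstring>
#include <string>

// Demangles an Itanium-ABI C++ symbol such as "_ZN5Query6finishEv".
// Names that do not start with "_Z" (plain C symbols like "raise") and
// names that fail to demangle are returned unchanged.
std::string demangle(const char *mangled) {
    if (std::strncmp(mangled, "_Z", 2) != 0) {
        return mangled;
    }
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it has to be released with free.
    char *out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For frame #3, demangle("_ZN12RowSortedSet7extractEv") gives RowSortedSet::extract(), so the abort is raised from inside that method.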

Stack trace of thread 3608278:
#0  0x00007f34f7e385d7 epoll_wait (libc.so.6)
#1  0x00007f34f8a0b525 iobroker_poll (libnaemon.so.0)
#2  0x00007f34f89ce265 event_poll (libnaemon.so.0)
#3  0x0000558a27d17dc5 main (naemon)
#4  0x00007f34f7d5f793 __libc_start_main (libc.so.6)
#5  0x0000558a27d183ee _start (naemon)

pbiering commented 4 years ago

A slightly more readable backtrace from gdb:

[Current thread is 1 (Thread 0x7ff9e6c7c700 (LWP 3632032))]

Missing separate debuginfos, use: yum debuginfo-install naemon-core-1.2.0-0.x86_64

(gdb) bt
#0  0x00007ff9e5b947df in raise () from /lib64/libc.so.6
#1  0x00007ff9e5b7ec15 in abort () from /lib64/libc.so.6
#2  0x00007ff9e3cc1ae8 in std::__replacement_assert(char const*, int, char const*, char const*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#3  0x00007ff9e3cfe6a1 in RowSortedSet::extract() () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#4  0x00007ff9e3cc1885 in Query::finish() () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#5  0x00007ff9e3cc3ee5 in Store::answerGetRequest(InputBuffer*, OutputBuffer*, char const*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#6  0x00007ff9e3cc41e3 in Store::answerRequest(InputBuffer*, OutputBuffer*) () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#7  0x00007ff9e3cc358d in store_answer_request () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#8  0x00007ff9e3cfa267 in client_thread () from /usr/lib64/naemon/naemon-livestatus/livestatus.so
#9  0x00007ff9e53182de in start_thread () from /lib64/libpthread.so.0
#10 0x00007ff9e5c592a3 in clone () from /lib64/libc.so.6

pbiering commented 4 years ago

Recompiled naemon-livestatus; many warnings occur, such as:

Filter.h:48:12: warning: 'Filter::_query' will be initialized after [-Wreorder]
Query *_query; // needed by TimeOffsetFilter (currently)
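For readers unfamiliar with -Wreorder: C++ always initializes members in the order they are declared in the class, regardless of the order in the constructor's initializer list, and GCC warns when the two orders differ. The sketch below uses a made-up class (Window is not part of livestatus) to show the consistent ordering that avoids the warning, and why the order can matter when one member depends on another.

```cpp
#include <cstddef>

// Hypothetical class for illustration only; not the real Filter.
class Window {
public:
    // The initializer list follows the declaration order (begin_, then end_),
    // so -Wreorder stays silent and end_ can safely read begin_.
    // Listing end_ first would trigger the warning, although begin_ would
    // still be initialized first at runtime.
    Window(std::size_t begin, std::size_t length)
        : begin_(begin), end_(begin_ + length) {}

    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }

private:
    std::size_t begin_;  // declared first, therefore initialized first
    std::size_t end_;    // declared second, may depend on begin_
};
```

The warning is usually harmless, but if end_ were declared before begin_, the initializer for end_ would read begin_ before it had been initialized.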

gdb now shows more detail in the core dump backtrace:

#0  0x00007f186f3697df in raise () from /lib64/libc.so.6
#1  0x00007f186f353c15 in abort () from /lib64/libc.so.6
#2  0x00007f186d497038 in std::__replacement_assert (__file=__file@entry=0x7f186d4d5a20 "/usr/include/c++/8/bits/stl_vector.h", __line=__line@entry=932,
    __function=__function@entry=0x7f186d4de2a0 <std::vector<void*, std::allocator<void*> >::operator[](unsigned long)::__PRETTY_FUNCTION__> "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = void*; _Alloc = std::allocator<void*>; std::vector<_Tp, _Alloc>::reference = v"..., __condition=__condition@entry=0x7f186d4de268 "__builtin_expect(__n < this->size(), true)") at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2391
#3  0x00007f186d4d3ca1 in std::vector<void*, std::allocator<void*> >::operator[] (__n=<optimized out>, this=0x7f1870450b88) at /usr/include/c++/8/bits/stl_vector.h:930
#4  RowSortedSet::extract (this=this@entry=0x7f1870450b88) at RowSortedSet.cc:101
#5  0x00007f186d496dd5 in Query::finish (this=this@entry=0x7f1870450960) at Query.cc:1105
#6  0x00007f186d499435 in Store::answerGetRequest (this=<optimized out>, input=0x7f1864000b60, output=0x7f1864010e40, tablename=0x7f1870450c64 "hosts") at Store.cc:199
#7  0x00007f186d499733 in Store::answerRequest (this=0x5637bccdfd50, input=input@entry=0x7f1864000b60, output=output@entry=0x7f1864010e40) at Store.cc:133
#8  0x00007f186d498add in store_answer_request (ib=ib@entry=0x7f1864000b60, ob=ob@entry=0x7f1864010e40) at store.cc:80
#9  0x00007f186d4cf867 in client_thread (data=<optimized out>) at module.c:200
#10 0x00007f186eaed2de in start_thread () from /lib64/libpthread.so.0
#11 0x00007f186f42e2a3 in clone () from /lib64/libc.so.6

pbiering commented 4 years ago

Issue replicated on a different CentOS 8 system. Simply add the repo to /etc/yum.repos.d/ with:

wget https://download.opensuse.org/repositories/home:/naemon/CentOS_8_Stream/home:naemon.repo

then start naemon and execute the query from above via livestatus.

I already debugged the sources a little; it looks like RowSortedSet::extract() has a problem when only one element is left to be sent out.
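The actual RowSortedSet code is not quoted in this thread, so the following is only a hypothetical sketch of how a heap-style extract() can guard the "one element left" case. The class IntHeap and all its members are invented for illustration. The point is that after removing the last slot the vector can be empty, and any access to index 0 at that moment would trip the operator[] assertion.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical min-heap of ints; NOT the real RowSortedSet implementation.
class IntHeap {
public:
    void insert(int value) {
        heap_.push_back(value);
        std::size_t cur = heap_.size() - 1;
        // Sift the new element up until its parent is not larger.
        while (cur > 0) {
            std::size_t parent = (cur - 1) / 2;
            if (heap_[parent] <= heap_[cur]) break;
            std::swap(heap_[parent], heap_[cur]);
            cur = parent;
        }
    }

    // Removes the smallest element into 'out'; returns false if the heap is empty.
    bool extract(int &out) {
        if (heap_.empty()) return false;
        out = heap_.front();
        int last = heap_.back();
        heap_.pop_back();
        // Single-element case: the vector is empty now, so heap_[0] must not
        // be touched. Nothing is left to re-heapify.
        if (heap_.empty()) return true;
        heap_[0] = last;
        const std::size_t n = heap_.size();
        std::size_t cur = 0;
        // Sift the moved element down, always checking indices against n.
        for (;;) {
            std::size_t child = 2 * cur + 1;
            if (child >= n) break;
            if (child + 1 < n && heap_[child + 1] < heap_[child]) ++child;
            if (heap_[cur] <= heap_[child]) break;
            std::swap(heap_[cur], heap_[child]);
            cur = child;
        }
        return true;
    }

    std::size_t size() const { return heap_.size(); }

private:
    std::vector<int> heap_;
};
```

Whether the real fix follows this pattern is not stated here.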

pbiering commented 4 years ago

Still reproducible with all compiler warnings fixed (PR https://github.com/naemon/naemon-livestatus/pull/71). However, when compiled with "clang" the issue disappears, which suggests the code is not as clean as it should be. I will try toggling some of the gcc-c++ optimization options used on EL8; they may influence the issue.

pbiering commented 4 years ago

Fixed by https://github.com/naemon/naemon-livestatus/pull/73 (split out).