icecc / icemon

Icecream GUI Monitor
http://kfunk.org/tag/icemon/
GNU General Public License v2.0

segfault on Detailed Host View #37

Closed AndrewGuenther closed 6 years ago

AndrewGuenther commented 6 years ago

"Detailed Host View" will eventually lead to the following segfault:

#0  0x0000000000447b72 in QBasicAtomicInteger<int>::load() const ()
#1  0x00000000004472f2 in QtPrivate::RefCount::deref() ()
#2  0x00000000004474a9 in QString::~QString() ()
#3  0x0000000000447a10 in HostInfo::~HostInfo() ()
#4  0x000000000045fad7 in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator, QTypedArrayData<HostInfo>::iterator) ()
#5  0x000000000045ef6c in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator) ()
#6  0x000000000045e3ca in HostListModel::removeNodeById(unsigned int) ()
#7  0x00000000004809c6 in HostListModel::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) ()
#8  0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x0000000000481261 in Monitor::nodeRemoved(unsigned int) ()
#10 0x0000000000450442 in IcecreamMonitor::handle_stats(Msg*) ()
#11 0x000000000044fa89 in IcecreamMonitor::handle_activity() ()
#12 0x000000000044f955 in IcecreamMonitor::msgReceived() ()
#13 0x0000000000480634 in IcecreamMonitor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) ()
#14 0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007ffff6e5c24e in QSocketNotifier::activated(int, QSocketNotifier::QPrivateSignal) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007ffff6de91cb in QSocketNotifier::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007ffff78a505c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#18 0x00007ffff78aa516 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#19 0x00007ffff6dae38b in QCoreApplication::notifyInternal(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff6e04c95 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff577f197 in g_main_context_dispatch (context=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3154
#22 0x00007ffff577f197 in g_main_context_dispatch (context=context@entry=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3769
#23 0x00007ffff577f3f0 in g_main_context_iterate (context=context@entry=0x7fffe40016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3840
#24 0x00007ffff577f49c in g_main_context_iteration (context=0x7fffe40016f0, may_block=1) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3901
#25 0x00007ffff6e047cf in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff6dabb4a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff6db3bec in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x0000000000453cc5 in main ()

I've verified this on both a Mac and an Ubuntu machine, using the icemon 3.1.0 release as well as a build compiled from the latest source.

AndrewGuenther commented 6 years ago

Ran a debug build and got more details. An assert fails right before the crash. My guess is that the iterator being passed to m_hostInfos.erase is actually m_hostInfos.end(), which then causes the segfault. I don't know why that node wouldn't be found, but a single check against m_hostInfos.end() should confirm whether this is the case and also mitigate the crash for the time being.

ASSERT failure in QVector::erase: "The specified iterator argument 'aend' is invalid", file /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h, line 677

Thread 1 "icemon" received signal SIGABRT, Aborted.
0x00007ffff5eeb428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  0x00007ffff5eeb428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff5eed02a in __GI_abort () at abort.c:89
#2  0x00007ffff6bbaf81 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#3  0x00007ffff6bb6151 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#4  0x00000000004615c1 in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator, QTypedArrayData<HostInfo>::iterator) (this=0xc84e80, abegin=..., aend=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h:677
#5  0x0000000000460b28 in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator) (this=0xc84e80, pos=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h:200
#6  0x000000000045ff20 in HostListModel::removeNodeById(unsigned int) (this=0xc84e60, hostId=3859) at /home/andrew/icemon/src/models/hostlistmodel.cc:227
#7  0x00000000004842e6 in HostListModel::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0xc84e60, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffffffccf0) at /home/andrew/icemon/build/src/moc_hostlistmodel.cpp:76
#8  0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x0000000000484b81 in Monitor::nodeRemoved(unsigned int) (this=0xb2fa20, _t1=3859) at /home/andrew/icemon/build/src/moc_monitor.cpp:231
#10 0x000000000045170e in IcecreamMonitor::handle_stats(Msg*) (this=0xb2fa20, _m=0xce20b0) at /home/andrew/icemon/src/icecreammonitor.cc:294
#11 0x0000000000450d55 in IcecreamMonitor::handle_activity() (this=0xb2fa20) at /home/andrew/icemon/src/icecreammonitor.cc:204
#12 0x0000000000450c21 in IcecreamMonitor::msgReceived() (this=0xb2fa20) at /home/andrew/icemon/src/icecreammonitor.cc:175
#13 0x0000000000483f54 in IcecreamMonitor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0xb2fa20, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffffffcf80) at /home/andrew/icemon/build/src/moc_icecreammonitor.cpp:74
#14 0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007ffff6e5c24e in QSocketNotifier::activated(int, QSocketNotifier::QPrivateSignal) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007ffff6de91cb in QSocketNotifier::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007ffff78a505c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#18 0x00007ffff78aa516 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#19 0x00007ffff6dae38b in QCoreApplication::notifyInternal(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff6e04c95 in  () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff577f197 in g_main_context_dispatch (context=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3154
#22 0x00007ffff577f197 in g_main_context_dispatch (context=context@entry=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3769
#23 0x00007ffff577f3f0 in g_main_context_iterate (context=context@entry=0x7fffe40016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3840
#24 0x00007ffff577f49c in g_main_context_iteration (context=0x7fffe40016f0, may_block=1) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3901
#25 0x00007ffff6e047cf in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff6dabb4a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff6db3bec in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x0000000000455149 in main(int, char**) (argc=3, argv=0x7fffffffd728) at /home/andrew/icemon/src/main.cc:71

AndrewGuenther commented 6 years ago

Confirmed that the following patch will mitigate the issue.

diff --git a/src/models/hostlistmodel.cc b/src/models/hostlistmodel.cc
index cd40535..4a7c4a7 100644
--- a/src/models/hostlistmodel.cc
+++ b/src/models/hostlistmodel.cc
@@ -222,6 +222,9 @@ private:
 void HostListModel::removeNodeById(unsigned int hostId)
 {
     QVector<HostInfo>::iterator it = std::find_if(m_hostInfos.begin(), m_hostInfos.end(), find_hostid(hostId));
+    if (it == m_hostInfos.end()) {
+        return;
+    }
     int index = std::distance(m_hostInfos.begin(), it);
     beginRemoveRows(QModelIndex(), index, index);
     m_hostInfos.erase(it);

I'll spend a little more time trying to find out why the hostId passed into removeNodeById isn't found, but that is probably something which will be easier for one of the maintainers. Regardless, I'll submit a PR for the patch above.

AndrewGuenther commented 6 years ago

Included some additional logging. It looks like nearly every time removeNodeById is called, it receives an id that is not in m_hostInfos (19 of the 22 calls in the excerpt below).

Removing host: 4166
Host not found.
Removing host: 4167
Host not found.
Removing host: 4168
Host not found.
Removing host: 4169
Host not found.
Removing host: 4170
Host not found.
Removing host: 4171
Host not found.
Removing host: 4172
Host not found.
Removing host: 4173
Host not found.
Removing host: 4174
Host not found.
Removing host: 3995
Removing host: 4175
Host not found.
Removing host: 4176
Host not found.
Removing host: 4177
Host not found.
Removing host: 4179
Host not found.
Removing host: 4118
Removing host: 4119
Removing host: 4180
Host not found.
Removing host: 4181
Host not found.
Removing host: 4182
Host not found.
Removing host: 4185
Host not found.
Removing host: 4186
Host not found.
Removing host: 4187
Host not found.

I added those logs with the following diff:

diff --git a/src/models/hostlistmodel.cc b/src/models/hostlistmodel.cc
index cd40535..f6e9f17 100644
--- a/src/models/hostlistmodel.cc
+++ b/src/models/hostlistmodel.cc
@@ -26,6 +26,7 @@
 #include <QPalette>

 #include <algorithm>
+#include <iostream>

 HostListModel::HostListModel(QObject *parent)
     : QAbstractListModel(parent)
@@ -191,6 +192,7 @@ void HostListModel::checkNode(unsigned int hostid)
     const int index = m_hostInfos.indexOf(*info);
     if (index != -1) {
         if (info->isOffline()) {
+            std::cerr << "Removing offline host.\n";
             removeNodeById(hostid);
         } else {
             m_hostInfos[index] = *info;
@@ -222,6 +224,11 @@ private:
 void HostListModel::removeNodeById(unsigned int hostId)
 {
     QVector<HostInfo>::iterator it = std::find_if(m_hostInfos.begin(), m_hostInfos.end(), find_hostid(hostId));
+    std::cerr << "Removing host: " << hostId << "\n";
+    if (it == m_hostInfos.end()) {
+        std::cerr << "Host not found.\n";
+        return;
+    }
     int index = std::distance(m_hostInfos.begin(), it);
     beginRemoveRows(QModel

I'm not going to look any deeper into this because from what I can tell, the icemon client is assuming some state about what it should be receiving from the scheduler. The scheduler is sending data which triggers icemon to remove a hostId that it knows nothing about.