Closed AndrewGuenther closed 6 years ago
Ran a debug build and got more details. It looks like an assert failure is hit right before the crash. My guess is that the value being passed to m_hostInfos.erase is actually m_hostInfos.end(), which would then cause the segfault. I don't know why that node wouldn't be found, but a single check should both confirm whether this is the case and mitigate the crash for the time being.
ASSERT failure in QVector::erase: "The specified iterator argument 'aend' is invalid", file /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h, line 677
Thread 1 "icemon" received signal SIGABRT, Aborted.
0x00007ffff5eeb428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0 0x00007ffff5eeb428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff5eed02a in __GI_abort () at abort.c:89
#2 0x00007ffff6bbaf81 in () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#3 0x00007ffff6bb6151 in () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#4 0x00000000004615c1 in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator, QTypedArrayData<HostInfo>::iterator) (this=0xc84e80, abegin=..., aend=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h:677
#5 0x0000000000460b28 in QVector<HostInfo>::erase(QTypedArrayData<HostInfo>::iterator) (this=0xc84e80, pos=...) at /usr/include/x86_64-linux-gnu/qt5/QtCore/qvector.h:200
#6 0x000000000045ff20 in HostListModel::removeNodeById(unsigned int) (this=0xc84e60, hostId=3859) at /home/andrew/icemon/src/models/hostlistmodel.cc:227
#7 0x00000000004842e6 in HostListModel::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0xc84e60, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffffffccf0) at /home/andrew/icemon/build/src/moc_hostlistmodel.cpp:76
#8 0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9 0x0000000000484b81 in Monitor::nodeRemoved(unsigned int) (this=0xb2fa20, _t1=3859) at /home/andrew/icemon/build/src/moc_monitor.cpp:231
#10 0x000000000045170e in IcecreamMonitor::handle_stats(Msg*) (this=0xb2fa20, _m=0xce20b0) at /home/andrew/icemon/src/icecreammonitor.cc:294
#11 0x0000000000450d55 in IcecreamMonitor::handle_activity() (this=0xb2fa20) at /home/andrew/icemon/src/icecreammonitor.cc:204
#12 0x0000000000450c21 in IcecreamMonitor::msgReceived() (this=0xb2fa20) at /home/andrew/icemon/src/icecreammonitor.cc:175
#13 0x0000000000483f54 in IcecreamMonitor::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) (_o=0xb2fa20, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffffffcf80) at /home/andrew/icemon/build/src/moc_icecreammonitor.cpp:74
#14 0x00007ffff6ddcd2a in QMetaObject::activate(QObject*, int, int, void**) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007ffff6e5c24e in QSocketNotifier::activated(int, QSocketNotifier::QPrivateSignal) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007ffff6de91cb in QSocketNotifier::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007ffff78a505c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#18 0x00007ffff78aa516 in QApplication::notify(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#19 0x00007ffff6dae38b in QCoreApplication::notifyInternal(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#20 0x00007ffff6e04c95 in () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff577f197 in g_main_context_dispatch (context=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3154
#22 0x00007ffff577f197 in g_main_context_dispatch (context=context@entry=0x7fffe40016f0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3769
#23 0x00007ffff577f3f0 in g_main_context_iterate (context=context@entry=0x7fffe40016f0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3840
#24 0x00007ffff577f49c in g_main_context_iteration (context=0x7fffe40016f0, may_block=1) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3901
#25 0x00007ffff6e047cf in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff6dabb4a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff6db3bec in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x0000000000455149 in main(int, char**) (argc=3, argv=0x7fffffffd728) at /home/andrew/icemon/src/main.cc:71
Confirmed that the following patch will mitigate the issue.
diff --git a/src/models/hostlistmodel.cc b/src/models/hostlistmodel.cc
index cd40535..4a7c4a7 100644
--- a/src/models/hostlistmodel.cc
+++ b/src/models/hostlistmodel.cc
@@ -222,6 +222,9 @@ private:
 void HostListModel::removeNodeById(unsigned int hostId)
 {
     QVector<HostInfo>::iterator it = std::find_if(m_hostInfos.begin(), m_hostInfos.end(), find_hostid(hostId));
+    if (it == m_hostInfos.end()) {
+        return;
+    }
     int index = std::distance(m_hostInfos.begin(), it);
     beginRemoveRows(QModelIndex(), index, index);
     m_hostInfos.erase(it);
I'll spend a little more time trying to find out why the hostId passed into removeNodeById isn't found, but that is probably something which will be easier for one of the maintainers to track down. Regardless, I'll submit a PR for the patch above.
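For anyone who wants to see the guard in isolation, here is a minimal sketch of the same find-then-check-then-erase pattern outside of Qt. It assumes a stripped-down `HostInfo` with only an `id` field and uses `std::vector` in place of `QVector`; the `removeHostById` name and the `bool` return value are illustrative and not part of icemon.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for icemon's HostInfo; only the id matters here.
struct HostInfo {
    unsigned int id;
};

// Removes the host with the given id, if present.
// Returns true if a host was removed, false if hostId was not in the list.
bool removeHostById(std::vector<HostInfo> &hosts, unsigned int hostId)
{
    auto it = std::find_if(hosts.begin(), hosts.end(),
                           [hostId](const HostInfo &h) { return h.id == hostId; });
    if (it == hosts.end()) {
        // Passing end() to erase() is undefined behaviour for std::vector,
        // and it is what trips the assert in QVector::erase in the backtrace.
        return false;
    }
    hosts.erase(it);
    return true;
}
```

Returning a flag rather than silently doing nothing lets the caller decide whether an unknown id is worth logging.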
Included some additional logging. It looks like nearly every time removeNodeById is called, it receives an id that isn't in m_hostInfos (19 of the 22 calls in the excerpt below).
Removing host: 4166
Host not found.
Removing host: 4167
Host not found.
Removing host: 4168
Host not found.
Removing host: 4169
Host not found.
Removing host: 4170
Host not found.
Removing host: 4171
Host not found.
Removing host: 4172
Host not found.
Removing host: 4173
Host not found.
Removing host: 4174
Host not found.
Removing host: 3995
Removing host: 4175
Host not found.
Removing host: 4176
Host not found.
Removing host: 4177
Host not found.
Removing host: 4179
Host not found.
Removing host: 4118
Removing host: 4119
Removing host: 4180
Host not found.
Removing host: 4181
Host not found.
Removing host: 4182
Host not found.
Removing host: 4185
Host not found.
Removing host: 4186
Host not found.
Removing host: 4187
Host not found.
I added those logs with the following diff:
diff --git a/src/models/hostlistmodel.cc b/src/models/hostlistmodel.cc
index cd40535..f6e9f17 100644
--- a/src/models/hostlistmodel.cc
+++ b/src/models/hostlistmodel.cc
@@ -26,6 +26,7 @@
 #include <QPalette>
 #include <algorithm>
+#include <iostream>
 HostListModel::HostListModel(QObject *parent)
     : QAbstractListModel(parent)
@@ -191,6 +192,7 @@ void HostListModel::checkNode(unsigned int hostid)
     const int index = m_hostInfos.indexOf(*info);
     if (index != -1) {
         if (info->isOffline()) {
+            std::cerr << "Removing offline host.\n";
             removeNodeById(hostid);
         } else {
             m_hostInfos[index] = *info;
@@ -222,6 +224,11 @@ private:
 void HostListModel::removeNodeById(unsigned int hostId)
 {
     QVector<HostInfo>::iterator it = std::find_if(m_hostInfos.begin(), m_hostInfos.end(), find_hostid(hostId));
+    std::cerr << "Removing host: " << hostId << "\n";
+    if (it == m_hostInfos.end()) {
+        std::cerr << "Host not found.\n";
+        return;
+    }
     int index = std::distance(m_hostInfos.begin(), it);
     beginRemoveRows(QModel
I'm not going to look any deeper into this because, from what I can tell, the icemon client is assuming some state about what it should be receiving from the scheduler. The scheduler is sending data which triggers icemon to remove a hostId that it knows nothing about.
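To put a number on how often the scheduler asks icemon to remove an unknown host, the stderr output from the logging diff can be tallied with a small helper. This is only a sketch: `RemovalTally` and `tallyRemovals` are hypothetical names, and it assumes the log uses exactly the "Removing host: " and "Host not found." lines shown earlier in this thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts of removal attempts and of attempts whose id was unknown.
struct RemovalTally {
    std::size_t attempts = 0;
    std::size_t notFound = 0;
};

// Scans captured log text line by line and tallies the two message types.
RemovalTally tallyRemovals(const std::string &log)
{
    RemovalTally tally;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("Removing host: ", 0) == 0) {
            // Line starts with the prefix printed by removeNodeById.
            ++tally.attempts;
        } else if (line == "Host not found.") {
            ++tally.notFound;
        }
    }
    return tally;
}
```

Feeding it the captured stderr of a longer session would show whether the miss rate holds up beyond a short excerpt.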
"Default Host View" will eventually lead to the following segfault:
I've verified this on both a Mac and an Ubuntu machine using the icemon 3.1.0 release as well as when compiled from latest source.