ros-visualization / qt_gui_core

BSD 3-Clause "New" or "Revised" License
37 stars 77 forks source link

Segmentation fault on closure of rqt with cpp plugin #258

Open MatthijsBurgh opened 2 years ago

MatthijsBurgh commented 2 years ago

Running on noetic

When closing rqt, while a cpp plugin is still active, I get a segmentation fault or the following aborted.

terminate called after throwing an instance of 'boost::wrapexcept<boost::lock_error>'
  what():  boost: mutex lock failed in pthread_mutex_lock: Invalid argument
Aborted (core dumped)

The problem only occurs when the plugin is still active on closure. When you close the plugin manually before closing rqt, the problem doesn't occur.

I tried some more debugging on rqt_image_view. (I also experience the issue with mapviz, but I didn't do any debugging on that plugin.)

I figured out that the destructor is not being called when closing rqt with the plugin active. Or at least not being called before the segmentation fault occurs.

Debugging

As I wasn't able to get gdb working to debug the plugin as rqt runs in python. I added some print statements to debug what happens. I just drop it here as it might help you to solve the issue.

When closing the cpp plugin(rqt_image_view) manually:

ImageView::callbackImage done
ImageView::saveSettings
ImageView::saveSettings done
PluginBridge::shutdown_plugin
PluginBridge::shutdown_plugin plugin_->removeEventFilter(this)
PluginBridge::shutdown_plugin plugin_->shutdownPlugin()
ImageView::shutdown
ImageView::shutdown done
PluginBridge::shutdown_plugin done
PluginBridge::unload_plugin
PluginProvider::unload_plugin
CompositePluginProvider::unload
CompositePluginProvider::unload
NodeletPluginProvider::unload
Unloading: rqt_image_view/ImageView_1
NodeletPluginProvider::shutdown
NodeletPluginProvider::RosSpinThread::run done
NodeletPluginProvider::shutdown done
NodeletPluginProvider::unload done
CompositePluginProvider::unload return
CompositePluginProvider::unload return
PluginProvider::unload_plugin done
PluginBridge::unload_plugin done
PluginBridge::~PluginBridge
PluginBridge::~PluginBridge done
NodeletPluginProvider::RosSpinThread::~RosSpinThread
NodeletPluginProvider::RosSpinThread::~RosSpinThread done
ImageView::destructor
ImageView::destructor done

When closing rqt with an active cpp plugin(rqt_image_view):

ImageView::callbackImage done
ImageView::saveSettings
ImageView::saveSettings done
PluginBridge::shutdown_plugin
PluginBridge::shutdown_plugin plugin_->removeEventFilter(this)
PluginBridge::shutdown_plugin plugin_->shutdownPlugin()
ImageView::shutdown
ImageView::shutdown done
PluginBridge::shutdown_plugin done
PluginProvider::shutdown
PluginProvider::shutdown done
CompositePluginProvider::shutdown
CompositePluginProvider::shutdown
NodeletPluginProvider::shutdown
NodeletPluginProvider::RosSpinThread::run done
NodeletPluginProvider::shutdown done
CompositePluginProvider::shutdown done
CompositePluginProvider::shutdown done
NodeletPluginProvider::RosSpinThread::~RosSpinThread
NodeletPluginProvider::RosSpinThread::~RosSpinThread done
PluginBridge::~PluginBridge
PluginBridge::~PluginBridge done
RosPluginlibPluginProvider::~RosPluginlibPluginProvider
RosPluginlibPluginProvider::~RosPluginlibPluginProvider done
PluginProvider::~PluginProvider
PluginProvider::~PluginProvider done
Segmentation fault (core dumped)
ivanpauno commented 2 years ago

As I wasn't able to get gdb working to debug the plugin as rqt runs in python.

Use gdb --args ${python_executable} ${path_to_python_script} ${other_args}, where:

With that, you should be able to get a traceback. Thanks!

MatthijsBurgh commented 2 years ago
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7db1859 in __GI_abort () at abort.c:79
#2  0x00007fffeec0a911 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffeec1638c in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fffeec163f7 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fffeec166a9 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffc31d8701 in void boost::throw_exception<boost::lock_error>(boost::lock_error const&)
    () at /opt/ros/noetic/lib/libqt_gui_cpp.so
#7  0x00007fffc232d624 in boost::mutex::lock() (this=<optimized out>)
    at /usr/include/c++/9/bits/char_traits.h:300
#8  boost::unique_lock<boost::mutex>::lock() (this=<synthetic pointer>)
    at /usr/include/boost/thread/lock_types.hpp:346
#9  boost::unique_lock<boost::mutex>::unique_lock(boost::mutex&) (m_=..., this=<synthetic pointer>)
    at /usr/include/boost/thread/lock_types.hpp:124
#10 ros::Time::now() () at ./src/time.cpp:230
#11 ros::Time::now() () at ./src/time.cpp:221
#12 0x00007fffc1f60101 in ros::Subscription::handleMessage(ros::SerializedMessage const&, bool, bool, boost::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > const&, boost::shared_ptr<ros::PublisherLink> const&) () at /opt/ros/noetic/lib/libroscpp.so
#13 0x00007fffc1f404d5 in ros::TransportPublisherLink::handleMessage(ros::SerializedMessage const&, bool, bool) () at /opt/ros/noetic/lib/libroscpp.so
#14 0x00007fffc1f40196 in ros::TransportPublisherLink::onMessage(boost::shared_ptr<ros::Connection> const&, boost::shared_array<unsigned char> const&, unsigned int, bool) ()
    at /opt/ros/noetic/lib/libroscpp.so
#15 0x00007fffc1ec9baf in ros::Connection::readTransport() () at /opt/ros/noetic/lib/libroscpp.so
#16 0x00007fffc1f3bc65 in ros::TransportTCP::socketUpdate(int) ()
    at /opt/ros/noetic/lib/libroscpp.so
#17 0x00007fffc1f6ff67 in ros::PollSet::update(int) () at /opt/ros/noetic/lib/libroscpp.so
#18 0x00007fffc1f01dbd in ros::PollManager::threadFunc() () at /opt/ros/noetic/lib/libroscpp.so
#19 0x00007fffc1dae43b in  () at /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0
#20 0x00007ffff7d74609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007ffff7eae133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

So the segfault is comming from https://github.com/ros/roscpp_core/blob/72ce04f8b2849e0e4587ea4d598be6ec5d24d8ca/rostime/src/time.cpp#L230.

Though the rostime code, doesn't look that weird too me. So I think the real problem is somewhere else IMO.

ivanpauno commented 2 years ago

Though the rostime code, doesn't look that weird too me.

Global variables with non-trivial construction or destruction are harmful. Construction order issues can be easily solved with the "initialize on first use" pattern (which is not being used here). Destruction order issues can sometimes be solved, but it's more complex.

So I think the real problem is somewhere else IMO.

I think you're right. There seems to be still an executor running in a thread, we should stop it before closing. And for some reason that executor has a pointer to a subscription with a timer that was already destroyed. I haven't looked at the qt_gui_core, but if you find a thread spinning, that should be stopped and the thread joined before closing (or destroying other things).