ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

rviz2 Docking Panel plugin randomly crashes when system is under high load #4689

Closed azeey closed 1 month ago

azeey commented 2 months ago

Bug report

Required Info:

We've been experiencing a lot of random crashes while preparing the Gazebo Ionic demo that features Nav2 (see https://github.com/gazebosim/ionic_demo). It seems to be related to system load as it occurred more frequently when I was on a video call testing out the demo.

Steps to reproduce issue

  1. Run stress to create high load on your machine. I ran stress -c 16 -m 8 on my laptop (16 cores, 32 GB RAM)
  2. ros2 launch nav2_bringup tb4_simulation_launch.py headless:=False
    • You might have to run this a few times depending on your system

Expected behavior

rviz2 runs without issues

Actual behavior

rviz will start and crash immediately.

The backtrace from a core dump points to DockingPanel

#0  0x0000790b703b9bdb in rclcpp::ParameterValue::get<(rclcpp::ParameterType)9> (this=0x20) at /usr/src/ros-rolling-rclcpp-28.3.3-1noble.20240729.171300/include/rclcpp/parameter_value.hpp:244
#1  rclcpp::Parameter::get_value<(rclcpp::ParameterType)9> (this=0x0) at /usr/src/ros-rolling-rclcpp-28.3.3-1noble.20240729.171300/include/rclcpp/parameter.hpp:119
#2  rclcpp::Parameter::as_string_array[abi:cxx11]() const (this=0x0) at /usr/src/ros-rolling-rclcpp-28.3.3-1noble.20240729.171300/src/rclcpp/parameter.cpp:141
#3  0x0000790b08d617dd in nav2_rviz_plugins::pluginLoader(std::shared_ptr<rclcpp::Node>, bool&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, QComboBox*) () at /root/ws/install/nav2_rviz_plugins/lib/libnav2_rviz_plugins.so
#4  0x0000790b08c8e2d3 in nav2_rviz_plugins::DockingPanel::timerEvent(QTimerEvent*) () at /root/ws/install/nav2_rviz_plugins/lib/libnav2_rviz_plugins.so
#5  0x0000790b707b924b in QObject::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#6  0x0000790b70b91d45 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#7  0x0000790b7078b118 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#8  0x0000790b707e75ab in QTimerInfoList::activateTimers() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x0000790b707e7f11 in ??? () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x0000790b6e88e5b5 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x0000790b6e8ed717 in ??? () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x0000790b6e88da53 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x0000790b707e8279 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x0000790b70789a7b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x0000790b707923e8 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x000063239d56bc97 in main (argc=6, argv=0x7ffd9082f8c8) at /usr/src/ros-rolling-rviz2-14.2.5-1noble.20240820.020548/src/main.cpp:92
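
Frames #1 to #3 show as_string_array() being invoked on a null rclcpp::Parameter (this=0x0) from inside pluginLoader, which would be consistent with the parameter result being read before the server had answered. The following is a minimal, self-contained sketch of a defensive pattern, not the actual nav2_rviz_plugins code. FakeParameterReply, extractPluginNames and tryLoadPlugins are hypothetical names, and the sketch assumes that an unanswered query produces an empty result.

```cpp
#include <optional>
#include <string>
#include <vector>

using StringArray = std::vector<std::string>;

// Hypothetical stand-in for the result of a parameter query.
// Assumption: `values` is empty when the remote server did not answer in time.
struct FakeParameterReply
{
  std::vector<StringArray> values;  // one entry per requested parameter
};

// Returns the plugin names only if the reply actually contains them,
// instead of indexing into a possibly empty result.
std::optional<StringArray> extractPluginNames(const FakeParameterReply & reply)
{
  if (reply.values.empty()) {
    return std::nullopt;  // server not ready yet
  }
  return reply.values.front();
}

// Hypothetical body of a periodic timer callback: it fills `combo_items`
// once and reports false until a usable reply arrives, so the caller can
// simply retry on the next tick.
bool tryLoadPlugins(
  const FakeParameterReply & reply, bool & plugins_loaded, StringArray & combo_items)
{
  if (plugins_loaded) {
    return true;  // nothing left to do
  }
  const auto names = extractPluginNames(reply);
  if (!names) {
    return false;  // keep waiting, do not dereference anything
  }
  combo_items = *names;
  plugins_loaded = true;
  return true;
}
```

With this shape, an empty reply under high load leaves plugins_loaded false and the combo box untouched, and the next timer tick tries again.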

Additional information

The crash doesn't seem to happen if I disable use_composition.

ajtudela commented 2 months ago

It seems to be related to the timerEvent that waits until the docking_server is up and then loads the plugins. Your docking server is not running when it crashes, right?

The plugins loader is also used in the SelectorPanel, does this happen to you when you enable the SelectorPanel?

SteveMacenski commented 2 months ago

@azeey can you respond to @ajtudela's request for info? He's the original author of that panel and is best placed to solve the issue given some more info.

azeey commented 2 months ago

Sorry, this slipped my mind. Last I checked, it didn't happen with the SelectorPanel enabled as long as the Docking panel is disabled. I can check again tomorrow if you'd like.

SteveMacenski commented 2 months ago

Thanks!

ajtudela commented 1 month ago

I'm trying to reproduce the crash with my setup (Ubuntu 24.04, rolling, main) without success. Sometimes, when the CPU is under stress, rviz hangs for a few seconds, but it recovers.

However, I'm working on an improved state machine for the panel that will hopefully fix both this issue and this one: https://github.com/ros-navigation/navigation2/pull/4458#issuecomment-2297700073

ajtudela commented 1 month ago

It was a race condition, difficult to catch, but I fixed it!

@SteveMacenski could you check this branch: https://github.com/ajtudela/navigation2/tree/improve_panel using the new non-charging dock to check there are no issues?

Thanks

SteveMacenski commented 1 month ago

Software-wise it looks good! A few nits: for example, while run() is waiting on the action server, log something to let the user know it's waiting on something.

Does this solve the crash? If so, I can test the state machine, but I trust @ajtudela did this well :smile:

SteveMacenski commented 1 month ago

#4717 resolves this.