ros-visualization / rviz

ROS 3D Robot Visualizer
BSD 3-Clause "New" or "Revised" License
849 stars 463 forks source link

Fixed crash when raycasting while map is updated. #1793

Closed StefanFabian closed 1 year ago

StefanFabian commented 1 year ago

When using the scene manager raycast in one of our tool rviz crashed when the tool touched the map area while a new map was received. Apparently, the connection is not queued (anymore?) and therefore the update does not happen on the main thread. This small change should fix this and is fully ABI compatible.

Fixes #1792

rhaschke commented 1 year ago

Just to clarify: I trust you stating that this PR reliably fixes your issue, but I don't yet understand how/why. Maybe, using a queued connection, the update of map data structures is postponed after the current VisualizationManager::onUpdate() call, such that your tool can either still access the old data structures or will access the new ones in the next cycle? In any case, it would be helpful to see the actual backtrace. Thanks.

simonschmeisser commented 1 year ago

Do you use standalone rviz or some tool build upon it?

StefanFabian commented 1 year ago

I've used a blank default rviz config on my desktop where I don't have the fix. Just added a map display and our waypoint tool which uses the SelectionManager to get a raycasted 3D point from the mouse position in the processMouseEvent method and uses that 3D point to show a mesh at the location of the raycast.

Here's the backtrace:

Thread 1 "rviz" received signal SIGSEGV, Segmentation fault.
0x00000000400b0bf0 in ?? ()
(gdb) bt
#0  0x00000000400b0bf0 in  ()
#1  0x00007fffca651bf9 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#2  0x00007fffca660985 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#3  0x00007fffca2b93ce in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#4  0x00007fffc871ba7b in Ogre::GLRenderSystem::_render(Ogre::RenderOperation const&) ()
    at /usr/lib/x86_64-linux-gnu/OGRE-1.9.0/RenderSystem_GL.so.1.9.0
#5  0x00007ffff655a16e in Ogre::SceneManager::renderSingleObject(Ogre::Renderable*, Ogre::Pass const*, bool, bool, Ogre::HashedVector<Ogre::Light*> const*) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#6  0x00007ffff650a679 in Ogre::QueuedRenderableCollection::acceptVisitorGrouped(Ogre::QueuedRenderableVisitor*) const ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#7  0x00007ffff650a78c in Ogre::QueuedRenderableCollection::acceptVisitor(Ogre::QueuedRenderableVisitor*, Ogre::QueuedRenderableCollection::OrganisationMode) const () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#8  0x00007ffff654c93a in Ogre::SceneManager::renderBasicQueueGroupObjects(Ogre::RenderQueueGroup*, Ogre::QueuedRenderableCollection::OrganisationMode) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#9  0x00007ffff654c7d7 in Ogre::SceneManager::renderVisibleObjectsDefaultSequence() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#10 0x00007ffff655e4a7 in Ogre::SceneManager::_renderScene(Ogre::Camera*, Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#11 0x00007ffff6398c3a in Ogre::Camera::_renderScene(Ogre::Viewport*, bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#12 0x00007ffff6524805 in Ogre::RenderTarget::_updateViewport(Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#13 0x00007ffff6524612 in Ogre::RenderTarget::_updateAutoUpdatedViewports(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#14 0x00007ffff6524363 in Ogre::RenderTarget::updateImpl() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#15 0x00007ffff6524d18 in Ogre::RenderTarget::update(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#16 0x00007ffff650e90f in Ogre::RenderSystem::_updateAllRenderTargets(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#17 0x00007ffff654551e in Ogre::Root::_updateAllRenderTargets() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#18 0x00007ffff6545610 in Ogre::Root::renderOneFrame() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#19 0x00007ffff7f3e120 in rviz::VisualizationManager::onUpdate() () at /opt/ros/noetic/lib/librviz.so
#20 0x00007ffff73d41d0 in QMetaObject::activate(QObject*, int, int, void**) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff73e13ee in QTimer::timeout(QTimer::QPrivateSignal) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007ffff73d4bc5 in QObject::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007ffff77d5a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#24 0x00007ffff77df0f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#25 0x00007ffff73a880a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff73ff780 in QTimerInfoList::activateTimers() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff740006c in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x00007ffff45f817d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#29 0x00007ffff45f8400 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x00007ffff45f84a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#31 0x00007ffff7400435 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
    at /lib/x86_64-linux-gnu/libQt5Core.so.5
#32 0x00007ffff73a73ab in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#33 0x00007ffff73af116 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#34 0x00005555555585e2 in main ()
StefanFabian commented 1 year ago

I can upload the whole code but the relevant portion should be this:

void WaypointTool::onInitialize()
{
  rviz::InteractionTool::onInitialize();
  preview_mesh_ = rviz::loadMeshFromResource( WAYPOINT_PREVIEW_RESOURCE );
  if ( preview_mesh_.isNull())
  {
    ROS_ERROR( "Failed to load waypoint tool preview mesh." );
    return;
  }

  arrow_mesh_ = rviz::loadMeshFromResource( WAYPOINT_ARROW_RESOURCE );
  if ( arrow_mesh_.isNull())
  {
    ROS_ERROR( "Failed to load waypoint tool orientation preview mesh." );
    return;
  }

  waypoint_preview_node_ = createWaypointNode();
  waypoint_preview_node_->setVisible( false );
}

Ogre::SceneNode *WaypointTool::createWaypointNode()
{
  Ogre::SceneNode *node = scene_manager_->getRootSceneNode()->createChildSceneNode();
  Ogre::Entity *entity = scene_manager_->createEntity( preview_mesh_ );
  node->attachObject( entity );
  node->setScale( WAYPOINT_PREVIEW_SCALE, WAYPOINT_PREVIEW_SCALE, WAYPOINT_PREVIEW_SCALE );
  Ogre::SceneNode *arrow_node = node->createChildSceneNode();
  entity = scene_manager_->createEntity( arrow_mesh_ );
  arrow_node->attachObject( entity );
  arrow_node->setVisible( false );
  arrow_node->setScale( 30 * WAYPOINT_PREVIEW_SCALE, 30 * WAYPOINT_PREVIEW_SCALE, 30 * WAYPOINT_PREVIEW_SCALE );
  return node;
}

int WaypointTool::processMouseEvent( rviz::ViewportMouseEvent &event )
{
  if ( waypoint_preview_node_ == nullptr ) return rviz::InteractionTool::processMouseEvent( event );

  // Perform raycast
  waypoint_preview_node_->setVisible( false ); // Disable for raycast
  Ogre::Vector3 intersection;
  if ( !raycast3d_property_->getBool() || !context_->getSelectionManager()->get3DPoint( event.viewport, event.x, event.y, intersection ))
  {
    // Fall back to plane intersection
    Ogre::Plane ground_plane( Ogre::Vector3::UNIT_Z, 0.f );
    if ( !rviz::getPointOnPlaneFromWindowXY( event.viewport, ground_plane, event.x, event.y, intersection ))
    {
      return Render;
    }
  }

  // Show preview node at current intersection unless we are in the process of placing a waypoint
  waypoint_preview_node_->setVisible( active_waypoint_ == nullptr, false );
  waypoint_preview_node_->setPosition( intersection );
  return rviz::InteractionTool::processMouseEvent( event );
}

I've removed the branches that are definitely not being hit from the processMouseEvent for easier readability.

rhaschke commented 1 year ago

I modified the PlantFlagTool example of rviz_plugin_tutorials to call context_->getSelectionManager()->get3DPoint instead of getPointOnPlaneFromWindowXY and this works w/o any issues. I'm afraid your issue is rooted in your source code.

StefanFabian commented 1 year ago

I just did the same and it crashes just like with our tool.

diff --git a/rviz_plugin_tutorials/src/plant_flag_tool.cpp b/rviz_plugin_tutorials/src/plant_flag_tool.cpp
index ea2689b..a8e73d0 100644
--- a/rviz_plugin_tutorials/src/plant_flag_tool.cpp
+++ b/rviz_plugin_tutorials/src/plant_flag_tool.cpp
@@ -38,6 +38,7 @@
 #include <rviz/mesh_loader.h>
 #include <rviz/geometry.h>
 #include <rviz/properties/vector_property.h>
+#include <rviz/selection/selection_manager.h>

 #include "plant_flag_tool.h"

@@ -169,9 +170,8 @@ int PlantFlagTool::processMouseEvent( rviz::ViewportMouseEvent& event )
   }
   Ogre::Vector3 intersection;
   Ogre::Plane ground_plane( Ogre::Vector3::UNIT_Z, 0.0f );
-  if( rviz::getPointOnPlaneFromWindowXY( event.viewport,
-                                         ground_plane,
-                                         event.x, event.y, intersection ))
+  moving_flag_node_->setVisible(false);
+  if( context_->getSelectionManager()->get3DPoint( event.viewport, event.x, event.y, intersection ))
   {
     moving_flag_node_->setVisible( true );
     moving_flag_node_->setPosition( intersection );

Do you also have an updating map? 1Hz should be sufficient. And moving the mouse across the map.

Same backtrace

Thread 1 "rviz" received signal SIGSEGV, Segmentation fault.
0x00000000400b0bf0 in ?? ()
(gdb) bt
#0  0x00000000400b0bf0 in  ()
#1  0x00007fffca651bf9 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#2  0x00007fffca660985 in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#3  0x00007fffca2b93ce in  () at /lib/x86_64-linux-gnu/libnvidia-glcore.so.515.43.04
#4  0x00007fffc871ba7b in Ogre::GLRenderSystem::_render(Ogre::RenderOperation const&) ()
    at /usr/lib/x86_64-linux-gnu/OGRE-1.9.0/RenderSystem_GL.so.1.9.0
#5  0x00007ffff655a16e in Ogre::SceneManager::renderSingleObject(Ogre::Renderable*, Ogre::Pass const*, bool, bool, Ogre::HashedVector<Ogre::Light*> const*) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#6  0x00007ffff650a679 in Ogre::QueuedRenderableCollection::acceptVisitorGrouped(Ogre::QueuedRenderableVisitor*) const ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#7  0x00007ffff650a78c in Ogre::QueuedRenderableCollection::acceptVisitor(Ogre::QueuedRenderableVisitor*, Ogre::QueuedRenderableCollection::OrganisationMode) const () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#8  0x00007ffff654c93a in Ogre::SceneManager::renderBasicQueueGroupObjects(Ogre::RenderQueueGroup*, Ogre::QueuedRenderableCollection::OrganisationMode) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#9  0x00007ffff654c7d7 in Ogre::SceneManager::renderVisibleObjectsDefaultSequence() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#10 0x00007ffff655e4a7 in Ogre::SceneManager::_renderScene(Ogre::Camera*, Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#11 0x00007ffff6398c3a in Ogre::Camera::_renderScene(Ogre::Viewport*, bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#12 0x00007ffff6524805 in Ogre::RenderTarget::_updateViewport(Ogre::Viewport*, bool) ()
    at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#13 0x00007ffff6524612 in Ogre::RenderTarget::_updateAutoUpdatedViewports(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#14 0x00007ffff6524363 in Ogre::RenderTarget::updateImpl() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#15 0x00007ffff6524d18 in Ogre::RenderTarget::update(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#16 0x00007ffff650e90f in Ogre::RenderSystem::_updateAllRenderTargets(bool) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#17 0x00007ffff654551e in Ogre::Root::_updateAllRenderTargets() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#18 0x00007ffff6545610 in Ogre::Root::renderOneFrame() () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#19 0x00007ffff7f3e120 in rviz::VisualizationManager::onUpdate() () at /opt/ros/noetic/lib/librviz.so
#20 0x00007ffff73d41d0 in QMetaObject::activate(QObject*, int, int, void**) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007ffff73e13ee in QTimer::timeout(QTimer::QPrivateSignal) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007ffff73d4bc5 in QObject::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007ffff77d5a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#24 0x00007ffff77df0f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#25 0x00007ffff73a880a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#26 0x00007ffff73ff780 in QTimerInfoList::activateTimers() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#27 0x00007ffff740006c in  () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#28 0x00007ffff45f817d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#29 0x00007ffff45f8400 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x00007ffff45f84a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#31 0x00007ffff7400435 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#32 0x00007ffff73a73ab in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#33 0x00007ffff73af116 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#34 0x00005555555585e2 in main ()
rhaschke commented 1 year ago

I used the moving (but planar) map from map_test.py. Does your map strongly changes size between updates? Could you please provide a bagfile of your map topic?

StefanFabian commented 1 year ago

Here's a 30s bag file 2023-05-08-19-21-33.zip. I've also created a new workspace only overlaying the system ros path to exclude the possibility of our other packages interfering. Same result, after a few seconds of moving the mouse over the map it crashes.

rhaschke commented 1 year ago

Hm. Still can't reproduce your segfault. I guess you use the latest rviz from source? Please also provide your OpenGL hardware details as printed by rviz on startup.

rhaschke commented 1 year ago

Still can't reproduce your segfault. I guess you use the latest rviz from source?

Same result with latest Noetic release. I suggest you prepare a self-contained environment (e.g. a docker image) to reliably reproduce the problem.

StefanFabian commented 1 year ago

Hm. Still can't reproduce your segfault. I guess you use the latest rviz from source? Please also provide your OpenGL hardware details as printed by rviz on startup.

The issue contains the details for my laptop. My desktop:

[ INFO] [1683566557.838896663]: rviz version 1.14.20
[ INFO] [1683566557.838937414]: compiled against Qt version 5.12.8
[ INFO] [1683566557.838947344]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1683566557.845395627]: Forcing OpenGl version 0.
[ INFO] [1683566558.512378558]: Stereo is NOT SUPPORTED
[ INFO] [1683566558.512423668]: OpenGL device: NVIDIA GeForce RTX 3070/PCIe/SSE2
[ INFO] [1683566558.512435728]: OpenGl version: 4.6 (GLSL 4.6).

Nvidia driver versions are 530 on the laptop 3060, and 515 on my desktop.

Still can't reproduce your segfault. I guess you use the latest rviz from source?

Same result with latest Noetic release. I suggest you prepare a self-contained environment (e.g. a docker image) to reliably reproduce the problem.

That seems like a lot of work for a minor change, I can also just continue using our fork if you think this change could have negative consequences.

rhaschke commented 1 year ago

We have seen many segfaults due to OpenGL rendering issues on NVIDIA hardware: https://github.com/ros-visualization/rviz/issues?q=is%3Aissue+is%3Aopen+label%3Aopengl

On your laptop, you probably also have an integrated Intel GPU. Could you please check whether the bug persists on this hardware? To this end use the environment variables:

__NV_PRIME_RENDER_OFFLOAD=1
__GLX_VENDOR_LIBRARY_NAME=(mesa|nvidia)

Logically, the proposed change should not be necessary. If it is for you, I would like to understand why before merging. As I said, this change introduces a one-frame delay, which should be avoided if possible.

rhaschke commented 1 year ago

I can reproduce the segfault with NVIDIA hardware. I will try to understand the problem now...

StefanFabian commented 1 year ago

Any success at finding the cause? If you think a 33ms delay (at the default 30fps limit) is that much of an issue, we could also switch to the nh_ that is not running on the main thread.

rhaschke commented 1 year ago

I didn't find the root cause of the issue. I agree to merge this work-around. Please merge the suggestion I added before: https://github.com/ros-visualization/rviz/pull/1793#discussion_r1187262549