ros2 / rviz

ROS 3D Robot Visualizer
BSD 3-Clause Clear License
267 stars 201 forks source link

RViz crashes when selecting point clouds with decay enabled #322

Open greimela-si opened 6 years ago

greimela-si commented 6 years ago

We experienced a crash in RViz when selecting pointclouds while using decay.

sefault_on_select

Steps to reproduce the crash:

We already tried to identify the origin and ended up somewhere in SelectionManager::pick(). When selecting a pointcloud the selection manager uses two passes to first select the pointcloud and in a second pass the individual points inside one pointcloud.

In this second render pass, each point is colored based on its index inside the pointcloud. The crash occurs because some points are rendered in white instead and the selection manager tries to match this color to an non existing index.

We assume that this originates in the fact that some pointclouds are rendered at the exact same position.

botteroa-si commented 6 years ago

To support our hypothesis that this behaviour depends on the fact that there are more points in the exact same position, and that, therefore, the not null decay time is only an indirect cause of the phenomenon, we tried to reproduce the segfault publishing two small (just 1 point) identical point clouds and repeatedly selecting them. As expected, in the selection panel, only one point resulted selected, and after some tries RViz has crashed exactly as described in the previous comment.

This reinforces the hypothesis that, when points are at the exact same position, sometimes the point seen in the second render pass is not the same that was selected in the first one.

audrow commented 3 years ago

It seems that the index of the handle in PointCloudSelectionHandler::onSelect is invalid. When the error occurs the index is a large number (e.g., 16777214). When this error isn't thrown, the index is normally a low number, like 0. Below is some digging that I've done into the issue.

Following the code
It seems the code crashes on line 171, in `PointCloudSelectionHandler::onSelect`.
ASan output (not more helpful than a backtrace)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==8591==ERROR: AddressSanitizer: SEGV on unknown address 0x60301c1940d8 (pc 0x7f6c604bd955 bp 0x7fffc3968a60 sp 0x7fffc3968840 T0)
==8591==The signal is caused by a READ memory access.
    #0 0x7f6c604bd954 in rviz_default_plugins::PointCloudSelectionHandler::onSelect(rviz_common::interaction::Picked const&) /root/ros2_ws/src/ros2/rviz/rviz_default_plugins/src/rviz_default_plugins/displays/pointcloud/point_cloud_selection_handler.cpp:171
    #1 0x7f6c7d923008 in rviz_common::interaction::SelectionManager::addSelectedObject(rviz_common::interaction::Picked const&) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:445
    #2 0x7f6c7d92259c in rviz_common::interaction::SelectionManager::addSelection(std::unordered_map, std::equal_to, std::allocator > > const&) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:417
    #3 0x7f6c7d922ae5 in rviz_common::interaction::SelectionManager::setSelection(std::unordered_map, std::equal_to, std::allocator > > const&) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:433
    #4 0x7f6c7d925a60 in rviz_common::interaction::SelectionManager::select(rviz_rendering::RenderWindow*, int, int, int, int, rviz_common::interaction::SelectionManagerIface::SelectType) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:579
    #5 0x7f6c60660704 in rviz_default_plugins::tools::SelectionTool::processMouseEvent(rviz_common::ViewportMouseEvent&) /root/ros2_ws/src/ros2/rviz/rviz_default_plugins/src/rviz_default_plugins/tools/select/selection_tool.cpp:148
    #6 0x7f6c7da0d776 in rviz_common::VisualizationManager::handleMouseEvent(rviz_common::ViewportMouseEvent const&) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/visualization_manager.cpp:651
    #7 0x7f6c7d8efee3 in rviz_common::RenderPanel::onRenderWindowMouseEvents(QMouseEvent*) /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/render_panel.cpp:195
    #8 0x7f6c7d8f3eaf in void std::__invoke_impl(std::__invoke_memfun_deref, void (rviz_common::RenderPanel::*&)(QMouseEvent*), rviz_common::RenderPanel*&, QMouseEvent*&&) /usr/include/c++/9/bits/invoke.h:73
    #9 0x7f6c7d8f39d2 in std::__invoke_result::type std::__invoke(void (rviz_common::RenderPanel::*&)(QMouseEvent*), rviz_common::RenderPanel*&, QMouseEvent*&&) /usr/include/c++/9/bits/invoke.h:95
    #10 0x7f6c7d8f3561 in void std::_Bind))(QMouseEvent*)>::__call(std::tuple&&, std::_Index_tuple<0ul, 1ul>) /usr/include/c++/9/functional:400
    #11 0x7f6c7d8f2ccc in void std::_Bind))(QMouseEvent*)>::operator()(QMouseEvent*&&) /usr/include/c++/9/functional:484
    #12 0x7f6c7d8f21a8 in std::_Function_handler))(QMouseEvent*)> >::_M_invoke(std::_Any_data const&, QMouseEvent*&&) /usr/include/c++/9/bits/std_function.h:300
    #13 0x7f6c7af26828 in std::function::operator()(QMouseEvent*) const /usr/include/c++/9/bits/std_function.h:688
    #14 0x7f6c7af26326 in rviz_rendering::RenderWindow::event(QEvent*) /root/ros2_ws/src/ros2/rviz/rviz_rendering/src/rviz_rendering/render_window.cpp:175
    #15 0x7f6c7cf52a65 in QApplicationPrivate::notify_helper(QObject*, QEvent*) (/lib/x86_64-linux-gnu/libQt5Widgets.so.5+0x16aa65)
    #16 0x7f6c7cf5c0ef in QApplication::notify(QObject*, QEvent*) (/lib/x86_64-linux-gnu/libQt5Widgets.so.5+0x1740ef)
    #17 0x7f6c7cb25939 in QCoreApplication::notifyInternal2(QObject*, QEvent*) (/lib/x86_64-linux-gnu/libQt5Core.so.5+0x286939)
    #18 0x7f6c7a7677d2 in QGuiApplicationPrivate::processMouseEvent(QWindowSystemInterfacePrivate::MouseEvent*) (/lib/x86_64-linux-gnu/libQt5Gui.so.5+0x1267d2)
    #19 0x7f6c7a76910a in QGuiApplicationPrivate::processWindowSystemEvent(QWindowSystemInterfacePrivate::WindowSystemEvent*) (/lib/x86_64-linux-gnu/libQt5Gui.so.5+0x12810a)
    #20 0x7f6c7a74335a in QWindowSystemInterface::sendWindowSystemEvents(QFlags) (/lib/x86_64-linux-gnu/libQt5Gui.so.5+0x10235a)
    #21 0x7f6c721e432d  (/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5+0x7932d)
    #22 0x7f6c79eeefbc in g_main_context_dispatch (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51fbc)
    #23 0x7f6c79eef23f  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5223f)
    #24 0x7f6c79eef2e2 in g_main_context_iteration (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x522e2)
    #25 0x7f6c7cb7d564 in QEventDispatcherGlib::processEvents(QFlags) (/lib/x86_64-linux-gnu/libQt5Core.so.5+0x2de564)
    #26 0x7f6c7cb244da in QEventLoop::exec(QFlags) (/lib/x86_64-linux-gnu/libQt5Core.so.5+0x2854da)
    #27 0x7f6c7cb2c245 in QCoreApplication::exec() (/lib/x86_64-linux-gnu/libQt5Core.so.5+0x28d245)
    #28 0x55bff484543e in main /root/ros2_ws/src/ros2/rviz/rviz2/src/main.cpp:76
    #29 0x7f6c7bcdf0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #30 0x55bff48437bd in _start (/root/ros2_ws/build/rviz2/rviz2+0x77bd)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/ros2_ws/src/ros2/rviz/rviz_default_plugins/src/rviz_default_plugins/displays/pointcloud/point_cloud_selection_handler.cpp:171 in rviz_default_plugins::PointCloudSelectionHandler::onSelect(rviz_common::interaction::Picked const&)
==8591==ABORTING
Here is the guilty function. The program crashes because the index isn't always valid when selecting multiple co-located points. https://github.com/ros2/rviz/blob/d59072192205a92acd9bc84605420a815c885d00/rviz_default_plugins/src/rviz_default_plugins/displays/pointcloud/point_cloud_selection_handler.cpp#L164-L171 As far as I can tell, `obj` in the above code is created in the `pick` method in the `SelectionManager::select` method. https://github.com/ros2/rviz/blob/d59072192205a92acd9bc84605420a815c885d00/rviz_common/src/rviz_common/interaction/selection_manager.cpp#L563-L581 The logic in the `pick` method is complicated, but it seems that the results are set here, while iterating through the `pixel_buffer_`. https://github.com/ros2/rviz/blob/d59072192205a92acd9bc84605420a815c885d00/rviz_common/src/rviz_common/interaction/selection_manager.cpp#L619-L620 Here are the `results` after `pick` for a selection that does cause an error: ``` (rr) p results $15 = std::unordered_map with 1 element = {[8437760] = {handle = 8437760, pixel_count = 4959, extra_handles = std::set with 1 element = {[0] = 16777215}}} ``` and is an example of `results` after `pick` for a selection that doesn't raise an error: ``` (rr) p results $4 = std::unordered_map with 1 element = {[24704] = {handle = 24704, pixel_count = 4959, extra_handles = std::set with 1 element = { [0] = 1}}} ``` If I understand correctly, the `results` are just copying values from the `pixel_buffer_`'s elements, which means that the bad index is created further up. The `pixel_buffer_` appears to be updated in `SelectionManager::unpackColors`, which creates the handle in the `colorToHandle` method. https://github.com/ros2/rviz/blob/d59072192205a92acd9bc84605420a815c885d00/rviz_common/include/rviz_common/interaction/forwards.hpp#L72-L84 I'm not sure what's going wrong here.
audrow commented 3 years ago

Here is an updated comment on looking through the code.

Here is where I believe that I have traced the error to
#0  Ogre::GLTextureBuffer::download (this=0x55d0a4be56e0, data=...)
    at /root/ros2_ws/build/rviz_ogre_vendor/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLHardwarePixelBuffer.cpp:370
#1  0x00007f1d15b9ae06 in Ogre::GLHardwarePixelBuffer::blitToMemory (this=0x55d0a4be56e0, 
    srcBox=..., dst=...)
    at /root/ros2_ws/build/rviz_ogre_vendor/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLHardwarePixelBuffer.cpp:110
#2  0x00007f1d3fa229b7 in rviz_common::interaction::SelectionRenderer::blitToMemory (
    this=0x55d0a458ee40, pixel_buffer=..., render_viewport=0x55d0a55ca580, dst_box=...)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_renderer.cpp:272
#3  0x00007f1d3fa21d13 in rviz_common::interaction::SelectionRenderer::render (this=0x55d0a458ee40, 
    window=0x55d0a35a0560, rectangle=..., texture=..., handlers=..., dst_box=...)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_renderer.cpp:113
#4  0x00007f1d3fa141b5 in rviz_common::interaction::SelectionManager::render (this=0x55d0a45914b0, 
    window=0x55d0a35a0560, selection_rectangle=..., render_texture=..., dst_box=...)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:310
#5  0x00007f1d3fa13ff2 in rviz_common::interaction::SelectionManager::renderAndUnpack (
    this=0x55d0a45914b0, window=0x55d0a35a0560, selection_rectangle=..., pass=0)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:299
#6  0x00007f1d3fa15d07 in rviz_common::interaction::SelectionManager::pick (this=0x55d0a45914b0, 
    window=0x55d0a35a0560, x1=474, y1=506, x2=513, y2=535, 
    results=std::unordered_map with 0 elements)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:606
#7  0x00007f1d3fa15b23 in rviz_common::interaction::SelectionManager::select (this=0x55d0a45914b0, 
    window=0x55d0a35a0560, x1=474, y1=506, x2=513, y2=535, 
    type=rviz_common::interaction::SelectionManagerIface::Replace)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/interaction/selection_manager.cpp:572
#8  0x00007f1d074c239e in rviz_default_plugins::tools::SelectionTool::processMouseEvent (
    this=0x55d0a4608a50, event=...)
    at /root/ros2_ws/src/ros2/rviz/rviz_default_plugins/src/rviz_default_plugins/tools/select/selection_tool.cpp:148
#9  0x00007f1d3fa7443a in rviz_common::VisualizationManager::handleMouseEvent (this=0x55d0a35a51a0, 
    vme=...)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/visualization_manager.cpp:651
#10 0x00007f1d3f9fdaed in rviz_common::RenderPanel::onRenderWindowMouseEvents (this=0x55d0a363a150, 
    event=0x7ffece5215e0)
    at /root/ros2_ws/src/ros2/rviz/rviz_common/src/rviz_common/render_panel.cpp:195
#11 0x00007f1d3f9ffc40 in std::__invoke_impl (__f=
    @0x55d0a3c2a1b0: (void (rviz_common::RenderPanel::*)(rviz_common::RenderPanel * const, QMouseEvent *)) 0x7f1d3f9fda2c , 
    __t=@0x55d0a3c2a1c0: 0x55d0a363a150) at /usr/include/c++/9/bits/invoke.h:73
#12 0x00007f1d3f9ff954 in std::__invoke (__fn=
    @0x55d0a3c2a1b0: (void (rviz_common::RenderPanel::*)(rviz_common::RenderPanel * const, QMouseEvent *)) 0x7f1d3f9fda2c )
    at /usr/include/c++/9/bits/invoke.h:95
#13 0x00007f1d3f9ff69c in std::_Bind))(QMouseEvent*)>::__call(std::tuple&&, std::_Index_tuple<0ul, 1ul>) (this=0x55d0a3c2a1b0, __args=...)
--Type  for more, q to quit, c to continue without paging--c
    at /usr/include/c++/9/functional:400
#14 0x00007f1d3f9ff284 in std::_Bind))(QMouseEvent*)>::operator()(QMouseEvent*&&) (this=0x55d0a3c2a1b0) at /usr/include/c++/9/functional:484
#15 0x00007f1d3f9fed12 in std::_Function_handler))(QMouseEvent*)> >::_M_invoke(std::_Any_data const&, QMouseEvent*&&) (__functor=..., __args#0=@0x7ffece521220: 0x7ffece5215e0) at /usr/include/c++/9/bits/std_function.h:300
#16 0x00007f1d3d2e4829 in std::function::operator()(QMouseEvent*) const (this=0x55d0a35a0590, __args#0=0x7ffece5215e0) at /usr/include/c++/9/bits/std_function.h:688
#17 0x00007f1d3d2e4327 in rviz_rendering::RenderWindow::event (this=0x55d0a35a0560, event=0x7ffece5215e0) at /root/ros2_ws/src/ros2/rviz/rviz_rendering/src/rviz_rendering/render_window.cpp:175
#18 0x00007f1d3f1b4a66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#19 0x00007f1d3f1be0f0 in QApplication::notify(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#20 0x00007f1d3ed8793a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /lib/x86_64-linux-gnu/libQt5Core.so.5
#21 0x00007f1d3cb237d3 in QGuiApplicationPrivate::processMouseEvent(QWindowSystemInterfacePrivate::MouseEvent*) () from /lib/x86_64-linux-gnu/libQt5Gui.so.5
#22 0x00007f1d3cb2510b in QGuiApplicationPrivate::processWindowSystemEvent(QWindowSystemInterfacePrivate::WindowSystemEvent*) () from /lib/x86_64-linux-gnu/libQt5Gui.so.5
#23 0x00007f1d3caff35b in QWindowSystemInterface::sendWindowSystemEvents(QFlags) () from /lib/x86_64-linux-gnu/libQt5Gui.so.5
#24 0x00007f1d37cd032e in ?? () from /lib/x86_64-linux-gnu/libQt5XcbQpa.so.5
#25 0x00007f1d3c15bfbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#26 0x00007f1d3c15c240 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#27 0x00007f1d3c15c2e3 in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#28 0x00007f1d3eddf565 in QEventDispatcherGlib::processEvents(QFlags) () from /lib/x86_64-linux-gnu/libQt5Core.so.5
#29 0x00007f1d3ed864db in QEventLoop::exec(QFlags) () from /lib/x86_64-linux-gnu/libQt5Core.so.5
#30 0x00007f1d3ed8e246 in QCoreApplication::exec() () from /lib/x86_64-linux-gnu/libQt5Core.so.5
#31 0x000055d0a1cdf104 in main (argc=1, argv=0x7ffece521c78) at /root/ros2_ws/src/ros2/rviz/rviz2/src/main.cpp:76

I believe the problem comes from the download function in GLHardwarePixelBuffer::blitToMemory. This function gets data that ultimately is put into the selection_managers pixel_buffer_ in SelectionManager::unpackColors. This occurs in SelectionManager::pick in the first call of renderAndUnpack, which sets every entry in renderAndUnpack every entry in pixel_buffer_ is 7389312.

p pixel_buffer_
$58 = std::vector of length 1131, capacity 37349 = {7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 
  7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312, 7389312...}

The second call of renderAndUnpack sets every entry in the pixel_buffer_ to 16777215. Note, that 16777215 is equal to 2^24-1, which is 24 1s in binary. Thus the pixel_buffer_ seems to be all ones at this point.

p pixel_buffer_
$63 = std::vector of length 1131, capacity 37349 = {16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 
  16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215, 16777215...}
Here's what happens when selecting a point cloud doesn't cause a crash By contrast, when a point cloud object was selected and the program didn't crash, here is the value of `pixel_buffer_` after the first `renderAndUnpack`.
p pixel_buffer_
$47 = std::vector of length 1, capacity 37349 = {0}
And the value of `pixel_buffer_` after the second `renderAndUnpack` function call in `SelectionManager::pick`. Later for `extra_handles` in the `M_Picked` object the `0`s will be excluded and the `1`s will have `1` subtracted from them so they point to the `0`th index.
p pixel_buffer_
$49 = std::vector of length 18382, capacity 37349 = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1...}

The pixel_buffer_ is used to set the variable extra_by_pixel. I believe extra_by_pixel will always be the same as pixel_buffer_ for point clouds because all of the possible values should be added by to need_additional (a set of unique pixel values) and there is only one rendering pass (the program compiles and runs fine when pass is made constant). Then the value 16777215 is inserted into the extra_handles member of the returned results object.

The results object is then passed to the SelectionManager's setSelection then addSelection then addSelectedObject then PointCloudSelectionHandler::onSelect where one is subtracted from 16777215 and that is the index attempted to be accessed in cloud_info_->transformed_points_, which has only one item in it.

I believe that the error begins from the return of GLHardwarePixelBuffer::download when two co-located points are selected. I am not sure how to proceed from here.

audrow commented 3 years ago

@martin-idel, I see you had a PR migrating the selection tool, do you have any thoughts on why SelectionManager::renderAndUnpack would be setting the pixel_buffer_ to all ones when selecting co-located points?

Martin-Idel commented 3 years ago

I remember when we had a look at this - the whole selection is quite tricky, because instead of using standard Ogre selection mechanisms RViz ships its own. The high level overview, if I remember correctly is the following:

If I remember correctly we thought the problem was the renderables which are not selected on the first render pass. They will be colored black (or white?) entirely and shouldn't occur in the pixel_buffer_. But due to z-fighting, this is something that can happen after all, and that's where a crash occurs. Z-fighting is an issue, because selection doesn't occur with one rendering pass but instead requires a second render, so we didn't really know what to do here and didn't investigate further...

However, that doesn't really fit together with your investigation of setting everything to ones instead of zeros when no crash occurs. I'll try to have a closer look again soon, but it's probably going to take a bit.