Closed jonipol closed 2 years ago
@AlexeyMerzlyakov can you take this one? Looks like another related issue that you're most recently familiar with this part of the codebase.
This does seem a little odd to be problematic, to my knowledge, nothing has changed in this part of the code from ROS 1 navigation, and I don't recall even running into this wrt Costmap2D in ROS 1. But for all I know, this could be a long-standing problem.
Can you run some experiment in the default bringup / map in order to show this? That would help if we could reproduce ourselves. You can relocalize the robot into any position on the map
I've tried to reproduce the problem locally, but so far it's all works. @jonipol, could I ask you to share the results with default Nav2 bringup scripts? This indeed would be greatly helpful for us and allow to reproduce and understand the problem.
I'm sorry I have not had time to try this out yet. I will be returning from a work trip next week and will try to allocate some time to investigate this issue as it is really weird one. We just did migration from foxy to galactic, and this issue came up on the galactic branch but did not happen in Foxy one. Will update here once I manage to try this out.
@jonipol can you try on main
? Perhaps this is a bug that we already fixed in Rolling. @AlexeyMerzlyakov did you test with Galactic at tag 1.0.8
https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case? I know you did some work on this recently that might not have been backport-able.
@AlexeyMerzlyakov did you test with Galactic at tag 1.0.8 https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case?
Today I've re-built nav2 stack for this tag (in a VM with Galactic installed from deb-packages) and tried to reproduce the problem moving robot to the edge of both: global and local costmaps. In case if robot inside costmap boundaries, nothing happens. In case if robot is slightly out of costmap, appropriate server (controller or planner) reports this, but still no crashes or segfaults. I've also tried to move robot to the edges of global costmap having negative X or Y coordinates. Still no effect:
@jonipol, it is not finally clear what does "Move robot near the edge of costmap" mean? Does it mean robot is moving on the costmap using "2D pose esitmate" RViz command or is it physical move: immediate changing of robot coordinates in a Gazebo?
Sorry for unclear explanation. In this case both work, giving pose estimate and navigating the robot near edge of global costmap.
Today I managed to do some digging into this one. Tried the default config for global costmap. Corrected topic and was not able to get the crash. Did some comparisons and more testing. Found out that setting raytrace_min_range
> 0.0 (0.2 in my case) makes the crash happen for me.
@jonipol, thank you for pointing out raytrace_min_range
! Confirmed the problem for TB3 standard simulation on both: Galactic and Rolling releases. I'll continue investigation/debugging on next week and give more details as they will be appeared.
There are two versions of raytraceLine()
/bresenham()
algorithms. One is placed in nav2_voxel_grid/include/nav2_voxel_grid/voxel_grid.hpp
and used for 3D space voxel grids. Another - in nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp
which is used for 2D costmaps. The root cause of this problem is in 2D space and it is the same as was fixed in #2460 but for 3D version of Bresenham algorithm: incorrect component shift in length
calculation inside raytraceLine()
routine, which leads to negative shifts of costmap_
array for some cases.
So, all related bugfixes were (partially) back-ported from 3D raytraceLine()
to 2D version of algorithms in #2857. Local verification shown the problem disappeared on my side. @jonipol, could you please check: does the fix proposed work on your side?
Yes, the proposed fix does solve the issue on hand for me!
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
Costmaps are generated, if goal is given the plan is made, execution continues normally
Actual behavior
Planner server crashes to segmentation fault
Additional information
So far I have managed to reproduce this crash on simulation on two different worlds and on our physical robot.
This segmentation fault seems to happen when the offset in costmap_2d.hpp, on bresenham2D function, is updated with offset_b being negative. This can lead to underflow of the unsigned int offset, which then leads to segmentation fault when
at(offset)
is called.For this issue I created a quick workaround simply by adding a if statement to lines 494 and 502. I did not yet have time to investigate this further and did not have chance to test this on rolling or with turtlebot for example.
Please let me know if more information, configurations or something is needed.