ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.6k stars 1.3k forks source link

Underflow in costmaps lead to segmentation fault in planner server #2835

Closed jonipol closed 2 years ago

jonipol commented 2 years ago

Bug report

Required Info:

Steps to reproduce issue

1. Have obstacle layer in global costmap with laser scan
2. Move robot near the edge of costmap (I have got this error when robot is near the edge of costmap on position with negative y coordinates.)
3. Planner might crash

Expected behavior

Costmaps are generated, if goal is given the plan is made, execution continues normally

Actual behavior

Planner server crashes to segmentation fault

Additional information

So far I have managed to reproduce this crash on simulation on two different worlds and on our physical robot. image

This segmentation fault seems to happen when the offset in costmap_2d.hpp, on bresenham2D function, is updated with offset_b being negative. This can lead to underflow of the unsigned int offset, which then leads to segmentation fault when at(offset) is called.

[planner_server-19] Stack trace (most recent call last) in thread 2892:
[planner_server-19] #20   Object "", at 0xffffffffffffffff, in 
[planner_server-19] #19   Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7f81a1b2a292]
[planner_server-19] #18   Source "/build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread [0x7f81a1899608]
[planner_server-19] #17   Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28", at 0x7f81a1cedde3, in 
[planner_server-19] #16   Source "/usr/include/c++/9/thread", line 195, in _M_run [0x7f81a16bf509]
[planner_server-19]         192:    { }
[planner_server-19]         193: 
[planner_server-19]         194:    void
[planner_server-19]       > 195:    _M_run() { _M_func(); }
[planner_server-19]         196:       };
[planner_server-19]         197: 
[planner_server-19]         198:     void
[controller_server-18] [WARN] [1645688710.606256448] [local_costmap.local_costmap]: /camera_4/slope_filtered_3d buffer updated in 0.56s, it should be updated every 0.50s.
[planner_server-19] #15   Source "/usr/include/c++/9/thread", line 251, in operator() [0x7f81a16c3837]
[planner_server-19]         248:    {
[planner_server-19]         249:      using _Indices
[planner_server-19]         250:        = typename _Build_index_tuple<tuple_size<_Tuple>::value>::__type;
[planner_server-19]       > 251:      return _M_invoke(_Indices());
[planner_server-19]         252:    }
[planner_server-19]         253:       };
[planner_server-19] #14   Source "/usr/include/c++/9/thread", line 244, in _M_invoke<0> [0x7f81a16c8cdd]
[planner_server-19]         241:    template<size_t... _Ind>
[planner_server-19]         242:      typename __result<_Tuple>::type
[planner_server-19]         243:      _M_invoke(_Index_tuple<_Ind...>)
[planner_server-19]       > 244:      { return std::__invoke(std::get<_Ind>(std::move(_M_t))...); }
[planner_server-19]         245: 
[planner_server-19]         246:    typename __result<_Tuple>::type
[planner_server-19]         247:    operator()()
[planner_server-19] #13   Source "/usr/include/c++/9/bits/invoke.h", line 95, in __invoke<std::_Bind<void (nav2_costmap_2d::Costmap2DROS::*(nav2_costmap_2d::Costmap2DROS*, double))(double)> > [0x7f81a16cb062]
[planner_server-19]          92:       using __result = __invoke_result<_Callable, _Args...>;
[planner_server-19]          93:       using __type = typename __result::type;
[planner_server-19]          94:       using __tag = typename __result::__invoke_type;
[planner_server-19]       >  95:       return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
[planner_server-19]          96:                    std::forward<_Args>(__args)...);
[planner_server-19]          97:     }
[planner_server-19] #12   Source "/usr/include/c++/9/bits/invoke.h", line 60, in __invoke_impl<void, std::_Bind<void (nav2_costmap_2d::Costmap2DROS::*(nav2_costmap_2d::Costmap2DROS*, double))(double)> > [0x7f81a16cc81e]
[planner_server-19]          57:   template<typename _Res, typename _Fn, typename... _Args>
[planner_server-19]          58:     constexpr _Res
[planner_server-19]          59:     __invoke_impl(__invoke_other, _Fn&& __f, _Args&&... __args)
[planner_server-19]       >  60:     { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
[planner_server-19]          61: 
[planner_server-19]          62:   template<typename _Res, typename _MemFun, typename _Tp, typename... _Args>
[planner_server-19]          63:     constexpr _Res
[planner_server-19] #11   Source "/usr/include/c++/9/functional", line 484, in operator()<> [0x7f81a16cda4e]
[planner_server-19]         481:    {
[planner_server-19]         482:      return this->__call<_Result>(
[planner_server-19]         483:          std::forward_as_tuple(std::forward<_Args>(__args)...),
[planner_server-19]       > 484:          _Bound_indexes());
[planner_server-19]         485:    }
[planner_server-19]         486: 
[planner_server-19]         487:       // Call as const
[controller_server-18] [WARN] [1645688710.814640196] [controller_server]: Control loop missed its desired rate of 10.0000Hz
[planner_server-19] #10   Source "/usr/include/c++/9/functional", line 400, in __call<void, 0, 1> [0x7f81a16cf2e9]
[planner_server-19]         397:    _Result
[planner_server-19]         398:    __call(tuple<_Args...>&& __args, _Index_tuple<_Indexes...>)
[planner_server-19]         399:    {
[planner_server-19]       > 400:      return std::__invoke(_M_f,
[planner_server-19]         401:          _Mu<_Bound_args>()(std::get<_Indexes>(_M_bound_args), __args)...
[planner_server-19]         402:          );
[planner_server-19]         403:    }
[planner_server-19] #9    Source "/usr/include/c++/9/bits/invoke.h", line 95, in __invoke<void (nav2_costmap_2d::Costmap2DROS::*&)(double), nav2_costmap_2d::Costmap2DROS*&, double&> [0x7f81a16d0396]
[planner_server-19]          92:       using __result = __invoke_result<_Callable, _Args...>;
[planner_server-19]          93:       using __type = typename __result::type;
[planner_server-19]          94:       using __tag = typename __result::__invoke_type;
[planner_server-19]       >  95:       return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
[planner_server-19]          96:                    std::forward<_Args>(__args)...);
[planner_server-19]          97:     }
[planner_server-19] #8    Source "/usr/include/c++/9/bits/invoke.h", line 73, in __invoke_impl<void, void (nav2_costmap_2d::Costmap2DROS::*&)(double), nav2_costmap_2d::Costmap2DROS*&, double&> [0x7f81a16d0f73]
[planner_server-19]          70:     __invoke_impl(__invoke_memfun_deref, _MemFun&& __f, _Tp&& __t,
[planner_server-19]          71:          _Args&&... __args)
[planner_server-19]          72:     {
[planner_server-19]       >  73:       return ((*std::forward<_Tp>(__t)).*__f)(std::forward<_Args>(__args)...);
[planner_server-19]          74:     }
[planner_server-19]          75: 
[planner_server-19]          76:   template<typename _Res, typename _MemPtr, typename _Tp>
[planner_server-19] #7    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/costmap_2d_ros.cpp", line 412, in mapUpdateLoop [0x7f81a161ebd1]
[planner_server-19]         410:     // Measure the execution time of the updateMap method
[planner_server-19]         411:     timer.start();
[planner_server-19]       > 412:     updateMap();
[planner_server-19]         413:     timer.end();
[planner_server-19]         414: 
[planner_server-19]         415:     RCLCPP_DEBUG(get_logger(), "Map update time: %.9f", timer.elapsed_time_in_seconds());
[planner_server-19] #6    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/costmap_2d_ros.cpp", line 458, in updateMap [0x7f81a161f549]
[planner_server-19]         455:       const double & x = pose.pose.position.x;
[planner_server-19]         456:       const double & y = pose.pose.position.y;
[planner_server-19]         457:       const double yaw = tf2::getYaw(pose.pose.orientation);
[planner_server-19]       > 458:       layered_costmap_->updateMap(x, y, yaw);
[planner_server-19]         459: 
[planner_server-19]         460:       auto footprint = std::make_unique<geometry_msgs::msg::PolygonStamped>();
[planner_server-19]         461:       footprint->header.frame_id = global_frame_;
[planner_server-19] #5    Source "/robot_ws/src/navigation2/nav2_costmap_2d/src/layered_costmap.cpp", line 161, in updateMap [0x7f81a16130fb]
[planner_server-19]         158:     double prev_miny = miny_;
[planner_server-19]         159:     double prev_maxx = maxx_;
[planner_server-19]         160:     double prev_maxy = maxy_;
[planner_server-19]       > 161:     (*plugin)->updateBounds(robot_x, robot_y, robot_yaw, &minx_, &miny_, &maxx_, &maxy_);
[planner_server-19]         162:     if (minx_ > prev_minx || miny_ > prev_miny || maxx_ < prev_maxx || maxy_ < prev_maxy) {
[planner_server-19]         163:       RCLCPP_WARN(
[planner_server-19]         164:         rclcpp::get_logger(
[planner_server-19] #4    Source "/robot_ws/src/navigation2/nav2_costmap_2d/plugins/obstacle_layer.cpp", line 390, in updateBounds [0x7f819080d2e2]
[planner_server-19]         388:   // raytrace freespace
[planner_server-19]         389:   for (unsigned int i = 0; i < clearing_observations.size(); ++i) {
[planner_server-19]       > 390:     raytraceFreespace(clearing_observations[i], min_x, min_y, max_x, max_y);
[planner_server-19]         391:   }
[planner_server-19]         392: 
[planner_server-19]         393:   // place the new obstacles into a priority queue... each with a priority of zero to begin with
[planner_server-19] #3    Source "/robot_ws/src/navigation2/nav2_costmap_2d/plugins/obstacle_layer.cpp", line 636, in raytraceFreespace [0x7f819080eb6f]
[planner_server-19]         633:     unsigned int cell_raytrace_min_range = cellDistance(clearing_observation.raytrace_min_range_);
[planner_server-19]         634:     MarkCell marker(costmap_, FREE_SPACE);
[planner_server-19]         635:     // and finally... we can execute our trace to clear obstacles along that line
[planner_server-19]       > 636:     raytraceLine(marker, x0, y0, x1, y1, cell_raytrace_max_range, cell_raytrace_min_range);
[planner_server-19]         637: 
[planner_server-19]         638:     updateRaytraceBounds(
[planner_server-19]         639:       ox, oy, wx, wy, clearing_observation.raytrace_max_range_,
[planner_server-19] #2    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 466, in raytraceLine<nav2_costmap_2d::Costmap2D::MarkCell> [0x7f819081358f]
[planner_server-19]         463:       // Subtract minlength from length since initial point (x0, y0)has been adjusted by min Z
[planner_server-19]         464:       length = (unsigned int)(scale * abs_dx) - min_length;
[planner_server-19]         465: 
[planner_server-19]       > 466:       bresenham2D(
[planner_server-19]         467:         at, abs_dx, abs_dy, error_y, offset_dx, offset_dy, offset, length);
[planner_server-19]         468:       return;
[planner_server-19]         469:     }
[planner_server-19] #1    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 502, in bresenham2D<nav2_costmap_2d::Costmap2D::MarkCell> [0x7f8190817514]
[planner_server-19]         499:         error_b -= abs_da;
[planner_server-19]         500:       }
[planner_server-19]         501:     }
[planner_server-19]       > 502:     at(offset);
[planner_server-19]         503:   }
[planner_server-19]         504: 
[planner_server-19]         505: 
[planner_server-19] #0    Source "/robot_ws/src/navigation2/nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp", line 534, in operator() [0x7f819080fb3e]
[planner_server-19]         531:     }
[planner_server-19]         532:     inline void operator()(unsigned int offset)
[planner_server-19]         533:     {
[planner_server-19]       > 534:       costmap_[offset] = value_;
[planner_server-19]         535:     }
[planner_server-19]         536: 
[planner_server-19]         537:   private:
[planner_server-19] Segmentation fault (Address not mapped to object [0x7f828ca16658])
[ERROR] [planner_server-19]: process has died [pid 1629, exit code -11, cmd '/robot_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args --log-level info --ros-args -r __node:=planner_server --params-file /tmp/launch_params_up3n6wv8 --params-file /tmp/tmppcmyy2f9 -r /tf:=tf -r /tf_static:=tf_static -r /tf:=tf -r /tf_static:=tf_static'].

For this issue I created a quick workaround simply by adding a if statement to lines 494 and 502. I did not yet have time to investigate this further and did not have chance to test this on rolling or with turtlebot for example.

unsigned int costmap_size = size_x_ * size_y_;
...
    if (costmap_size > offset) {
      at(offset);
    }

Please let me know if more information, configurations or something is needed.

SteveMacenski commented 2 years ago

@AlexeyMerzlyakov can you take this one? Looks like another related issue that you're most recently familiar with this part of the codebase.

This does seem a little odd to be problematic, to my knowledge, nothing has changed in this part of the code from ROS 1 navigation, and I don't recall even running into this wrt Costmap2D in ROS 1. But for all I know, this could be a long-standing problem.

Can you run some experiment in the default bringup / map in order to show this? That would help if we could reproduce ourselves. You can relocalize the robot into any position on the map

AlexeyMerzlyakov commented 2 years ago

I've tried to reproduce the problem locally, but so far it's all works. @jonipol, could I ask you to share the results with default Nav2 bringup scripts? This indeed would be greatly helpful for us and allow to reproduce and understand the problem.

jonipol commented 2 years ago

I'm sorry I have not had time to try this out yet. I will be returning from a work trip next week and will try to allocate some time to investigate this issue as it is really weird one. We just did migration from foxy to galactic, and this issue came up on the galactic branch but did not happen in Foxy one. Will update here once I manage to try this out.

SteveMacenski commented 2 years ago

@jonipol can you try on main? Perhaps this is a bug that we already fixed in Rolling. @AlexeyMerzlyakov did you test with Galactic at tag 1.0.8 https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case? I know you did some work on this recently that might not have been backport-able.

AlexeyMerzlyakov commented 2 years ago

@AlexeyMerzlyakov did you test with Galactic at tag 1.0.8 https://github.com/ros-planning/navigation2/releases/tag/1.0.8 to see if that's the case?

Today I've re-built nav2 stack for this tag (in a VM with Galactic installed from deb-packages) and tried to reproduce the problem moving robot to the edge of both: global and local costmaps. In case if robot inside costmap boundaries, nothing happens. In case if robot is slightly out of costmap, appropriate server (controller or planner) reports this, but still no crashes or segfaults. I've also tried to move robot to the edges of global costmap having negative X or Y coordinates. Still no effect: Screenshot_2022-03-18_14-51-25

@jonipol, it is not finally clear what does "Move robot near the edge of costmap" mean? Does it mean robot is moving on the costmap using "2D pose esitmate" RViz command or is it physical move: immediate changing of robot coordinates in a Gazebo?

jonipol commented 2 years ago

Sorry for unclear explanation. In this case both work, giving pose estimate and navigating the robot near edge of global costmap.

Today I managed to do some digging into this one. Tried the default config for global costmap. Corrected topic and was not able to get the crash. Did some comparisons and more testing. Found out that setting raytrace_min_range > 0.0 (0.2 in my case) makes the crash happen for me.

AlexeyMerzlyakov commented 2 years ago

@jonipol, thank you for pointing out raytrace_min_range! Confirmed the problem for TB3 standard simulation on both: Galactic and Rolling releases. I'll continue investigation/debugging on next week and give more details as they will be appeared.

AlexeyMerzlyakov commented 2 years ago

There are two versions of raytraceLine()/bresenham() algorithms. One is placed in nav2_voxel_grid/include/nav2_voxel_grid/voxel_grid.hpp and used for 3D space voxel grids. Another - in nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d.hpp which is used for 2D costmaps. The root cause of this problem is in 2D space and it is the same as was fixed in #2460 but for 3D version of Bresenham algorithm: incorrect component shift in length calculation inside raytraceLine() routine, which leads to negative shifts of costmap_ array for some cases.

So, all related bugfixes were (partially) back-ported from 3D raytraceLine() to 2D version of algorithms in #2857. Local verification shown the problem disappeared on my side. @jonipol, could you please check: does the fix proposed work on your side?

jonipol commented 2 years ago

Yes, the proposed fix does solve the issue on hand for me!