ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.31k stars 1.2k forks source link

18 bugs (UAF) in `nav2_amcl` by setting dynamic parameters #4379

Closed GoesM closed 1 month ago

GoesM commented 1 month ago

Bug report

Required Info:

Steps to reproduce issue

Launch the navigation2 normally, as following steps:

#!/bin/bash
export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
source install/setup.bash
export TURTLEBOT3_MODEL=waffle
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 

Learning about how the dynamic-parameter works , I had a try on it .

ros2 param set /amcl z_rand 0.5

[notice] whatever the value be (0.5 or anyother double value43) , the UAF occurs all the time.

Expected behavior

no crash occurs

Actual behavior

The ASAN reporting a heap-user-after-free bug to me as following, and the nav2_amcl stop its work.

=================================================================
==145868==ERROR: AddressSanitizer: heap-use-after-free on address 0x6170000aaab0 at pc 0x730fff32b5a4 bp 0x7fff30eba470 sp 0x7fff30eba468
READ of size 8 at 0x6170000aaab0 thread T0
    #0 0x730fff32b5a3 in message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >::removeCallback(std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x52b5a3) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #1 0x730fff32e80a in std::_Function_handler<void (), std::_Bind<void (message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >::* (message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >*, std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > >))(std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > > const&)> >::_M_invoke(std::_Any_data const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x52e80a) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #2 0x730fff3273c0 in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::~MessageFilter() (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x5273c0) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #3 0x730fff3281c0 in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::~MessageFilter() (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x5281c0) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #4 0x730fff1c36e4 in std::__uniq_ptr_impl<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>, std::default_delete<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer> > >::reset(tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>*) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x3c36e4) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #5 0x730fff156ee9 in nav2_amcl::AmclNode::initMessageFilters() (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x356ee9) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #6 0x730fff163f1e in nav2_amcl::AmclNode::dynamicParametersCallback(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x363f1e) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #7 0x730fff23743e in std::__invoke_result<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>::type std::__invoke<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x43743e) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #8 0x730fff2372a1 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_other, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x4372a1) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #9 0x730fff2370d3 in std::_Function_handler<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&), std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)> >::_M_invoke(std::_Any_data const&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x4370d3) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #10 0x7310002601a1  (/opt/ros/humble/lib/librclcpp.so+0x10e1a1) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #11 0x731000264cd1 in rclcpp::node_interfaces::NodeParameters::set_parameters_atomically(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/opt/ros/humble/lib/librclcpp.so+0x112cd1) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #12 0x7310002ddafc  (/opt/ros/humble/lib/librclcpp.so+0x18bafc) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #13 0x73100029b107  (/opt/ros/humble/lib/librclcpp.so+0x149107) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #14 0x7310002a96ae  (/opt/ros/humble/lib/librclcpp.so+0x1576ae) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #15 0x73100023c2a5  (/opt/ros/humble/lib/librclcpp.so+0xea2a5) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #16 0x731000239c89 in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) (/opt/ros/humble/lib/librclcpp.so+0xe7c89) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #17 0x731000239ff5 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe7ff5) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #18 0x7310002418af in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xef8af) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #19 0x731000241ac4 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) (/opt/ros/humble/lib/librclcpp.so+0xefac4) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #20 0x5e3a5905f941 in main (/home/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe6941) (BuildId: 068ea3ea0211b05161e789e94dd1668c91ef8430)
    #21 0x730ffe429d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #22 0x730ffe429e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #23 0x5e3a58f9f504 in _start (/home/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x26504) (BuildId: 068ea3ea0211b05161e789e94dd1668c91ef8430)

0x6170000aaab0 is located 48 bytes inside of 680-byte region [0x6170000aaa80,0x6170000aad28)
freed by thread T0 here:
    #0 0x5e3a5905d97d in operator delete(void*) (/home/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe497d) (BuildId: 068ea3ea0211b05161e789e94dd1668c91ef8430)
    #1 0x730fff1c3594 in std::__uniq_ptr_impl<message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>, std::default_delete<message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode> > >::reset(message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>*) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x3c3594) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #2 0x730fff23743e in std::__invoke_result<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>::type std::__invoke<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x43743e) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #3 0x730fff2372a1 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_other, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x4372a1) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)

previously allocated by thread T0 here:
    #0 0x5e3a5905d11d in operator new(unsigned long) (/home/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe411d) (BuildId: 068ea3ea0211b05161e789e94dd1668c91ef8430)
    #1 0x730fff1bb3cf in std::__detail::_MakeUniq<message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode> >::__single_object std::make_unique<message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>, std::shared_ptr<nav2_util::LifecycleNode>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, rmw_qos_profile_s const&, rclcpp::SubscriptionOptionsWithAllocator<std::allocator<void> >&>(std::shared_ptr<nav2_util::LifecycleNode>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, rmw_qos_profile_s const&, rclcpp::SubscriptionOptionsWithAllocator<std::allocator<void> >&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x3bb3cf) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #2 0x730fff156acd in nav2_amcl::AmclNode::initMessageFilters() (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x356acd) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #3 0x730fff14c01e in nav2_amcl::AmclNode::on_configure(rclcpp_lifecycle::State const&) (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x34c01e) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548)
    #4 0x731000135b8c  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x28b8c) (BuildId: e9b8e454bf87aaab775667b79aefcab12c018de9)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x52b5a3) (BuildId: e96a3822dfb41386b5409b544e4bc74088297548) in message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >::removeCallback(std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > > const&)
Shadow bytes around the buggy address:
  0x0c2e8000d500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000d510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000d520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000d530: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c2e8000d540: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2e8000d550: fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd
  0x0c2e8000d560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000d570: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000d580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000d590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000d5a0: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==145868==ABORTING

Additional information


it seems that the callback function for laser_scan_message is stiil executing while dynamic parameter changes, and they have some data race.

SteveMacenski commented 1 month ago

That appears to be the case! I think adding a mutex lock should square that up. The laser scan sub is in a different callback group on another executor https://github.com/ros-navigation/navigation2/blob/main/nav2_amcl/src/amcl_node.cpp#L1510-L1513 so they're not synchronized

GoesM commented 1 month ago

well, I think there's a better way to fix it but not adding a mutex lock, because of following facts:

Let's notice that:

https://github.com/ros-navigation/navigation2/blob/30e2cde4997eec8664e8bd8b77caf717c61b1945/nav2_amcl/src/amcl_node.cpp#L1281-L1283

some dynamic parameter setting could let reinit_laser = true, and then do following code lines:

https://github.com/ros-navigation/navigation2/blob/cf3dd553c2164c2a586f0f66628f22d43264d35f/nav2_amcl/src/amcl_node.cpp#L1355-L1364

However, we could find here's a lack of laser_scan_filter_ .reset() before initMessageFilters().

As we know, the callback functions called by lase_scan_filter_ would access the memory , which would be freed during initMessageFilters() , so that a use-after-free bug occurs.

GoesM commented 1 month ago

In summary

similar to z_rand , I have found many other dynamic parameters could lead to such UAF bug, I'd share them here as tickets for this ISSUE and I changed the title of this ISSUE:

  1. ros2 param set /amcl beam_skip_distance 0.5
  2. ros2 param set /amcl beam_skip_error_threshold 0.9
  3. ros2 param set /amcl beam_skip_threshold 0.3
  4. ros2 param set /amcl lambda_short 0.1
  5. ros2 param set /amcl laser_likelihood_max_dist 2.0
  6. ros2 param set /amcl laser_max_range 100.0
  7. ros2 param set /amcl laser_min_range -1.0
  8. ros2 param set /amcl sigma_hit 0.2
  9. ros2 param set /amcl transform_tolerance 1.0
  10. ros2 param set /amcl z_hit 0.5
  11. ros2 param set /amcl z_max 0.05
  12. ros2 param set /amcl z_rand 0.5
  13. ros2 param set /amcl z_short 0.05
  14. ros2 param set /amcl laser_model_type likelihood_field
  15. ros2 param set /amcl odom_frame_id odom
  16. ros2 param set /amcl scan_topic scan
  17. ros2 param set /amcl do_beamskip False
  18. ros2 param set /amcl max_beams 60
SteveMacenski commented 1 month ago

So you're suggesting that we reset the message filter at the start of the dynamic parameter callback and then re-initialize it at the end to kill the extra thread? I think that would work in practice, but I think the mutex is a bit cleaner and in line with what we've done elsewhere in the stack for consistency.

Also constantly creating / destroying the data reader could uncover some DDS vendor implementation issues since that's really exercising the stability of their solution. I don't know that we'd find any bugs, but that would certainly be a good way to trigger them if there was some memory leaks. So in the back of my mind I'd rather not test it if we have an alternative that seems reasonable. The dynamic parameter callbacks are quick so I think this should be fine to mutex lock them :-)

GoesM commented 1 month ago

So you're suggesting that we reset the message filter at the start of the dynamic parameter callback and then re-initialize it at the end to kill the extra thread?

Because I'm considering that the current design approach of nav2_amcl seems to work indeed in this way.

After amcl receiving the instruction to change dynamic parameters, the related threads are closed :

https://github.com/ros-navigation/navigation2/blob/cf3dd553c2164c2a586f0f66628f22d43264d35f/nav2_amcl/src/amcl_node.cpp#L1355-L1364

and then these threads would be restarted within the new configuration by initMessageFilters();

https://github.com/ros-navigation/navigation2/blob/cf3dd553c2164c2a586f0f66628f22d43264d35f/nav2_amcl/src/amcl_node.cpp#L1497-L1517


Additionally, I've had a try on it , by adding the code laser_scan_filter_.reset(); as following:

  // Re-initialize the lasers and it's filters
  if (reinit_laser) {
    lasers_.clear();
    lasers_update_.clear();
    frame_to_laser_.clear();
    laser_scan_connection_.disconnect();
    laser_scan_filter_.reset();
    laser_scan_sub_.reset();

    initMessageFilters();
  }

And then these 18 commands for changing dynamic parameters would work correctly.


So in the back of my mind I'd rather not test it if we have an alternative that seems reasonable. The dynamic parameter callbacks are quick so I think this should be fine to mutex lock them :-)

Sorry, I might not fully understand this method. Do you mean changing the current design, removing all the code that closes the threads, and instead using a locking method?

I feel that I need to understand your suggestion more clearly, and then proceed with my future work based on your recommendation. ^_^

SteveMacenski commented 1 month ago

On further thought, what you say makes sense, lets do that!