ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.3k stars 1.2k forks source link

Completely shutdown dynamic_param callback threads instead of only `dyn_param_handler_.reset()` #4496

Open GoesM opened 6 days ago

GoesM commented 6 days ago

Bug report

Required Info:

Steps to reproduce issue

  1. Launch the navigation2 as following steps:
    #!/bin/bash
    export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
    source install/setup.bash
    export TURTLEBOT3_MODEL=waffle
    export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
    ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 
  2. sending request for change any dynamic_parameter of nav2_amcl, like following one:
    ros2 param set amcl scan_topic scan
  3. deacitvate and cleanup nav2_amcl, like by press Ctrl+C

Expected behavior

no UAF bug

Actual behavior

UAF bug occured during on_cleanup() of nav2_amcl, the asan report is as following;

=================================================================
==354519==ERROR: AddressSanitizer: heap-use-after-free on address 0x617000096c30 at pc 0x7f60ad65b9f7 bp 0x7ffee9e82c80 sp 0x7ffee9e82c78
READ of size 8 at 0x617000096c30 thread T0
    #0 0x7f60ad4b1c00 in nav2_amcl::AmclNode::dynamicParametersCallback(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x174c00) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #1 0x7f60ad5628c6 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_memfun_deref, rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x2258c6) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60ad562542 in std::_Function_handler<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&), std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)> >::_M_invoke(std::_Any_data const&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x225542) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #3 0x7f60ae062401  (/opt/ros/humble/lib/librclcpp.so+0x10d401) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #4 0x7f60ae066f31 in rclcpp::node_interfaces::NodeParameters::set_parameters_atomically(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/opt/ros/humble/lib/librclcpp.so+0x111f31) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #5 0x7f60ae0e016c  (/opt/ros/humble/lib/librclcpp.so+0x18b16c) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #6 0x7f60ae09d757  (/opt/ros/humble/lib/librclcpp.so+0x148757) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #7 0x7f60ae0abcfe  (/opt/ros/humble/lib/librclcpp.so+0x156cfe) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #8 0x7f60ae03e375  (/opt/ros/humble/lib/librclcpp.so+0xe9375) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #9 0x7f60ae03bd59 in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) (/opt/ros/humble/lib/librclcpp.so+0xe6d59) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #10 0x7f60ae03c0c5 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe70c5) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #11 0x7f60ae04397f in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xee97f) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #12 0x7f60ae043b94 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) (/opt/ros/humble/lib/librclcpp.so+0xeeb94) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #13 0x55de5cf30a67 in main (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe1a67) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #14 0x7f60ac9abd8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #15 0x7f60ac9abe3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #16 0x55de5ce70424 in _start (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x21424) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)

0x617000096c30 is located 48 bytes inside of 704-byte region [0x617000096c00,0x617000096ec0)
freed by thread T1 here:
    #0 0x55de5cf2e89d in operator delete(void*) (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xdf89d) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60ad4b5c9c in nav2_amcl::AmclNode::on_cleanup(rclcpp_lifecycle::State const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x178c9c) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60adf388ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

previously allocated by thread T0 here:
    #0 0x55de5cf2e03d in operator new(unsigned long) (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xdf03d) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60ad49e792 in nav2_amcl::AmclNode::initMessageFilters() (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x161792) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #2 0x7f60ad492985 in nav2_amcl::AmclNode::on_configure(rclcpp_lifecycle::State const&) (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x155985) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058)
    #3 0x7f60adf388ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

Thread T1 created by T0 here:
    #0 0x55de5cedc6ec in __interceptor_pthread_create (/home/*****/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x8d6ec) (BuildId: af681ae626cb5c1aeecd03031dbccd5dfa334f52)
    #1 0x7f60acca7328 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc328) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 0x7f60ae18607f  (/opt/ros/humble/lib/librcutils.so+0xa07f) (BuildId: cf5ab83f9f550b1f79433195029ab96f67a27c78)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/*****/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x31e9f6) (BuildId: 06f441f7bb569f4916eea766aa565e2a8d66c058) in message_filters::Signal1<sensor_msgs::msg::LaserScan_<std::allocator<void> > >::removeCallback(std::shared_ptr<message_filters::CallbackHelper1<sensor_msgs::msg::LaserScan_<std::allocator<void> > > > const&)
Shadow bytes around the buggy address:
  0x0c2e8000ad30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad60: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000ad70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2e8000ad80: fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd
  0x0c2e8000ad90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000ada0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000adb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000adc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000add0: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==354519==ABORTING

Additional information


The cause of the UAF bug:

The callback thread dynamicParametersCallback() of dynamic_handler_ is still running though nav2_amcl has been executing on_cleanup(). And on_cleanup() would release many resources of nav2_amcl but the callback thread would access those released resources, so that the UAF bug occurs.

GoesM commented 6 days ago

However, I noticed that during deactivate(), dyn_handler_.reset() is executed to try to stop the dynamic parameter callback thread: https://github.com/ros-navigation/navigation2/blob/715df8f3477f6ed1c023d8f928e321bb18c403e5/nav2_amcl/src/amcl_node.cpp#L304-L316

But it seems that it doesn't actually achieve this, so I conducted the following experiment to verify:

Experiment:

Proving that the callback function continues to execute even after dyn_handler_.reset() and executor_thread_.reset():

Insert delay and log:

/**
  * @brief Callback executed when a parameter change is detected
  * @param event ParameterEvent message
  */
rcl_interfaces::msg::SetParametersResult
AmclNode::dynamicParametersCallback(
  std::vector<rclcpp::Parameter> parameters)
{
  std::lock_guard<std::recursive_mutex> cfl(mutex_);
  rcl_interfaces::msg::SetParametersResult result;
// insert
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback Called!");
std::this_thread::sleep_for(std::chrono::seconds(5));
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback still executing!");
//
...
...
nav2_util::CallbackReturn
AmclNode::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
  RCLCPP_INFO(get_logger(), "Cleaning up");

  executor_thread_.reset();
// insert 
RCLCPP_INFO(get_logger(), "-------------------After executor_thread_.reset()");
//

Then, run navigation2 and publish a dynamic parameter change request, such as ros2 param set amcl min_particles 500. Immediately after sending the request, press Ctrl+C to let navigation2 sequentially execute the on_deactivate() and on_cleanup() phases. We will see the following logs:

...
...
[amcl-5] [INFO] [1719331177.062007626] [amcl]: -----------------------dynamicParamCallback Called!
...
...
[amcl-5] [INFO] [1719331179.520602552] [rclcpp]: signal_handler(signum=2)
[amcl-5] [INFO] [1719331179.520758931] [amcl]: Running Nav2 LifecycleNode rcl preshutdown (amcl)
[amcl-5] [INFO] [1719331179.520976194] [amcl]: Deactivating
[amcl-5] [INFO] [1719331179.521025282] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331179.623240551] [amcl]: Cleaning up
[amcl-5] [INFO] [1719331179.623584548] [amcl]: -------------------After executor_thread_.reset()
[amcl-5] [INFO] [1719331179.636316062] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331182.062343761] [amcl]: -----------------------dynamicParamCallback still executing!
[amcl-5] [INFO] [1719331182.074333144] [amcl]: Destroying
[amcl-5] [ERROR] [1719331182.074424004] [amcl]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at ./src/rcl/publisher.c:389, at ./src/rcl_lifecycle.c:368
[amcl-5] [WARN] [1719331182.074453949] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(shuttingdown

It proves that the callback thread would be working even after dyn_handler_.reset() and executor_thread_.reset().

suggestion

SO perhaps we need to update the way to completely shutdown dyn_handler_ ?

SteveMacenski commented 6 days ago

Perhaps we need to https://github.com/ros2/rclcpp/blob/8230d15ef7ffa9e824d7e0f9c623537ae331dc73/rclcpp/src/rclcpp/node.cpp#L475-L479 remove_on_set_parameters_callback rather than just resetting the std::function. That would apply to your other ticket (and all other Nav2 servers). If so, then you could close that other ticket and rename this one about that general problem. It should be pretty fast to batch out, there's about ~30 uses of the dynamic parameter callback, but it should all just be copy-paste adding 1 line before the reset:

node->remove_on_set_parameters_callback( dyn_handler_);
 dyn_handler_.reset();
GoesM commented 5 days ago

Sure! My future work will follow these steps:

Perhaps I'm not good at renaming issues, so feel free to modify it if it seems inappropriate. 😜

SteveMacenski commented 5 days ago

That's perfect!