Open GoesM opened 6 days ago
However, I noticed that during deactivate()
, dyn_handler_.reset()
is executed to try to stop the dynamic parameter callback thread:
https://github.com/ros-navigation/navigation2/blob/715df8f3477f6ed1c023d8f928e321bb18c403e5/nav2_amcl/src/amcl_node.cpp#L304-L316
But it seems that it doesn't actually achieve this, so I conducted the following experiment to verify:
dyn_handler_.reset()
and executor_thread_.reset()
:Insert delay and log:
/**
* @brief Callback executed when a parameter change is detected
* @param event ParameterEvent message
*/
rcl_interfaces::msg::SetParametersResult
AmclNode::dynamicParametersCallback(
std::vector<rclcpp::Parameter> parameters)
{
std::lock_guard<std::recursive_mutex> cfl(mutex_);
rcl_interfaces::msg::SetParametersResult result;
// insert
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback Called!");
std::this_thread::sleep_for(std::chrono::seconds(5));
RCLCPP_INFO(get_logger(), "-----------------------dynamicParamCallback still executing!");
//
...
...
nav2_util::CallbackReturn
AmclNode::on_cleanup(const rclcpp_lifecycle::State & /*state*/)
{
RCLCPP_INFO(get_logger(), "Cleaning up");
executor_thread_.reset();
// insert
RCLCPP_INFO(get_logger(), "-------------------After executor_thread_.reset()");
//
Then, run navigation2 and publish a dynamic parameter change request, such as ros2 param set amcl min_particles 500
. Immediately after sending the request, press Ctrl+C to let navigation2 sequentially execute the on_deactivate()
and on_cleanup()
phases. We will see the following logs:
...
...
[amcl-5] [INFO] [1719331177.062007626] [amcl]: -----------------------dynamicParamCallback Called!
...
...
[amcl-5] [INFO] [1719331179.520602552] [rclcpp]: signal_handler(signum=2)
[amcl-5] [INFO] [1719331179.520758931] [amcl]: Running Nav2 LifecycleNode rcl preshutdown (amcl)
[amcl-5] [INFO] [1719331179.520976194] [amcl]: Deactivating
[amcl-5] [INFO] [1719331179.521025282] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331179.623240551] [amcl]: Cleaning up
[amcl-5] [INFO] [1719331179.623584548] [amcl]: -------------------After executor_thread_.reset()
[amcl-5] [INFO] [1719331179.636316062] [amcl]: Destroying bond (amcl) to lifecycle manager.
...
...
[amcl-5] [INFO] [1719331182.062343761] [amcl]: -----------------------dynamicParamCallback still executing!
[amcl-5] [INFO] [1719331182.074333144] [amcl]: Destroying
[amcl-5] [ERROR] [1719331182.074424004] [amcl]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at ./src/rcl/publisher.c:389, at ./src/rcl_lifecycle.c:368
[amcl-5] [WARN] [1719331182.074453949] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(shuttingdown
It proves that the callback thread would be working even after dyn_handler_.reset()
and executor_thread_.reset()
.
SO perhaps we need to update the way to completely shutdown dyn_handler_
?
Perhaps we need to https://github.com/ros2/rclcpp/blob/8230d15ef7ffa9e824d7e0f9c623537ae331dc73/rclcpp/src/rclcpp/node.cpp#L475-L479 remove_on_set_parameters_callback
rather than just resetting the std::function
. That would apply to your other ticket (and all other Nav2 servers). If so, then you could close that other ticket and rename this one about that general problem. It should be pretty fast to batch out, there's about ~30 uses of the dynamic parameter callback, but it should all just be copy-paste adding 1 line before the reset:
node->remove_on_set_parameters_callback( dyn_handler_);
dyn_handler_.reset();
Sure! My future work will follow these steps:
node->remove_on_set_parameters_callback(dyn_handler_);
could shutdown the callback threads properly.Perhaps I'm not good at renaming issues, so feel free to modify it if it seems inappropriate. 😜
That's perfect!
Bug report
Required Info:
Steps to reproduce issue
nav2_amcl
, like following one:Ctrl+C
Expected behavior
no UAF bug
Actual behavior
UAF bug occured during
on_cleanup()
of nav2_amcl, the asan report is as following;Additional information
The cause of the UAF bug:
The callback thread
dynamicParametersCallback()
ofdynamic_handler_
is still running though nav2_amcl has been executingon_cleanup()
. Andon_cleanup()
would release many resources of nav2_amcl but the callback thread would access those released resources, so that the UAF bug occurs.