ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.3k stars 1.2k forks source link

[Transfered to geometry2] potential bugs perhaps casued by `transformReadyCallback` of `tf2_ros::MessageFilter` #4414

Closed GoesM closed 1 week ago

GoesM commented 3 weeks ago

Bug report

Required Info:

Steps to reproduce issue

Launch the navigation2 normally, as following steps:

#!/bin/bash
export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
source install/setup.bash
export TURTLEBOT3_MODEL=waffle
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 

Keep sending following command , until amcl crashed.

ros2 param set /amcl beam_skip_threshold 3.14

Expected behavior

no crash anymore

Actual behavior

sometimes crash, sometimes no crash.

Additional information


This may be a related issue of #4379 and #4397 , still about the UAF-bug occurring after request for dynamic parameter, BUT there's something different.

Before the work of #4397 , if we send a ros2 param set /amcl beam_skip_threshold 3.14 command , the amcl node will immediately crash. So #4397 found ONE OF reasons and fixed it : laser_scan_filter_.reset() is ignored.

After the work of #4397, we could find that ros2 param set /amcl beam_skip_threshold 3.14 would not immediately let amcl crashed, and dynamic-parameter would work correctly MOST OF times.

BUT, just MOST OF, there're still some special times in which amcl crashed. And following is the Asan-report in these times.

=================================================================
==76105==ERROR: AddressSanitizer: heap-use-after-free on address 0x6170000a5c90 at pc 0x79ff0b559bba bp 0x79ff01bf6ab0 sp 0x79ff01bf6aa8
READ of size 8 at 0x6170000a5c90 thread T12
    #0 0x79ff0b559bb9 in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::transformReadyCallback(tf2_ros::TransformStampedFuture const&, unsigned long) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x559bb9) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #1 0x79ff0b55db4e in std::_Function_handler<void (tf2_ros::TransformStampedFuture const&), std::_Bind<void (tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::* (tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>*, std::_Placeholder<1>, unsigned long))(tf2_ros::TransformStampedFuture const&, unsigned long)> >::_M_invoke(std::_Any_data const&, tf2_ros::TransformStampedFuture const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x55db4e) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #2 0x79ff0c85130b in std::_Function_handler<void (unsigned long const&), std::_Bind<void (tf2_ros::Buffer::* (tf2_ros::Buffer*, std::_Placeholder<1>, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped_<std::allocator<void> > > >, tf2_ros::TransformStampedFuture, std::function<void (tf2_ros::TransformStampedFuture const&)>))(unsigned long const&, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped_<std::allocator<void> > > >, tf2_ros::TransformStampedFuture, std::function<void (tf2_ros::TransformStampedFuture const&)>)> >::_M_invoke(std::_Any_data const&, unsigned long const&) (/opt/ros/humble/lib/libtf2_ros.so+0x5130b) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #3 0x79ff0c84eb35 in std::_Function_handler<void (), std::_Bind<void (tf2_ros::CreateTimerROS::* (tf2_ros::CreateTimerROS*, unsigned long, std::function<void (unsigned long const&)>))(unsigned long const&, std::function<void (unsigned long const&)>)> >::_M_invoke(std::_Any_data const&) (/opt/ros/humble/lib/libtf2_ros.so+0x4eb35) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #4 0x79ff0c84dd97  (/opt/ros/humble/lib/libtf2_ros.so+0x4dd97) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #5 0x79ff0c407030 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe7030) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #6 0x79ff0c40e97f in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xee97f) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #7 0x79ff0aadc252  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc252) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #8 0x79ff0a694ac2 in start_thread nptl/./nptl/pthread_create.c:442:8
    #9 0x79ff0a72684f  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

0x6170000a5c90 is located 656 bytes inside of 704-byte region [0x6170000a5a00,0x6170000a5cc0)
freed by thread T0 here:
    #0 0x5f969785c98d in operator delete(void*) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0xe998d) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0b3ec48c in std::__uniq_ptr_impl<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>, std::default_delete<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer> > >::reset(tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>*) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3ec48c) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #2 0x79ff0b45a3ce in std::__invoke_result<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>::type std::__invoke<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x45a3ce) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #3 0x79ff0b45a231 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_other, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x45a231) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)

previously allocated by thread T0 here:
    #0 0x5f969785c12d in operator new(unsigned long) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0xe912d) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0b3e42df in std::_MakeUniq<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer> >::__single_object std::make_unique<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>, message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>&, tf2_ros::Buffer&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeClockInterface>, std::chrono::duration<long, std::ratio<1l, 1000000000l> >&>(message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>&, tf2_ros::Buffer&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&&, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>&&, std::shared_ptr<rclcpp::node_interfaces::NodeClockInterface>&&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3e42df) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)

Thread T12 created by T0 here:
    #0 0x5f969780a7dc in __interceptor_pthread_create (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0x977dc) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0aadc328 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc328) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 0x79ff0b3772f0 in nav2_amcl::AmclNode::on_configure(rclcpp_lifecycle::State const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3772f0) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #3 0x79ff0c3038ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x559bb9) (BuildId: f870739ccc7779266347919193e71b4cf78860c9) in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::transformReadyCallback(tf2_ros::TransformStampedFuture const&, unsigned long)
Shadow bytes around the buggy address:
  0x0c2e8000cb40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c2e8000cb90: fd fd[fd]fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c2e8000cba0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000cbb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==76105==ABORTING

the report shows that the bug occurs during transformReadyCallback of tf2_ros::MessageFilter works, so it should be a DIFFERENT reason from #4379 causing the similar bug.

And in the function tf2_ros::MessageFilter, it seems have no memory provided by amcl, so I think :

Is it a bug caused by the mechanism defect of tf2_ros ?

but this function is not provided by navigation2 and I'm not very fimiliar with it , so I'm not very sure about my opinion ......

SteveMacenski commented 3 weeks ago

Not sure - I think you should compile with debug flags and get a backtrace so we can see specifically where it fails https://docs.nav2.org/tutorials/docs/get_backtrace.html

It does seem likely like a tf2_ros problem though.

GoesM commented 3 weeks ago

It's an interesting way ! thanks for your share, I'd try on it in my later work.

SteveMacenski commented 3 weeks ago

It seems like somethings are being triggered after reset, so it would be good to know what is being called / triggered and thus see how to resolve it or if we made a mistake on not resetting everything required in the right order

SteveMacenski commented 1 week ago

Any word here @GoesM ?

GoesM commented 1 week ago

I conducted several experiments and discovered that the callback function (transformReadyCallback()) of tf2_ros::Message_filter can continue to execute even after the Message_filter has been destroyed.

I believe this is an issue stemming from a flaw in the destruction mechanism of tf2_ros::Message_filter. Therefore, I have opened a new issue for the geometry2 stack: https://github.com/ros/geometry2/issues/565.

SteveMacenski commented 1 week ago

Got it, think you have a potential solution? I'm not sure I can get anyone to fix it for us, but if you have a patch, I can get someone to review it for us!

GoesM commented 1 week ago

Uh... I'm very glad to submit a PR to contribute. After all, this issue may occur frequently if users utilize dynamic parameters during navigation2 operations.

However, before submitting the PR, I think I need more time to find the most suitable fix, learning about the details of how the tf2_ros::Message_filter is designed to manage its own threads .

SteveMacenski commented 1 week ago

OK. I'd want to transfer this conversation over there though so that this doesn't sit in Nav2's issue tracker since its not actionable to Nav2 as a bug with a dependency. The linking between these tickets I think does a fine job documenting it, but I'm going to go ahead and close this and we can follow up with future discussions in that ticket in geometry2 - I have notifications set to 'on'.