moveit / moveit2

:robot: MoveIt for ROS 2
https://moveit.ai/
BSD 3-Clause "New" or "Revised" License
1.11k stars 533 forks source link

FCL collision checking crashes during motion #3083

Closed rr-tom-noble closed 1 week ago

rr-tom-noble commented 2 weeks ago

Description

In some situations (I haven't been able to find a common cause), the move group crashes when executing a motion. Backtracing with gdb suggests that the crash is occuring within the FCL collision checker.

ROS Distro

Humble

OS and version

Ubuntu 22.04 (via docker)

Source or binary build?

Both (source for moveit_core and pilz_industrial_motion_planner, binaries for everything else)

If binary, which release version?

2.5.5 (for everything besides moveit_core and pilz_industrial_motion_planner)

If source, which branch?

https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix (humble plus #3077, #3055, #3070) (only for moveit_core and pilz_industrial_motion_planner)

Which RMW are you using?

FastRTPS

Steps to Reproduce

Unsure, as the error appears to be sporadic. I'll update with details as I investigate.

Expected behavior

Valid motion plans should not crash the move group

Actual behavior

When attempting a motion, the move group (sometimes) crashes. Backtracing with gdb suggests that this is happening within the FCL collision checker.

Backtrace or Console output

Image

TSNoble commented 2 weeks ago

Some additional findings:

rr-tom-noble commented 2 weeks ago

I just checked and the typos in collision disabling have been fixed, so that rules out that as a cause.

I'm fairly confident the issue in in these lines, since the backtrace suggests the crash occurs within std::_Rb_tree::find(), which is the underlying type of std::set. active_components_only_ uses a std::set.

These are the logs leading up to the failure:

[ns_motion_motion_server-30] [INFO] [1731398504.992854780] [move_group_interface]: MoveGroup action client/server ready
[move_group-2] [INFO] [1731398504.993094281] [moveit_move_group_default_capabilities.move_action_capability]: Received request
[move_group-2] [INFO] [1731398504.993187273] [moveit_move_group_default_capabilities.move_action_capability]: executing..
[ns_motion_motion_server-30] [INFO] [1731398504.993182201] [move_group_interface]: Planning request accepted
[move_group-2] [INFO] [1731398505.025682103] [moveit_move_group_default_capabilities.move_action_capability]: Planning request received for MoveGroup action. Forwarding to planning pipeline.
[move_group-2] [INFO] [1731398505.025921357] [moveit_move_group_capabilities_base.move_group_capability]: Using planning pipeline 'pilz_industrial_motion_planner'
[move_group-2] [INFO] [1731398505.025978619] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Initialized Point-to-Point Trajectory Generator.
[move_group-2] [INFO] [1731398505.025998497] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Generating PTP trajectory...
[move_group-2] [INFO] [1731398505.026005017] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating request...
[move_group-2] [INFO] [1731398505.026015528] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Request validated.
[move_group-2] [INFO] [1731398505.026020302] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating command specific request...
[move_group-2] [INFO] [1731398505.026024897] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Command specific request validated.
[move_group-2] [INFO] [1731398505.026088138] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Extracting motion plan info...
[move_group-2] [INFO] [1731398505.026095275] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Motion plan info extracted.
[move_group-2] [INFO] [1731398505.026099727] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Planning trajectory...
[move_group-2] [INFO] [1731398505.026106010] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Goal already reached, set one goal point explicitly.
[move_group-2] [INFO] [1731398505.026113184] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Trajectory planned.
[move_group-2] [INFO] [1731398505.026253898] [moveit_move_group_default_capabilities.move_action_capability]: Motion plan was computed successfully.
[ns_motion_motion_server-30] [INFO] [1731398505.026461550] [move_group_interface]: Planning request complete!
[ns_motion_motion_server-30] [INFO] [1731398505.026839870] [move_group_interface]: time taken to generate plan: 0.000130243 seconds
[move_group-2] [INFO] [1731398505.027109540] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.059315208] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.059413687] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.059433556] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059449191] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059493330] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.059393960] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.092037996] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.092065627] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092078071] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092422814] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.092516203] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.092557186] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.092727880] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.092740935] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.258340515] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.292939760] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.392107285] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.392147886] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.392399971] [move_group_interface]: Execute request success!
[move_group-2] [INFO] [1731398505.575211349] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.592394086] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.592457884] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.592475875] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592502318] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592531202] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.592511270] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.625645950] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.625676937] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.625686477] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.626042684] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.626138961] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.626179536] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.626258316] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.626268719] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.691653330] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.726385311] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.825666208] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.825708487] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.825938128] [move_group_interface]: Execute request success!
[ERROR] [move_group-2]: process has died [pid 63, exit code -11, cmd '/opt/ros/humble/lib/moveit_ros_move_group/move_group --ros-args -r __ns:=/move_group --params-file /tmp/launch_params_bnavf_bb --params-file /tmp/launch_params_m7qcuz37 --params-file /tmp/launch_params_42mjwpxx --params-file /tmp/launch_params_1u87_tmt --params-file /tmp/launch_params_c8b4stge --params-file /tmp/launch_params_ebym3ekg --params-file /tmp/launch_params_hai0u7e8 --params-file /tmp/launch_params_ky6e47uj --params-file /tmp/launch_params_67fn9m32 --params-file /tmp/launch_params_3ev073xe'].
[move_group-2] 
rr-tom-noble commented 2 weeks ago

Ah actually, our version of moveit builds moveit_core and pilz_industrial_motion_planner from source, with some fixes from recent PRs. The branch is https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix, which is based on humble plus #3077, #3055, #3070. None of those PRs touch collision checking and I believe FCL runs after the motion has been planned, so I think this represents a new issue we're running into now that those PRs have fixed our usage of pilz and the plan is reaching the collision checking stage

rr-mark commented 2 weeks ago

Backtrace with line numbers:


Thread 22 "move_group" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7a7fe00640 (LWP 672)]
0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f7aa4271fd7 in std::char_traits<char>::compare (
    __n=<optimized out>, __s2=<optimized out>, 
    __s1=0x6177616b736179 <error: Cannot access memory at address 0x6177616b736179>) at /usr/include/c++/11/bits/char_traits.h:389
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=0x556107bbe0c0, __str="diagonal_wall", this=0x556107bbe0c0, 
    __str="diagonal_wall") at /usr/include/c++/11/bits/basic_string.h:2879
#3  std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __rhs="diagonal_wall", 
    __lhs=<error: Cannot access memory at address 0x6177616b736179>)
    at /usr/include/c++/11/bits/basic_string.h:6343
#4  std::operator< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__y={...}, __x={...})
    at /usr/include/c++/11/bits/stl_pair.h:490
#5  std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator() (__y={...}, __x={...}, 
    this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_function.h:400
#6  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<c--Type <RET> for more, q to quit, c to continue without paging--
har, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::_M_lower_bound (
    __k={...}, __y=0x7f7a7fdfed00, __x=0x556107bbe0a0, this=0x7f7a7fdfecf8)
    at /usr/include/c++/11/bits/stl_tree.h:1905
#7  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits--Type <RET> for more, q to quit, c to continue without paging--
<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    this=this@entry=0x7f7a7fdfecf8, __k={...})
    at /usr/include/c++/11/bits/stl_tree.h:2523
#8  0x00007f7aa426df29 in std::map<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    __x={...}, this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_map.h:1170
#9  collision_detection::collisionCallback (o1=<optimized out>, 
    o2=0x7f7a38002000, data=0x7f7a7fdfe960)
--Type <RET> for more, q to quit, c to continue without paging--
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_common.cpp:167
#10 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#11 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#12 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#13 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7

#14 0x00007f7aa427e8ef in collision_detection::CollisionEnvFCL::checkSelfCollisionHelper (this=0x7f7a6c0e2b60, req=..., res=..., state=..., acm=0x5561080b1350)
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_env_fcl.cpp:269
#15 0x00007f7a9e34902a in move_group::MoveGroupStateValidationService::computeService(std::shared_ptr<rmw_request_id_s> const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Request_<std::allocator<void> > > const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Response_<std::allocator<void> > > const&) ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#16 0x00007f7a9e3455bf in ?? ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#17 0x00007f7aa4fcc376 in ?? () from /opt/ros/humble/lib/librclcpp.so
#18 0x00007f7aa4fc9d5a in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) () from /opt/ros/humble/lib/librclcpp.so
#19 0x00007f7aa4fca0c6 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/humble/lib/librclcpp.so
#20 0x00007f7aa4fd134a in rclcpp::executors::MultiThreadedExecutor::run(unsigned--Type <RET> for more, q to quit, c to continue without paging--
 long) () from /opt/ros/humble/lib/librclcpp.so
#21 0x00007f7aa4d18253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007f7aa4a87ac3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x00007f7aa4b18a04 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 
rr-mark commented 2 weeks ago

Probably not related to the current problem, but line 199 of collision_common.cpp is suspicious:

      const std::pair<std::string, std::string>& pc = cd1->getID() < cd2->getID() ?
                                                          std::make_pair(cd1->getID(), cd2->getID()) :
                                                          std::make_pair(cd2->getID(), cd1->getID());

Why is this storing the output of std::make_pair by reference?

rhaschke commented 1 week ago

@rr-mark, indeed collision_common.cpp:199 is wrong. However, probably it's unrelated to the observed segfault, which occurs earlier, in collision_common.cpp:167.

The crash is also unrelated to execution: The log clearly shows that execution was successfully finished. Only later, in MoveGroupStateValidationService, the crash occurs. Do you explicitly call this service?

rr-mark commented 1 week ago

Yes, we call /move_group/check_state_validity for checking pre-planned trajectories are still valid.

rr-mark commented 1 week ago

And yes, you're right, it is during that service call that it crashes.

rr-mark commented 1 week ago

I'm adding some debug prints now to try and work out what's going wrong.

rr-mark commented 1 week ago

It looks like cdata->res_->contacts is being used uninitialised. Printing cdata->res_->contacts.size() gives 97998116997072 (I'm assuming this is just random memory), and trying to iterate across cdata->res_->contacts throws

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
rr-mark commented 1 week ago

Both cdata and cdata->res_ look like valid pointers.

rr-mark commented 1 week ago

(Fixing line 199 did not fix the crash)

rhaschke commented 1 week ago

Objects are automatically initialized by calling the default constructor. I think cdata->res_->contacts is corrupted (or released) after creation. You need to figure out why/when.

rr-mark commented 1 week ago

I'm confused.

I'm just rejigging my dependency stack so I can get debugging and prints from moveit_ros.

rr-mark commented 1 week ago

Hmm, adding moveit_ros.move_group to our list of source dependencies seems to have fixed the problem. Maybe this is a problem with our mix of source and binary dependencies, rather than a problem with moveit.

rhaschke commented 1 week ago

Probably. The collision checking code is rather old and segfaults weren't reported for quite some time...

rr-tom-noble commented 1 week ago

As @rr-mark mentioned, building the additional packages seems to have fixed the issue. I think the cause was that our binaries are from 2.5.5, whereas our build-from-source packages are based on the moveit2 humble branch, which contains backports, so we were likely linking between incompatible shared objects.

Closing this issue.