Open henryponti opened 5 years ago
Thanks for reporting an issue. We will have a look asap. If you can think of a fix, please consider providing it as a pull request.
Thanks for the additional info @BryceStevenWilley. I tried to check if the thread is joinable, but that didn't help:
https://gist.github.com/henryponti/cbef093c1c00f635bdb0f5a2c9204c56
Could the problem be related to this?
Calling detach() instead of join() does seem to help:
https://gist.github.com/henryponti/c70dc65f78a31b82be9a6ef041cf827d
But it also crashes eventually:
https://gist.github.com/henryponti/c5fb862a8ff6ae7863fcc65d16d3f17a
From what I could tell (at least with how it was manifesting on my machine), the issue isn't whether or not the thread is joinable or not, it's the fact that the thread is null. Calling any function on a null will cause the segfault. I think that execution_thread_.reset();
is being called in another thread somehow, likely from some buggy way of handling the mutex/boolean execution_complete_
variable.
Just read through the stack overflow question you posted, that seems to be very much related, but I'm out of my depth with the exact situation that's happening.
I agree. I tried to check if the pointer is null as well but it didn't help either. It looks like some deeper digging will be necessary. I am not terribly familiar with the architecture though, so I'm not sure I'll be able to get to the bottom of the problem without some extra help.
https://gist.github.com/henryponti/6d557b87ca42e26f40abb5b6826317dc
https://gist.github.com/henryponti/819d8e043d5a4497d5bb258161766bf6
@BryceStevenWilley, you are exactly right: Essentially we have two non-synchronized threads competing for access to execution_thread_
.
ActionServer
thread, calling stopExecution()
before executing a new path and after waiting for its completion:
#11 trajectory_execution_manager::TrajectoryExecutionManager::stopExecution(bool) () from /home/henry/catkin_ws/devel/lib/libmoveit_trajectory_execution_manager.so.1.0.1
#12 trajectory_execution_manager::TrajectoryExecutionManager::waitForExecution() () from /home/henry/catkin_ws/devel/lib/libmoveit_trajectory_execution_manager.so.1.0.1
#13 move_group::MoveGroupExecuteTrajectoryAction::executePath(boost::shared_ptr<moveit_msgs::ExecuteTrajectoryGoal_<std::allocator<void> > const> const&, moveit_msgs::ExecuteTrajectoryResult_<std::allocator<void> >&) () from /home/henry/catkin_ws/devel/lib/libmoveit_move_group_default_capabilities.so
#14 move_group::MoveGroupExecuteTrajectoryAction::executePathCallback(boost::shared_ptr<moveit_msgs::ExecuteTrajectoryGoal_<std::allocator<void> > const> const&) () from /home/henry/catkin_ws/devel/lib/libmoveit_move_group_default_capabilities.so
#15 actionlib::SimpleActionServer<moveit_msgs::ExecuteTrajectoryAction_<std::allocator<void> > >::executeLoop() () from /home/henry/catkin_ws/devel/lib/libmoveit_move_group_default_capabilities.so
stopExecution()
when receiving stop
on the trajectory_execution_event
topic:
#0 trajectory_execution_manager::TrajectoryExecutionManager::stopExecution(bool)@plt () from /vol/sandbox/ros/devel/lib/libmoveit_trajectory_execution_manager.so.1.0.1
#1 trajectory_execution_manager::TrajectoryExecutionManager::processEvent (this=0x555556092340, event="stop") at /vol/sandbox/ros/src/moveit/moveit_ros/planning/trajectory_execution_manager/src/trajectory_execution_manager.cpp:225
#2 trajectory_execution_manager::TrajectoryExecutionManager::receiveEvent (this=0x555556092340, event=...) at /vol/sandbox/ros/src/moveit/moveit_ros/planning/trajectory_execution_manager/src/trajectory_execution_manager.cpp:233
ActionServer
thread, calling stopExecution()
when preempting:
#4 trajectory_execution_manager::TrajectoryExecutionManager::stopExecution (this=0x5555560921a0, auto_clear=true) at /vol/sandbox/ros/src/moveit/moveit_ros/planning/trajectory_execution_manager/src/trajectory_execution_manager.cpp:1173
#5 move_group::MoveGroupExecuteTrajectoryAction::preemptExecuteTrajectoryCallback (this=0x5555560e4510) at /vol/sandbox/ros/src/moveit/moveit_ros/move_group/src/default_capabilities/execute_trajectory_action_capability.cpp:128
This raises several questions:
mutex
-protect the access, particularly which variables, apart from execution_thread_
, need to be protected?I haven't had a chance to look into that over the past few days. I am hoping to take another look early next week. The information from @BryceStevenWilley will help a lot.
Description
When a trajectory is preempted, trajectory_execution_manager sometimes crashes with a boost mutex error.
Environment
Steps to reproduce
The script below should cause the crash within a few minutes.
https://gist.github.com/henryponti/21d14e59804d294fc3617e21ccdbd91e
Expected behaviour
move_group should allow the trajectory to be preempted without crashing.
Actual behaviour
move_group crashes.
Backtrace
https://gist.github.com/henryponti/5ed3bafd6f66fb4aedbadea2df82f336