ros / actionlib

Provides a standardized interface for interfacing with preemptable tasks. Examples of this include moving the base to a target location, performing a laser scan and returning the resulting point cloud, detecting the handle of a door, etc.
http://www.ros.org/wiki/actionlib
96 stars 155 forks source link

Action client/server block unexpectedly #90

Open alextoind opened 6 years ago

alextoind commented 6 years ago

I don't know if I am able to let you reproduce this problem but I try to explain in a detailed fashion.

I am using a default position_controllers/JointTrajectoryController with a default actionlib::SimpleActionClient<control_msgs::FollowJointTrajectoryAction> action_client_ on a physical device with just one motor, i.e. one actuator/joint.

I have a control node which exploit a ros::WallTimer control_timer_ (based on a fixed ros::WallDuration control_duration_, e.g. 0.001s) to call the control loop method periodically and forever:

void deviceControl::update(const ros::WallTime& time, const ros::WallDuration& period) {
  // read the state from the hardware
  device_->read(ros::Time(time.toSec()), ros::Duration(period.toSec()));

  // update the controllers (set the new control actions)
  controller_manager_.update(ros::Time(time.toSec()), ros::Duration(period.toSec()));

  // write the commands to the hardware
  device_->write(ros::Time(time.toSec()), ros::Duration(period.toSec()));
}

I also use a simple method to send a new goal to the action server:

void deviceControl::move(const trajectory_msgs::JointTrajectory &joint_trajectory) {
  control_msgs::FollowJointTrajectoryAction control_action;
  control_action.action_goal.goal.trajectory = joint_trajectory;
  control_action.action_goal.header.stamp = ros::Time(0);
  action_client_.sendGoal(control_action.action_goal.goal,
                          std::bind(&deviceControl::actionDoneCallback, this, std::placeholders::_1, std::placeholders::_2),
                          std::bind(&deviceControl::actionActiveCallback, this),
                          std::bind(&deviceControl::actionFeedbackCallback, this, std::placeholders::_1));
}

And I'd like to loop the trajectory forever, e.g. if the trajectory is a period of a sine wave, I'd like to repeat it as soon as one period is finished. To do so, I've simply implemented the actionDoneCallback like this:

void deviceControl::actionDoneCallback(const actionlib::SimpleClientGoalState &state, const control_msgs::FollowJointTrajectoryResultConstPtr &result) {
  ... // debug info

  // automatically loop the waypoint trajectory
  if (waypoint_joint_trajectory_.points.size()) {
    move(waypoint_joint_trajectory_);
  }
}

where the waypoint_joint_trajectory_ is initialized in the constructor of the control node and it is never touched any longer.

This mechanism almost works well, and the motor moves for hours without problems. However it eventually blocks (like it is waiting on a mutex) after a new goal is sent to the server. The scheme is always the same:

...
actionFeedbackCallback prompt on screen feeback info
actionDoneCallback prompt that current goal ends in state SUCCEED
new goal is sent
actionActiveCallback is called (and do nothing)
then nothing more happens...

The control node continues to run, but nothing is moving any longer since the reference (usually ~0) is close to the current motor position.

I've also investigate a bit when the fault happens and it seems that the list of tracked goals advertised in position_trajectory_controller/follow_joint_trajectory/status which contained several entries, becomes suddenly of two only elements, and does not change anymore.

The strange thing is that the action server is not really stuck. Indeed, if I call manually the move method to send a new goal, it starts to move again (apparently without problems) till the next unexpected block.

As a workaround I started to send new goals with another timer, properly set, instead of using the actionDoneCallback. However, this is not the best option, since it can lead to discontinuity in the motion if there is a delay in the previous goal reference pursuit, i.e. if the goal is not ending on time.

This behavior seems to happen more frequently and sooner if the control loop period becomes shorter, or there are several nodes running. Indeed it seems to be a race condition which happens very rarely.

Can you suggest me what to investigate to understand better this problem? Thank you very much!

alextoind commented 6 years ago

Brief output of the fault:

...
[DEBUG] [1512056439.057251123]: Trying to transition to WAITING_FOR_RESULT
[DEBUG] [1512056439.057272405]: Transitioning CommState from ACTIVE to WAITING_FOR_RESULT
[DEBUG] [1512056439.057292738]: Trying to transition to DONE
[DEBUG] [1512056439.057306039]: Transitioning CommState from WAITING_FOR_RESULT to DONE
[DEBUG] [1512056439.057321230]: Transitioning SimpleState from [ACTIVE] to [DONE]
[ INFO] [1512056439.057346394]: Device [1] action ended in state [SUCCEEDED].

[DEBUG] [1512056439.057385762]: about to start initGoal()
[DEBUG] [1512056439.057433641]: Done with initGoal()
[DEBUG] [1512056439.057460294]: IN DELETER
[DEBUG] [1512056439.057474497]: About to erase CommStateMachine
[DEBUG] [1512056439.057490310]: Done erasing CommStateMachine
[DEBUG] [1512056439.057598676]: Getting status over the wire.
[DEBUG] [1512056439.057652067]: The action server has received a new goal request
[DEBUG] [1512056439.057683635]: Received new action goal
[DEBUG] [1512056439.057712674]: Figuring out new trajectory starting at time 1512056439.057
[DEBUG] [1512056439.057731494]: Using alternate time base. In it, the new trajectory starts at time 23.907
[DEBUG] [1512056439.057774947]: Trajectory of joint device_motor_jointhas 5 segments:
- 1 segment(s) will still be executed from previous trajectory.
- 1 segment added for transitioning between the current trajectory and first point of the input message.
- 3 new segments (4 points) taken from the input trajectory.
[DEBUG] [1512056439.057805991]: Accepting goal, id: /device_control-1424-1512056439.57398898, stamp: 1512056439.06
[DEBUG] [1512056439.057910074]: Timer deregistering callbacks.
[DEBUG] [1512056439.058014465]: Getting status over the wire.
[DEBUG] [1512056439.058041362]: Trying to transition to ACTIVE
[DEBUG] [1512056439.058055758]: Transitioning CommState from WAITING_FOR_GOAL_ACK to ACTIVE
[DEBUG] [1512056439.058073426]: Transitioning SimpleState from [PENDING] to [ACTIVE]
[ INFO] [1512056439.058088887]: Device [1] action start.
[DEBUG] [1512056439.058131318]: Time jumped forward by [0.001101] for timer of period [0.001000], resetting timer (current=1512056439.057251, next_expected=1512056439.056150)
[DEBUG] [1512056439.060329179]: Time jumped forward by [0.002048] for timer of period [0.001000], resetting timer (current=1512056439.060299, next_expected=1512056439.058251)
[DEBUG] [1512056439.062277679]: Publishing transforms for moving joints
[DEBUG] [1512056439.062533633]: Time jumped forward by [0.001218] for timer of period [0.001000], resetting timer (current=1512056439.062517, next_expected=1512056439.061299)
[DEBUG] [1512056439.064811079]: Time jumped forward by [0.001278] for timer of period [0.001000], resetting timer (current=1512056439.064795, next_expected=1512056439.063517)
[DEBUG] [1512056439.066921773]: Time jumped forward by [0.001109] for timer of period [0.001000], resetting timer (current=1512056439.066904, next_expected=1512056439.065795)
[DEBUG] [1512056439.067205534]: Publishing feedback for goal, id: /device_control-1424-1512056439.57398898, stamp: 1512056439.06
[DEBUG] [1512056439.067235551]: Publishing feedback for goal with id: /device_control-1424-1512056439.57398898 and stamp: 1512056439.06
[DEBUG] [1512056439.067297379]: Device [1] current joint state is [0.785015] (expecting [-1]).
[DEBUG] [1512056439.070660800]: Time jumped forward by [0.002733] for timer of period [0.001000], resetting timer (current=1512056439.070637, next_expected=1512056439.067904)
[DEBUG] [1512056439.072734269]: Time jumped forward by [0.001080] for timer of period [0.001000], resetting timer (current=1512056439.072717, next_expected=1512056439.071637)
[DEBUG] [1512056439.072819988]: Publishing transforms for moving joints
[DEBUG] [1512056439.074842024]: Time jumped forward by [0.001106] for timer of period [0.001000], resetting timer (current=1512056439.074823, next_expected=1512056439.073718)
[DEBUG] [1512056439.075400728]: Setting status to succeeded on goal, id: /device_control-1424-1512056439.57398898, stamp: 1512056439.06
[DEBUG] [1512056439.075431403]: Publishing result for goal with id: /device_control-1424-1512056439.57398898 and stamp: 1512056439.06
[DEBUG] [1512056439.075663665]: Trying to transition to WAITING_FOR_RESULT
[DEBUG] [1512056439.075683404]: Transitioning CommState from ACTIVE to WAITING_FOR_RESULT
[DEBUG] [1512056439.075709646]: Trying to transition to DONE
[DEBUG] [1512056439.075723121]: Transitioning CommState from WAITING_FOR_RESULT to DONE
[DEBUG] [1512056439.075737828]: Transitioning SimpleState from [ACTIVE] to [DONE]
[ INFO] [1512056439.075763926]: Device [1] action ended in state [SUCCEEDED].

[DEBUG] [1512056439.075800778]: about to start initGoal()
[DEBUG] [1512056439.075865196]: Done with initGoal()
[DEBUG] [1512056439.075893315]: IN DELETER
[DEBUG] [1512056439.075907254]: About to erase CommStateMachine
[DEBUG] [1512056439.075922366]: Done erasing CommStateMachine
[DEBUG] [1512056439.076095275]: The action server has received a new goal request
[DEBUG] [1512056439.076122627]: Received new action goal
[DEBUG] [1512056439.076155514]: Figuring out new trajectory starting at time 1512056439.075
[DEBUG] [1512056439.076173094]: Using alternate time base. In it, the new trajectory starts at time 23.925
[DEBUG] [1512056439.076220667]: Trajectory of joint device_motor_jointhas 5 segments:
- 1 segment(s) will still be executed from previous trajectory.
- 1 segment added for transitioning between the current trajectory and first point of the input message.
- 3 new segments (4 points) taken from the input trajectory.
[DEBUG] [1512056439.076251564]: Accepting goal, id: /device_control-1425-1512056439.75813088, stamp: 1512056439.08
[DEBUG] [1512056439.076346887]: Incoming queue was full for topic "device_motor_position_trajectory_controller/follow_joint_trajectory/status". Discarded oldest message (current queue size [0])
[DEBUG] [1512056439.076440065]: Timer deregistering callbacks.
[DEBUG] [1512056439.076464611]: Publishing feedback for goal, id: /device_control-1424-1512056439.57398898, stamp: 1512056439.06
[DEBUG] [1512056439.076508834]: Publishing feedback for goal with id: /device_control-1424-1512056439.57398898 and stamp: 1512056439.06
[DEBUG] [1512056439.076647086]: Getting status over the wire.
[DEBUG] [1512056439.076680616]: Trying to transition to ACTIVE
[DEBUG] [1512056439.076694954]: Transitioning CommState from WAITING_FOR_GOAL_ACK to ACTIVE
[DEBUG] [1512056439.076759101]: Transitioning SimpleState from [PENDING] to [ACTIVE]
[ INFO] [1512056439.076801119]: Device [1] action start.
[DEBUG] [1512056439.077268289]: Time jumped forward by [0.001429] for timer of period [0.001000], resetting timer (current=1512056439.077252, next_expected=1512056439.075823)
[DEBUG] [1512056439.079379594]: Time jumped forward by [0.001113] for timer of period [0.001000], resetting timer (current=1512056439.079365, next_expected=1512056439.078252)
[DEBUG] [1512056439.081512586]: Time jumped forward by [0.001126] for timer of period [0.001000], resetting timer (current=1512056439.081491, next_expected=1512056439.080365)
[DEBUG] [1512056439.083412504]: Publishing transforms for moving joints
[DEBUG] [1512056439.083793358]: Time jumped forward by [0.001286] for timer of period [0.001000], resetting timer (current=1512056439.083777, next_expected=1512056439.082491)
[DEBUG] [1512056439.086042201]: Time jumped forward by [0.001247] for timer of period [0.001000], resetting timer (current=1512056439.086024, next_expected=1512056439.084777)
[DEBUG] [1512056439.088289953]: Time jumped forward by [0.001246] for timer of period [0.001000], resetting timer (current=1512056439.088270, next_expected=1512056439.087024)
[DEBUG] [1512056439.090407821]: Time jumped forward by [0.001124] for timer of period [0.001000], resetting timer (current=1512056439.090393, next_expected=1512056439.089270)
[DEBUG] [1512056439.092653858]: Time jumped forward by [0.001246] for timer of period [0.001000], resetting timer (current=1512056439.092640, next_expected=1512056439.091393)
[DEBUG] [1512056439.093971369]: Publishing transforms for moving joints
[DEBUG] [1512056439.094756917]: Time jumped forward by [0.001101] for timer of period [0.001000], resetting timer (current=1512056439.094741, next_expected=1512056439.093640)
[DEBUG] [1512056439.096883845]: Time jumped forward by [0.001128] for timer of period [0.001000], resetting timer (current=1512056439.096869, next_expected=1512056439.095741)
[DEBUG] [1512056439.098974103]: Time jumped forward by [0.001091] for timer of period [0.001000], resetting timer (current=1512056439.098960, next_expected=1512056439.097869)
[DEBUG] [1512056439.101105646]: Time jumped forward by [0.001133] for timer of period [0.001000], resetting timer (current=1512056439.101093, next_expected=1512056439.099961)
[DEBUG] [1512056439.103385181]: Time jumped forward by [0.001264] for timer of period [0.001000], resetting timer (current=1512056439.103357, next_expected=1512056439.102093)
[DEBUG] [1512056439.105531060]: Time jumped forward by [0.001158] for timer of period [0.001000], resetting timer (current=1512056439.105516, next_expected=1512056439.104357)
[DEBUG] [1512056439.107782106]: Time jumped forward by [0.001252] for timer of period [0.001000], resetting timer (current=1512056439.107768, next_expected=1512056439.106516)
[DEBUG] [1512056439.109900162]: Time jumped forward by [0.001120] for timer of period [0.001000], resetting timer (current=1512056439.109887, next_expected=1512056439.108768)
[DEBUG] [1512056439.112157526]: Time jumped forward by [0.001256] for timer of period [0.001000], resetting timer (current=1512056439.112143, next_expected=1512056439.110888)
[DEBUG] [1512056439.114004551]: Publishing transforms for moving joints
[DEBUG] [1512056439.114421523]: Time jumped forward by [0.001258] for timer of period [0.001000], resetting timer (current=1512056439.114401, next_expected=1512056439.113143)
[DEBUG] [1512056439.117230029]: Time jumped forward by [0.001812] for timer of period [0.001000], resetting timer (current=1512056439.117213, next_expected=1512056439.115401)
[DEBUG] [1512056439.121121055]: Time jumped forward by [0.002885] for timer of period [0.001000], resetting timer (current=1512056439.121098, next_expected=1512056439.118213)
[DEBUG] [1512056439.123395763]: Time jumped forward by [0.001280] for timer of period [0.001000], resetting timer (current=1512056439.123379, next_expected=1512056439.122098)
[DEBUG] [1512056439.125478982]: Time jumped forward by [0.001085] for timer of period [0.001000], resetting timer (current=1512056439.125463, next_expected=1512056439.124378)
[DEBUG] [1512056439.127602873]: Time jumped forward by [0.001115] for timer of period [0.001000], resetting timer (current=1512056439.127578, next_expected=1512056439.126463)
[DEBUG] [1512056439.129783575]: Time jumped forward by [0.001156] for timer of period [0.001000], resetting timer (current=1512056439.129735, next_expected=1512056439.128578)
[DEBUG] [1512056439.132043174]: Time jumped forward by [0.001286] for timer of period [0.001000], resetting timer (current=1512056439.132021, next_expected=1512056439.130735)
[DEBUG] [1512056439.133966007]: Publishing transforms for moving joints
[DEBUG] [1512056439.134757700]: Time jumped forward by [0.001718] for timer of period [0.001000], resetting timer (current=1512056439.134739, next_expected=1512056439.133021)
[DEBUG] [1512056439.137049198]: Time jumped forward by [0.001292] for timer of period [0.001000], resetting timer (current=1512056439.137031, next_expected=1512056439.135739)
[DEBUG] [1512056439.139261219]: Time jumped forward by [0.001213] for timer of period [0.001000], resetting timer (current=1512056439.139244, next_expected=1512056439.138031)
[DEBUG] [1512056439.141537131]: Time jumped forward by [0.001279] for timer of period [0.001000], resetting timer (current=1512056439.141523, next_expected=1512056439.140244)
[DEBUG] [1512056439.143761161]: Time jumped forward by [0.001223] for timer of period [0.001000], resetting timer (current=1512056439.143746, next_expected=1512056439.142523)
...