magazino / move_base_flex

Move Base Flex: a backwards-compatible replacement for move_base
https://uos.github.io/mbf_docs/
BSD 3-Clause "New" or "Revised" License
429 stars 154 forks source link

On controller cancel, wait for the control loop to stop #215

Closed corot closed 3 years ago

corot commented 4 years ago

This fixes the sometimes appearing To transition to a cancelled state, the goal must be in a pending, recalling, active, or preempting state, it is currently in state: 4

I have noticed that it happens when we call exe_path right after it fails, e.g. when running a very fast recovery behavior in between. The reason seems to be that we set the goal as aborted and exit the control loop, but the execution is still active:

If you send another goal to exe_path immediately, MBF prints the error shown above and doesn't execute the new goal. use the following script to reproduce:

#!/usr/bin/env python

import rospy
import actionlib

from geometry_msgs.msg import PoseStamped

from mbf_msgs.msg import GetPathAction, GetPathGoal, GetPathResult
from mbf_msgs.msg import ExePathAction, ExePathGoal, ExePathResult

__author__ = 'santos'

""" 
Simplest possible MBF executive: sends goals provided through RViz on 2D Nav Goal tool to
move_base_flex/get_path action and sends the resulting path to move_base_flex/exe_path action
No fancy recovery or retry strategies.
"""

exe_path_goal = None

def nav_goal_cb(msg):
    global target_pose
    target_pose = msg
    goal = GetPathGoal(use_start_pose=False,
                       start_pose=msg,
                       target_pose=msg)
    get_path_ac.send_goal(goal, done_cb=plan_done_cb)

def plan_done_cb(status, result):
    if result.outcome == GetPathResult.SUCCESS:
        rospy.loginfo("Get path action succeeded")

        global exe_path_goal
        exe_path_goal = ExePathGoal(path=result.path, controller="PoseFollower")
        exe_path_ac.send_goal(exe_path_goal, done_cb=follow_done_cb)
    else:
        rospy.logerr("Get path action failed with error code [%d]: %s", result.outcome, result.message)

def follow_done_cb(status, result):
    if result.outcome == GetPathResult.SUCCESS:
        rospy.loginfo("Follow path action succeeded")
    else:
        rospy.logerr("Follow path action failed with error code [%d]: %s", result.outcome, result.message)
    #rospy.sleep(0.5)  TODO uncomment to make current code work
    exe_path_ac.send_goal(exe_path_goal, done_cb=follow_done_cb)

if __name__ == '__main__':
    rospy.init_node("simple_mbf_executive")

    target_pose = None

    get_path_ac = actionlib.SimpleActionClient("/move_base_flex/get_path", GetPathAction)
    get_path_ac.wait_for_server(rospy.Duration(5))
    exe_path_ac = actionlib.SimpleActionClient("/move_base_flex/exe_path", ExePathAction)
    exe_path_ac.wait_for_server(rospy.Duration(5))

    rospy.Subscriber('/move_base_simple/goal', PoseStamped, nav_goal_cb)

    rospy.spin()

Steps

  1. run the script
  2. send any goal with RViz
  3. force the robot (easier on simulation) to not move, so oscillation triggers
  4. on master, it freezes with the second iteration
  5. uncomment the sleep in the script; not shouldn't freeze
  6. comment again and try this PR; should also work

@Timple, would be great if you can try this, as you where toying with gentle controller canceling

Timple commented 4 years ago

We're doing very similar things in our cancel loop. Wait for a flag to become true (know that the robot has stopped).

I'll evaluate!

dorezyuk commented 4 years ago

i have to verify but my suspicion is that we have there a spurious wakeup. IHMO the right solution would be to check inside the waitForStateUpdate function if the state_ variable has the desired value. The problem is that the AbstractExecutionBase has no access the the state var - so we would need to pass a predicate function to the waitForStateUpdate

corot commented 4 years ago

i have to verify but my suspicion is that we have there a spurious wakeup. IHMO the right solution would be to check inside the waitForStateUpdate function if the state_ variable has the desired value. The problem is that the AbstractExecutionBase has no access the the state var - so we would need to pass a predicate function to the waitForStateUpdate

Do you mean, in this PR branch? mmm.... not sure what do you mean; can you provide more details? Btw, did you try the script?

dorezyuk commented 4 years ago

@corot I couldn't reproduce the issue despite trying really hard. But from what i understand the issue is that we first set the goal as aborted in controler_action.cpp lines 285

            execution.cancel();
            controller_active = false;
            fillExePathResult(mbf_msgs::ExePathResult::OSCILLATION, "Oscillation detected!", result);
            goal_handle.setAborted(result, result.message);

but the in_use flag is still true, so we try to update the current goal controller_action.cpp lines 90

        mbf_msgs::ExePathResult result;
        fillExePathResult(mbf_msgs::ExePathResult::CANCELED, "Goal preempted by a new plan", result);
        concurrency_slots_[slot].goal_handle.setCanceled(result, result.message);

blocking the cancel call is only a "approximate" solution, since the what really matters is the in_use flag (set in abstract_action_base.hpp, line 144). I've created a branch which might help: https://github.com/magazino/move_base_flex/tree/dima/fix_exec_path Would be nice if you can try it out

corot commented 4 years ago

blocking the cancel call is only a "approximate" solution, since the what really matters is the in_use flag (set in abstract_action_base.hpp, line 144). I've created a branch which might help: https://github.com/magazino/move_base_flex/tree/dima/fix_exec_path Would be nice if you can try it out

Seems to me that you are catching a different issue... So your problem is that, after we stop the controller due to oscillation detected, the goal gets preempted by a new path, even if it was already aborted?

But the in_use flag is set to false after cancel (see https://github.com/magazino/move_base_flex/blob/master/mbf_abstract_nav/include/mbf_abstract_nav/abstract_action_base.hpp#L144). Can you make a script to reproduce the problem? And what are the effects? crash? warns? errors?

corot commented 4 years ago

Hey @Timple, did you have time to tinker with this fix?

Timple commented 4 years ago

Thanks for the ping. I bumped it upward on my todo list.

I just realised I was mis-interpreting this PR. We're actually blocking the controller_->cancel() call and expect the control cycles to continue.

But the controller_->cancel() call is unchanged in the PR. So our implementation is of no concern I guess.

We have seen the To transition to a cancelled state, ... and this PR, from the looks of it, does seem to tackle this. So I'm happy to monitor any occurrences of this issue again after merging.

corot commented 4 years ago

Thanks for the ping. I bumped it upward on my todo list.

I just realised I was mis-interpreting this PR. We're actually blocking the controller_->cancel() call and expect the control cycles to continue.

But the controller_->cancel() call is unchanged in the PR. So our implementation is of no concern I guess.

Right; only effect you will notice is that execution cancel can take up to 500 milliseconds in addition to the time your controller takes to stop. But if you are handling the cancellation properly, it won't wait more than now.

We have seen the To transition to a cancelled state, ... and this PR, from the looks of it, does seem to tackle this. So I'm happy to monitor any occurrences of this issue again after merging.

If you get the error because of calling exe_path again right after a failure, yes, this should fix it. Great if you can give it a try before merging!

Timple commented 4 years ago

We do handle the cancel correctly, so no problems there.

Your branch works like a charm in all our unit tests where we use mbf :+1: I won't be able to reproduce the issue it solves. It's been a while since I've seen it.

Timple commented 4 years ago

use the following script to reproduce:

Would be nice to add this as a testcase :slightly_smiling_face: