ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/
Other
2.51k stars 1.27k forks source link

follow_path action keeps on running after having canceled the goal #4659

Open milidam opened 1 month ago

milidam commented 1 month ago

Bug report

Required Info:

Steps to reproduce issue

In some cases, the follow_path action keeps on running after having canceled a navigate_to_pose goal.

Expected behavior

After canceling the navigate_to_pose goal, we would expect the controller_server to properly interrupt its follow_path action.

[controller_server-8] [INFO]: Goal was canceled. Stopping the robot.
[controller_server-8] [WARN]: [follow_path] [ActionServer] Client requested to cancel the goal. Cancelling.

Actual behavior

The bt_navigator gets a Failed to get result for compute_path_to_pose in node halt! or a Failed to get result for follow_path in node halt! error, and the controller_server keeps on trying to follow the last received path.

[bt_navigator-12] [INFO]: Starting path following
[controller_server-8] [INFO]: Received a goal, begin computing control effort.
[planner_server-9] [WARN]: Map update loop missed its desired rate of 5.0000Hz. Current loop rate is 0.9686Hz
[bt_navigator-12] [WARN]: Behavior tree engine cycle is slower than expected: 114 ms (timeout is 50 ms)
[bt_navigator-12] [WARN]: Time required to tick the tree is 114 ms
[bt_navigator-12] [WARN]: Timed out while waiting for action server to acknowledge goal request for follow_path
[bt_navigator-12] [INFO]: Pausing path following due to slow path computation (>500ms)
[planner_server-9] [INFO]: Attempting to find a path from (-12.62, 3.38) to (-54.58, 11.29).
[planner_server-9] [INFO]: SmacPlannerHybrid: Start pose in world: (-12.624103, 3.381263), in costmap (452, 103)
[planner_server-9] [INFO]: SmacPlannerHybrid: Goal pose in world: (-54.581326, 11.286437), in costmap (32, 182)
[app-21] [INFO]: Request to cancel goals on /navigate_to_pose was accepted by server, waiting for result
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 9.2498 Hz.
[bt_navigator-12] [ERROR]: Failed to get result for compute_path_to_pose in node halt!
[bt_navigator-12] [INFO]: Goal canceled
[bt_navigator-12] [WARN]: [navigate_to_pose] [ActionServer] Client requested to cancel the goal. Cancelling.
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 9.5395 Hz.
[controller_server-8] [WARN]: Control loop missed its desired rate of 10.0000 Hz. Current loop rate is 10.9403 Hz.

Additional information

In the case reported above, before that new instance of the Starting path following, in a previous attempt in the same instance of the navigate_to_pose behavior tree, the bt_navigator already encountered the following error

[bt_navigator-12] [ERROR]: Failed to get result for follow_path in node halt!

what could indicate that the bt_navigator might somehow be desynchronized with the controller_server?

Also, the map is quite big, and the planner_server takes some time to compute a path. The goal canceling occurred while the compute_path_to_pose action was still running.

SteveMacenski commented 1 month ago

That occurs here [1], so the server timeout doesn't cancel the action in time. If you were to set this to say - 1 second - does that remove your issue? Also, what so you have your server_timeout set to - that could be unrealistically short.

It looks like this condition was added in [2] and may need to (generally speaking) have a longer timeout than the server timeout proportional to steady-state BT node ticking. If we're halting / exiting out a BT, we may find it better to have the rate exceeded a bit in exchange for a more realistic chance of those actions being canceled successfully.

Also, the map is quite big, and the planner_server takes some time to compute a path. The goal canceling occurred while the compute_path_to_pose action was still running.

This is definitely why. If you have a server timeout of 10ms and you've just started a planning request that is 100ms in length, then it won't always cancel in time. We added a cancel checker [3] in the Planner API so that we query in the planning loop if a cancel is requested to be more responsive, but that is not backported to Iron and you'll need to upgrade to Jazzy to receive that. That actually by itself might completely fix your issue.

[1] https://github.com/ros-navigation/navigation2/blob/34d41ed32745e3978a0c0e30e9795c32ea184e8a/nav2_behavior_tree/include/nav2_behavior_tree/bt_action_node.hpp#L319

[2] https://github.com/ros-navigation/navigation2/commit/bfb4506c9b6beaebfb8532b221b517a18e40ac3d

[3] https://github.com/ros-navigation/navigation2/blob/main/nav2_navfn_planner/src/navfn_planner.cpp#L137

SteveMacenski commented 1 month ago

@milidam have you had a chance to look into these items? I'd love to get feedback on the precise nature of it to write up a solution for you to test

milidam commented 1 month ago

Hi @SteveMacenski, Our server_timeout is currently set to 100ms. I haven't had time yet to try to reproduce the issue with an increased timeout.

Our planner's max_planning_time is 30.0s. Does that also increase the risk of not being able to actually cancel it?

Note that migration to Jazzy is planned, but not yet started.

SteveMacenski commented 4 weeks ago

I haven't had time yet to try to reproduce the issue with an increased timeout.

Do you know when you would be able to? If you increased that to 500ms or 1s and it went away for Follow Path, that could be something we could replace that spin with since that cancel should only happen when halting an entire tree on task exit (which keeping the 10ms tick rate is unnecessary)

Does that also increase the risk of not being able to actually cancel it?

If you haven't upgraded, I think so. This is why I want you to check for me :-) But that shouldn't impact the follow path action, just computing path to pose -- since canceling path planning that is mid-execution is only supported in Jazzy and newer.