Closed pfedom closed 4 years ago
Please reformat your ticket and provide specific reproduction instructions in the TB3 default world (and any parameter changes from default you had to make to have this occur).
With turtlebot its all still default. On the real robot, the major changes should be in the DWB Controller the velocities and accelerations. Furthermore the robot has a rectengular footprint and not a radius like the turtlebot. Like mentioned in the edit: The turtlebot slows down but keeps moving and than hits the wall (see the gif I added) I think setting a bigger inflation radius in my real setup should slow down the process because the robot slows down earlier. Do I understand this right? But is it the expected behavior that the path gets planned through a wall?
I don't see any attempt at planning into that space from your gif, I just see a robot driving into a wall which is inside of the minimim range of the laser. It wouldn't surprise me if the laser gazebo plugin doesn't properly handle collision objects there resulting in that little nub in the map. But I don't see any action items here.
I can't reproduce any replanning outside of valid space in the default TB3 world without doing really unreasonable things trying to get it to do so like spawning inside walls or putting it in an impossible position.
This is super strange, I was just testing on my branch working on Hybrid-A* and I don't see this at all, but current main branch does, thanks for bringing this up. So something's changed letting this happen from after Jun 17.
It looks like the bug was introduced in e850db02ba88e03e2b6519e2b2ad51bbdc9f97f6 after the fix in RecoveryNode
logic. I reverted this patch and everything seems to work as expected but that only happens because RecoveryNode
doesn't retry ComputePathToPose
again after it fails the first time and the planner specific recovery is called (which is what this patch was supposed to fix).
Even after reverting e850db02ba88e03e2b6519e2b2ad51bbdc9f97f6, this bug can still be reproduced by increasing the number of retries in the planning branch to anything > 1:
<RecoveryNode number_of_retries="2" name="ComputePathToPose">
<ComputePathToPose goal="{goal}" path="{path}" planner_id="GridBased"/>
<ClearEntireCostmap name="ClearGlobalCostmap-Context" service_name="global_costmap/clear_entirely_global_costmap"/>
</RecoveryNode>
So I figured out the actual reason behind this bug.
Currently, when you give a goal when there's no valid path to the goal the first ComputePathToPose
call will fail since the planner is not able to plan a valid path. The RecoveryNode
then calls ClearEntireCostmap
and clears the global costmap. In the very next iteration within a fraction of a second RecoveryNode
retries ComputePathToPose
and this time the planner plans on an empty global costmap since the global costmap was cleared and hasn't spawned back up yet. The planner is able to find a straight path to the invalid goal, ComputePathToPose
succeeds and we have the bug!!
I added a 1 second sleep just above this line to test this and it checks out, everything works as expected: https://github.com/ros-planning/navigation2/blob/506384f2bcf32978428278f65567bfaf429d6d63/nav2_planner/src/planner_server.cpp#L238
Where was this originally introduced? Was it from a BT update to include clearing costmaps as a failure action in planning? I don't see how the commit hash you linked to could cause that issue (since N has been > 1 I believe always).
I see the server_timeout for clear costmaps is 10ms https://github.com/ros-planning/navigation2/blob/f237751f557b2e921e08f96d4749bfe3f2b2b091/nav2_bt_navigator/src/bt_navigator.cpp#L126 and looking at the service caller https://github.com/ros-planning/navigation2/blob/main/nav2_behavior_tree/include/nav2_behavior_tree/bt_service_node.hpp#L102 we shouldn't be returning from the service node until after its completed the clear costmaps action.
So I'd be curious what the return code from check_future
is. If its timeout, we just need to increase that timeout, if its success, then there's something wrong in costmap_2d because we should not be returning from the service until things are good again. It wouldn't surprise me at all if this was actually another DDS issue and services were failing again and its manifesting for us in this way because its timing out mid-execution.
If that was the case, that sleep you added would have the same effect because the service call was received by costmap but not finished, so then when the planner is planning, the clearing is still underway causing that overlap. If that is indeed what's happening (not that the timeout isn't big enough, but service calls are failing), then we really need a better way to identify that situation or get around it if this will be an issue that comes up over and over again.
But its not clear to me where this got introduced.
Where was this originally introduced?
https://github.com/ros-planning/navigation2/pull/1823 fixed a recovery node error because of which the main child of the recovery node was not retried after executing the second child. Before this patch, the flow of the recovery node was: first child (fails) -> second child (succeeds) -> recovery node (succeeds)
After this path, the flow of the recovery node was changed to: first child (fails) -> second child (succeeds) -> first child (succeeds) -> recovery node (succeeds)
I don't see how the commit hash you linked to could cause that issue (since N has been > 1 I believe always).
The number of retries for the planner specific recovery node has always been 1 because of which this issue was masked before https://github.com/ros-planning/navigation2/pull/1823 https://github.com/ros-planning/navigation2/blob/a30816f2939e441f2054df0f40775adb76b57997/nav2_bt_navigator/behavior_trees/navigate_w_replanning_and_recovery.xml#L11
I don't think the clear costmap service returns a timeout, the service executes correctly and ends up clearing the costmap. I think this issue is caused because of the time it takes to rebuild the costmap after it has been cleared. The planner server calls getPlan()
right after the clear costmap service executes which makes me think that the planner actually ends up planning on the cleared costmap before the sensor data is received and the costmap is rebuilt again.
The timeout I added just before getPlan()
is called gives the costmap enough time to receive sensor data and rebuild its layers after which the planner is able to plan correctly. This should confirm that the planner ends up planning on an empty costmap if it's called right after the clear costmap service executes and returns success.
After this path, the flow of the recovery node was changed to: first child (fails) -> second child (succeeds) -> first child (succeeds) -> recovery node (succeeds)
For the recovery node, I don't think that's good behavior, I think the original was correct. If the planner is now passing, we shouldn't be triggering the clear costmap again. Imagine this recovery was backing up or something, we'd be backing up, in a good state, and then backing up again.
The timeout I added just before getPlan() is called gives the costmap enough time to receive sensor data and rebuild its layers after which the planner is able to plan correctly. This should confirm that the planner ends up planning on an empty costmap if it's called right after the clear costmap service executes and returns success.
I would just make sure its not timing out, because that's something we can more easily do something about. If it really just needs time, that becomes more complicated to deal with. We'd need to keep flags between clears and updates so that we can wait until the costmap is valid again to use.
If the planner is now passing, we shouldn't be triggering the clear costmap again. Imagine this recovery was backing up or something, we'd be backing up, in a good state, and then backing up again.
I think you misinterpreted my comment. The RecoveryNode
is the parent node which triggers its second child when the first child fails. The second child is assumed to be the recovery behavior that you want to execute when your main execution branch (RecoveryNode
's first child) fails. Taking the planning branch as an example:
<RecoveryNode number_of_retries="1" name="ComputePathToPose">
<ComputePathToPose goal="{goal}" path="{path}" planner_id="GridBased"/>
<ClearEntireCostmap name="ClearGlobalCostmap-Context" service_name="global_costmap/clear_entirely_global_costmap"/>
</RecoveryNode>
Before #1823, the RecoveryNode
would have returned success right after ClearEntireCostmap
returns success without re-ticking ComputePathToPose
. The flow in that case would be:
ComputePathToPose (fails) -> ClearEntireCostmap (succeeds)
At this point, RecoveryNode
just returns success without going back to ComputePathToPose
. This is also why the planning through walls issue never showed up before #1823.
After #1823, the flow becomes:
ComputePathToPose (fails) -> ClearEntireCostmap (succeeds) -> ComputePathToPose (succeeds)
The first child of RecoveryNode
is executed again after a successful recovery (second child) executes. At this point RecoveryNode
returns success. This should be the expected behavior right?
I would just make sure its not timing out, because that's something we can more easily do something about.
I checked the return code from the ClearEntireCostmap
service, it returns success. 100% sure now that it's the planner planning on an empty costmap issue.
This should be the expected behavior right?
That does sound right I misread recovery node (succeeds)
part - I thought that was meaning the recovery behavior action. God, the nomenclature we used in this is going to be messing with my head for years to come, isn't it?
I checked the return code from the ClearEntireCostmap service, it returns success. 100% sure now that it's the planner planning on an empty costmap issue.
Ok.
My knee-jerk solution would be to add a variable to Costmap2DROS
about whether it was cleared, and if so, it needs to wait for a new update before planning. Or in the clear costmap services themselves, the last step is an update
for the costmap so that by the time it returns, it already did a sync.
@pfedom @naiveHobo please review / test this patch in https://github.com/ros-planning/navigation2/pull/2044
@SteveMacenski I'm getting similar behavior at regular intervals. I thought that this patch could fix that, but unfortunately, it doesn't help at all. The main difference is that I'm not using amcl, but the map and localization is provided by SLAM at 1 Hz. The rest of the configuration is the same. By any chance, these are related? You can see path planning failures in attached screenshots, sometimes it just goes straight through obstacles. I could provide step by step reproduction here if that's relevant or should I fill in a new issue?
I'd very much like to merge a solution for this this week, will be more actively dealing with #2009 and #2044
@simutisernestas please use our default configs of things to test against these patches. We don't need to introduce any additional changes to fix unrelated bugs
https://github.com/ros-planning/navigation2/pull/2090 merging imminently to resolve
Is there a solution for Foxy? PRs here cannot be applied to foxy-devel
branch.
Bug report
Required Info:
Steps to reproduce issue
Set goal pose to a position the robot can not reach. For example close the door after the robot entered the room and than set the goal pose outside the room. Edit: To reproduce it with TB3 I started my turtlebot3_world with:
ros2 launch turtlebot3_gazebo turtlebot3_world.launch.py
. Than start slam_toolbox async mode:ros2 launch slam_toolbox online_async_launch.py
. Start nav2 navigation:ros2 launch nav2_bringup navigation_launch.py
. Start rviz2, drive the turtlebot around to get a full map. Stop next to any wall. Set the Goal Pose outside of the "Arena" and it slowly turns into the wall until it hits it. All parameters were set default and I used the apt install not the source for the prove with turtlebot. The turtlebot is much slower hitting the wall than the real robot. The real robot hits it pretty fast. I think because it doesnt slow down fast enough. Setting the inflation radius higher should help, so at least it slows down earlier shouldnt it?End of edit.
Expected behavior
The robot searches for a path to get to the point and stops with an error message when it can't reach the goal.
Actual behavior
The pathplanner plans a way through a wall. The robot drives to the wall, slows down a bit but keeps going and hits the wall.
Additional information
I use an omnidirectional robot platform with one laserscanner in front (180 degrees) and the Intel T265 camera as odometry signal. While this test slam_toolbox was running in online_async mode. I encountered the same behavior when running nav2 without slam and prerecorded map.
Is there a parameter or something else that needs to be set so the wall has maximum cost so there will never be a path planned through? Picture of the wrong planned path:
Here my nav2_params.yaml: amcl: rosparameters: use_sim_time: False alpha1: 0.2 alpha2: 0.2 alpha3: 0.25 alpha4: 0.2 alpha5: 0.2 base_frame_id: "base_link" beam_skip_distance: 0.5 beam_skip_error_threshold: 0.9 beam_skip_threshold: 0.3 do_beamskip: false global_frame_id: "map" lambda_short: 0.1 laser_likelihood_max_dist: 2.0 laser_max_range: 25.0 laser_min_range: 0.05 laser_model_type: "likelihood_field" max_beams: 60 max_particles: 2000 min_particles: 500 odom_frame_id: "odom" pf_err: 0.05 pf_z: 0.99 recovery_alpha_fast: 0.0 recovery_alpha_slow: 0.0 resample_interval: 1 robot_model_type: "omnidirectional" save_pose_rate: 0.5 sigma_hit: 0.2 tf_broadcast: true transform_tolerance: 1.0 update_min_a: 0.2 update_min_d: 0.25 z_hit: 0.9 z_max: 0.05 z_rand: 0.5 z_short: 0.05 scan_topic: scan amcl_map_client: rosparameters: use_sim_time: False amcl_rclcpp_node: ros__parameters: use_sim_time: False bt_navigator: ros__parameters: use_sim_time: False global_frame: map robot_base_frame: base_link odom_topic: /odom default_bt_xml_filename: "navigate_w_replanning_and_recovery.xml" plugin_lib_names:
Progress checker parameters
progress_checker: plugin: "nav2_controller::SimpleProgressChecker" required_movement_radius: 0.5 movement_time_allowance: 10.0
Goal checker parameters
goal_checker: plugin: "nav2_controller::SimpleGoalChecker" xy_goal_tolerance: 0.25 yaw_goal_tolerance: 0.25 stateful: True
DWB parameters
FollowPath: plugin: "dwb_core::DWBLocalPlanner" debug_trajectory_details: True min_vel_x: -0.2 min_vel_y: -0.2 max_vel_x: 0.5 max_vel_y: 0.3 max_vel_theta: 1.0 min_speed_xy: -0.1 max_speed_xy: 0.1 min_speed_theta: -1.0
Add high threshold velocity for turtlebot 3 issue.
https://github.com/ROBOTIS-GIT/turtlebot3_simulations/issues/75
acc_lim_x: 0.3 acc_lim_y: 0.3 acc_lim_theta: 1.0 decel_lim_x: -0.3 decel_lim_y: -0.3 decel_lim_theta: -1.0 vx_samples: 20 vy_samples: 15 vtheta_samples: 20 sim_time: 2.0 linear_granularity: 0.05 angular_granularity: 0.025 transform_tolerance: 0.2 xy_goal_tolerance: 0.25 trans_stopped_velocity: 0.25 short_circuit_trajectory_evaluation: True stateful: True critics: ["RotateToGoal", "Oscillation", "BaseObstacle", "GoalAlign", "PathAlign", "PathDist", "GoalDist"] BaseObstacle.scale: 0.02 PathAlign.scale: 32.0 PathAlign.forward_point_distance: 0.1 GoalAlign.scale: 24.0 GoalAlign.forward_point_distance: 0.1 PathDist.scale: 32.0 GoalDist.scale: 24.0 RotateToGoal.scale: 32.0 RotateToGoal.slowing_factor: 5.0 RotateToGoal.lookahead_time: -1.0 controller_server_rclcpp_node: rosparameters: use_sim_time: False local_costmap: local_costmap: rosparameters: update_frequency: 5.0 publish_frequency: 2.0 global_frame: odom robot_base_frame: base_link use_sim_time: False rolling_window: true width: 3 height: 3 resolution: 0.05
robot_radius: 0.22
footprint: "[ [0.45, 0.3], [-0.4, 0.3], [-0.4, -0.3], [0.45, -0.3] ]" plugins: ["obstacle_layer", "inflation_layer"] obstacle_layer: plugin: "nav2_costmap_2d::ObstacleLayer" enabled: True observation_sources: scan footprint_clearing_enabled: true max_obstacle_height: 2.0 combination_method: 1 scan: topic: /scan obstacle_range: 2.5 raytrace_range: 3.0 max_obstacle_height: 2.0 min_obstacle_height: 0.0 clearing: True marking: True data_type: "LaserScan" inf_is_valid: false inflation_layer: plugin: "nav2_costmap_2d::InflationLayer" cost_scaling_factor: 3.0 static_layer: map_subscribe_transient_local: True always_send_full_costmap: True local_costmap_client: ros__parameters: use_sim_time: False local_costmap_rclcpp_node: rosparameters: use_sim_time: False global_costmap: global_costmap: rosparameters: update_frequency: 1.0 publish_frequency: 1.0 global_frame: map robot_base_frame: base_link use_sim_time: False
robot_radius: 0.22
footprint: "[ [0.45, 0.3], [-0.4, 0.3], [-0.4, -0.3], [0.45, -0.3] ]" resolution: 0.05 plugins: ["static_layer", "obstacle_layer", "inflation_layer"] obstacle_layer: plugin: "nav2_costmap_2d::ObstacleLayer" enabled: True observation_sources: scan scan: topic: /scan max_obstacle_height: 2.0 clearing: True marking: True data_type: "LaserScan" static_layer: plugin: "nav2_costmap_2d::StaticLayer" map_subscribe_transient_local: True inflation_layer: plugin: "nav2_costmap_2d::InflationLayer" always_send_full_costmap: True global_costmap_client: rosparameters: use_sim_time: False global_costmap_rclcpp_node: ros__parameters: use_sim_time: False map_server: rosparameters: use_sim_time: False yaml_filename: "turtlebot3_world.yaml" map_saver: rosparameters: use_sim_time: False save_map_timeout: 5000 free_thresh_default: 0.25 occupied_thresh_default: 0.65 map_subscribe_transient_local: True planner_server: rosparameters: expected_planner_frequency: 20.0 use_sim_time: False planner_plugins: ["GridBased"] GridBased: plugin: "nav2_navfn_planner/NavfnPlanner" tolerance: 0.5 use_astar: true allow_unknown: true planner_server_rclcpp_node: ros__parameters: use_sim_time: False recoveries_server: rosparameters: costmap_topic: local_costmap/costmap_raw footprint_topic: local_costmap/published_footprint cycle_frequency: 10.0 recovery_plugins: ["spin", "backup", "wait"] spin: plugin: "nav2_recoveries/Spin" backup: plugin: "nav2_recoveries/BackUp" wait: plugin: "nav2_recoveries/Wait" global_frame: odom robot_base_frame: base_link transform_timeout: 0.1 use_sim_time: False simulate_ahead_time: 2.0 max_rotational_vel: 1.0 min_rotational_vel: 0.4 rotational_acc_lim: 3.2 robot_state_publisher: rosparameters: use_sim_time: False