magazino / move_base_flex

Move Base Flex: a backwards-compatible replacement for move_base
https://uos.github.io/mbf_docs/
BSD 3-Clause "New" or "Revised" License
429 stars 154 forks source link

[mbf_costmap_nav] get_path (SBPLLatticePlanner) plans through walls and obstacles. Same config with move_base does not. #208

Closed kajep09 closed 3 years ago

kajep09 commented 4 years ago

Hi @spuetz @corot and others who might be able to help.

I'm trying to use MBF on ROS Melodic with TEB as the local planner and SBPLLatticePlanner as the global planner (all 3 components installed the latest release debs), but I am experiencing some really strange behavior with SBPL that doesn't show up when using the same parameters with normal _movebase.

Before I go into too much detail, I think a video summarizes the problem better:

Scenario:

The expected behavior is the following:

This behavior is shown by a normal _movebase setup.
trapped robot move_base - the robot simply fails all replan attempts and aborts

However, in MBF the behavior goes like this:

trapped robot move_base_flex - erroneous replan at around 1:00

This behavior is with the exact same parameter setup between MB and MBF (except for MBF specific ones). I have quite an extensive parameter setup spread out over multiple files, but I have summarized what I believe are the most relevant ones below:

costmap parameters (cherrypicked for brevity)

planners:
  - name: 'SBPLLatticePlanner'
    type: 'SBPLLatticePlanner'

SBPLLatticePlanner:
  environment_type: XYThetaLattice
  planner_type: ARAPlanner
  # planner_type: ADPlanner
  allocated_time: 10.0 #30.0 #10
  initial_epsilon: 3.0 #50.0 #3.0
  forward_search: false 
  nominalvel_mpersecs: 0.5 
  timetoturn45degsinplace_secs: 3.0 
  lethal_obstacle: 20 #250

###
#namespaced into global_costmap and local_costmap through the launchfile
robot_base_frame: base_link 
robot_frame: base_link 
map_frame: map
global_frame: map
footprint: [[-0.1,-0.25],[-0.1,0.25],[0.7,0.25],[0.7,-0.25]]

navigation_map:
  enabled: true
###

local_costmap:
  plugins: 
  - {name: navigation_map, type: "costmap_2d::StaticLayer" }
  # - {name: virtual_walls_map, type: "costmap_2d::StaticLayer" }
  - {name: obsl,  type: "costmap_2d::VoxelLayer" }
  - {name: npvl, type: "costmap_2d::NonPersistentVoxelLayer"}
  - {name: stvl,  type: "spatio_temporal_voxel_layer/SpatioTemporalVoxelLayer" }
  - {name: inflation,  type: "costmap_2d::InflationLayer" }

  global_frame: odom
  static_map: false
  update_frequency: 5.0
  publish_frequency: 5.0
  resolution: 0.05
  rolling_window: true
  footprint_padding: 0.01 #0.2
  width: 5.0
  height: 5.0
  origin_x: 0.0
  origin_y: 0.0

  inflation:
    cost_scaling_factor: 2.0 
    inflation_radius:     1.5

global_costmap:
  plugins: 
  - {name: navigation_map, type: "costmap_2d::StaticLayer" }
  # - {name: virtual_walls_map, type: "costmap_2d::StaticLayer" }
  - {name: obsl,  type: "costmap_2d::VoxelLayer" }
  - {name: npvl, type: "costmap_2d::NonPersistentVoxelLayer"}
  - {name: stvl,  type: "spatio_temporal_voxel_layer/SpatioTemporalVoxelLayer" }
  - {name: inflation,  type: "costmap_2d::InflationLayer" }

  global_frame: map
  static_map: true
  update_frequency: 1.0 #1.0
  publish_frequency: 1.0
  resolution: 0.05
  rolling_window: false
  footprint_padding: 0.01 #0.2

  inflation:
    cost_scaling_factor:  2.0
    inflation_radius:     1.5

MBF specific parameters

        <param name="force_stop_on_cancel" value="True"/>
        <param name="planner_frequency" value="0.0"/>
        <param name="planner_patience" value="100.0"/>
        <param name="planner_max_retries" value="-1"/>
        <param name="controller_frequency" value="10.0"/>
        <param name="controller_patience" value="1.0"/>
        <param name="controller_max_retries" value="10"/>
        <param name="recovery_enabled" value="True"/>
        <param name="recovery_patience" value="15.0"/>
        <param name="oscillation_timeout" value="10.0"/>
        <param name="oscillation_distance" value="0.5"/>

I am using the move_base_legacy_relay script in this testing and haven't yet tried to see if the issue persist if I use get_pathand exe_path directly, but under the hood this is already happening as far as I understand? I find it strange that the initial plan generated by the global planner obeys the obstacles just fine and does not shortcut through walls even though it would be faster - it only seems to happen after it has tried for a really long time to generate a new plan and failed many many times. Could it be something that is not cleared/reset correctly when get_path fails so many times (or internally in SBPLLP?) or a race-condition where something happens to the costmap that makes the planner ignore it for some reason?

I really hope you have some good pointers to how I can fix this issue and if any of you (I'm assuming @corot due to your involvement on the SBPL repo where we have discussed issues) have a configuration with SBPLLatticePlanner that works for you I would be very happy to try that out on my machine to figure out what is going wrong. I would also like to get an opinion from @mintar about whether or not it could have anything to do with SBPLLatticePlanner, only MBF, or the combination of the two.

Thank you all for awesome software and hope you can help me utilize it better!

P.S.

When I terminate MBF I get the following warnings, but I am unsure if they are related at all.

^C[move_base_legacy_relay-2] killing on exit
[move_base_flex-1] killing on exit
[ INFO] [1594657112.079730711, 609.352000000]: Shutdown costmap navigation server.
[ INFO] [1594657112.079789120, 609.352000000]: Cancel all goals for "get_path".
[ WARN] [1594657112.079809324, 609.352000000]: Cancel planning failed or is not supported by the plugin. Wait until the current planning finished!
[ INFO] [1594657112.079833577, 609.352000000]: Cancel all goals for "exe_path".
[ WARN] [1594657112.079848142, 609.352000000]: Cancel controlling failed. Wait until the current control cycle finished!
[ INFO] [1594657112.143313148, 609.418000000]: Action "exe_path" canceled
[ INFO] [1594657112.143426862, 609.418000000]: Cancel all goals for "recovery".
[ WARN] [1594657112.143483347, 609.418000000]: Cancel recovery behavior "clear_costmap3" failed or is not supported by the plugin. Wait until the current recovery behavior finished!
[ INFO] [1594657112.143545947, 609.418000000]: Stopping local and global costmap for shutdown
[ INFO] [1594657112.143602076, 609.418000000]: local_costmap/stvl was deactivated.
[ INFO] [1594657112.143651202, 609.418000000]: global_costmap/stvl was deactivated.
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
shutting down processing monitor...
... shutting down processing monitor complete
done
corot commented 4 years ago

Hi @kajep09, and thanks for your extensive issue report. I wish all where that detailed! I bet that your problem comes from not locking the global costmap. Try setting param planner_lock_costmap to false. You can see in costmap_planner_execution.cpp#L80 that the costmaps are not locked by default when calling the planner. This is a "feature" of MBF that we should have documented better, so I apologize for not having done so. The idea is that the planner himself locks the costmap ONLY when needed, and so blocks it the less possible time. With SBPL this is important because it's very slow, and locking the costmap for all the planning time, as in old-good move_base, delays a lot any update. Setting planner_lock_costmap to false will make MBF behave as move_base. If that works, I suggest you to modify SBPL to lock the map only while it gets copied (what is very fast), and possibly PR the change (enabled by a parameter set as false by default, as move_base will still lock the costmap).

kajep09 commented 4 years ago

Hi @corot

Thanks for the quick response, but unfortunately I cannot detect any change in behavior between setting that parameter true or false. By the way - am I misreading your comment on the code when you say you are not locking the costmaps by default? If the parameter is not set it defaults to true, and in the check in line 80 you get the mutex before calling the planner? So by default it locks right?

corot commented 4 years ago

mmm... could be that clear costmaps recovery cleans the global costmap, and it doesn't get "repainted" fast enough. I added a parameter to the clear costmaps recovery behavior to prevent this (PR). Try to add force_updating equal true to the gloabal_costmap configuration. To verify if that's the reasion, you can also try to remove all calls to clear costmaps, by removing them from the loaded recovery behaviors list.

kajep09 commented 4 years ago

I had the same thought, but I've already tried a few variations to test that and I don't think that is (directly) the issue. If I'm using clear_costmaprecovery I've always had force_updating: true, and also tried with reset_distance: 0 and no valid layer_names. The reason I have multiple duplicate recoveries was to make sure that MBF didn't abort prematurely after exhausting just 1 recovery behavior, because the weird planner behavior doesn't always show up right away.

recovery_behaviors:
  - name: 'clear_costmap1'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap2'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap3'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap4'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap5'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap6'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'

clear_costmap1:
  reset_distance: 0.0
  force_updating: true
  affected_maps: both
  layer_names: [stvl, npvl, obsl]
#
#.
#.
#.
#
clear_costmap6:
  reset_distance: 0.0 #6.0
  force_updating: true
  affected_maps: both
  layer_names: [stvl, npvl, obsl]

To make sure that it wasn't the clear costmap recoveries I tried to similarly have a list of just rotate recoveries, and the same planner behavior happens. So I don't think it's the clear costmap, but I still assume that overall it has to do with the costmaps because otherwise why would the planner disregard ALL layers of the costmap, even the static layer which was never part of the clear_costmap recovery in the first place.

recovery_behaviors:
  - name: 'rotate_recovery1'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery2'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery3'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery4'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery5'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery6'
    type: 'rotate_recovery/RotateRecovery'
kajep09 commented 4 years ago

Hi again @corot

I just tried something that surprised me a bit. I switched to GlobalPlanner instead of SBPL, but kept every other parameter the same and I am seeing the same issue. I didn't expect this, but I was previously testing out GlobalPlanner with replanning active and not this oneshot -> fail -> replan setup that I was trying with SBPL.

So this leads me to one of two temporary conclusions:

Just wanted to let you know so you don't go chasing down the SBPL route. Thanks for your help so far and hopefully we can find the issue together :)

I will see if I can't make a minimal replicable example so we can see the full parameter list and whether something is wrong.

Current planner setup that both can show the issue

planners:
  # - name: 'SBPLLatticePlanner'
  #   type: 'SBPLLatticePlanner'
  - name: 'GlobalPlanner'
    type: 'global_planner/GlobalPlanner'

GlobalPlanner:
  allow_unknown: false
  default_tolerance: 0.0
  visualize_potential: true
  use_dijkstra: true
  use_quadratic: false
  use_grid_path: false
  old_navfn_behaviour: false
  lethal_cost: 253
  neutral_cost: 66
  cost_factor: 0.55

# SBPLLatticePlanner:
#   environment_type: XYThetaLattice
#   planner_type: ARAPlanner
#   # planner_type: ADPlanner
#   allocated_time: 10.0 #30.0 #10
#   initial_epsilon: 15.0 #50.0 #3.0
#   forward_search: false 
#   nominalvel_mpersecs: 0.5 #0.8
#   timetoturn45degsinplace_secs: 3.0 #1.31 # = 0.6 rad/s
#   # lethal_obstacle: 20 #250
corot commented 4 years ago

I think the problem doesn't happen on move_base cause it runs in a loop and waits a bit after every operation (e.g. clearing the costmaps): https://github.com/ros-planning/navigation/blob/melodic-devel/move_base/src/move_base.cpp#L779 But in MBF we call directly planning right after completing each recovery behavior (that's how I came with the need of the force_updating parameter for clear costmap). Things to try:

corot commented 3 years ago

Hey @kajep09, did you manage to fix this? Can I close the issue?

corot commented 3 years ago

Closing this as out of scope of MBF