[mbf_costmap_nav] get_path (SBPLLatticePlanner) plans through walls and obstacles. Same config with move_base does not.

Hi @spuetz @corot and others who might be able to help.

I'm trying to use MBF on ROS Melodic with TEB as the local planner and SBPLLatticePlanner as the global planner (all 3 components installed the latest release debs), but I am experiencing some really strange behavior with SBPL that doesn't show up when using the same parameters with normal _movebase.

Before I go into too much detail, I think a video summarizes the problem better:

Scenario:

The robot (gazebo simulation) is trapped in area where there are obstacles (not part of the static map) that trap it so it cannot get out, but it only detects this when it sees the added obstacles with the laser scanner. Replanning rate is 0 Hz.

The expected behavior is the following:

The robot can plan an initial global plan through the static map before seeing the obstacles
The initial global plan is aborted when the controller fails due to the obstacles.
The global planner tries to plan a new path, but fails repeatedly due to the obstacles blocking it
- The robot goes into recovery behaviors (if enabled) or alternatively aborts the whole thing (or keeps trying with infinite planning retries)

This behavior is shown by a normal _movebase setup.
trapped robot move_base - the robot simply fails all replan attempts and aborts

However, in MBF the behavior goes like this:

The robot can plan an initial global plan through the static map before seeing the obstacles
The initial global plan is aborted when the controller fails due to the obstacles.
The global planner tries to plan a new path, but fails repeatedly due to the obstacles blocking it - At some point the global planner loses its mind and plans directly through the obstacles (or even the static map)

trapped robot move_base_flex - erroneous replan at around 1:00

This behavior is with the exact same parameter setup between MB and MBF (except for MBF specific ones). I have quite an extensive parameter setup spread out over multiple files, but I have summarized what I believe are the most relevant ones below:

costmap parameters (cherrypicked for brevity)

planners:
  - name: 'SBPLLatticePlanner'
    type: 'SBPLLatticePlanner'

SBPLLatticePlanner:
  environment_type: XYThetaLattice
  planner_type: ARAPlanner
  # planner_type: ADPlanner
  allocated_time: 10.0 #30.0 #10
  initial_epsilon: 3.0 #50.0 #3.0
  forward_search: false 
  nominalvel_mpersecs: 0.5 
  timetoturn45degsinplace_secs: 3.0 
  lethal_obstacle: 20 #250

###
#namespaced into global_costmap and local_costmap through the launchfile
robot_base_frame: base_link 
robot_frame: base_link 
map_frame: map
global_frame: map
footprint: [[-0.1,-0.25],[-0.1,0.25],[0.7,0.25],[0.7,-0.25]]

navigation_map:
  enabled: true
###

local_costmap:
  plugins: 
  - {name: navigation_map, type: "costmap_2d::StaticLayer" }
  # - {name: virtual_walls_map, type: "costmap_2d::StaticLayer" }
  - {name: obsl,  type: "costmap_2d::VoxelLayer" }
  - {name: npvl, type: "costmap_2d::NonPersistentVoxelLayer"}
  - {name: stvl,  type: "spatio_temporal_voxel_layer/SpatioTemporalVoxelLayer" }
  - {name: inflation,  type: "costmap_2d::InflationLayer" }

  global_frame: odom
  static_map: false
  update_frequency: 5.0
  publish_frequency: 5.0
  resolution: 0.05
  rolling_window: true
  footprint_padding: 0.01 #0.2
  width: 5.0
  height: 5.0
  origin_x: 0.0
  origin_y: 0.0

  inflation:
    cost_scaling_factor: 2.0 
    inflation_radius:     1.5

global_costmap:
  plugins: 
  - {name: navigation_map, type: "costmap_2d::StaticLayer" }
  # - {name: virtual_walls_map, type: "costmap_2d::StaticLayer" }
  - {name: obsl,  type: "costmap_2d::VoxelLayer" }
  - {name: npvl, type: "costmap_2d::NonPersistentVoxelLayer"}
  - {name: stvl,  type: "spatio_temporal_voxel_layer/SpatioTemporalVoxelLayer" }
  - {name: inflation,  type: "costmap_2d::InflationLayer" }

  global_frame: map
  static_map: true
  update_frequency: 1.0 #1.0
  publish_frequency: 1.0
  resolution: 0.05
  rolling_window: false
  footprint_padding: 0.01 #0.2

  inflation:
    cost_scaling_factor:  2.0
    inflation_radius:     1.5

MBF specific parameters

        <param name="force_stop_on_cancel" value="True"/>
        <param name="planner_frequency" value="0.0"/>
        <param name="planner_patience" value="100.0"/>
        <param name="planner_max_retries" value="-1"/>
        <param name="controller_frequency" value="10.0"/>
        <param name="controller_patience" value="1.0"/>
        <param name="controller_max_retries" value="10"/>
        <param name="recovery_enabled" value="True"/>
        <param name="recovery_patience" value="15.0"/>
        <param name="oscillation_timeout" value="10.0"/>
        <param name="oscillation_distance" value="0.5"/>

I am using the move_base_legacy_relay script in this testing and haven't yet tried to see if the issue persist if I use get_pathand exe_path directly, but under the hood this is already happening as far as I understand? I find it strange that the initial plan generated by the global planner obeys the obstacles just fine and does not shortcut through walls even though it would be faster - it only seems to happen after it has tried for a really long time to generate a new plan and failed many many times. Could it be something that is not cleared/reset correctly when get_path fails so many times (or internally in SBPLLP?) or a race-condition where something happens to the costmap that makes the planner ignore it for some reason?

I really hope you have some good pointers to how I can fix this issue and if any of you (I'm assuming @corot due to your involvement on the SBPL repo where we have discussed issues) have a configuration with SBPLLatticePlanner that works for you I would be very happy to try that out on my machine to figure out what is going wrong. I would also like to get an opinion from @mintar about whether or not it could have anything to do with SBPLLatticePlanner, only MBF, or the combination of the two.

Thank you all for awesome software and hope you can help me utilize it better!

P.S.

When I terminate MBF I get the following warnings, but I am unsure if they are related at all.

^C[move_base_legacy_relay-2] killing on exit
[move_base_flex-1] killing on exit
[ INFO] [1594657112.079730711, 609.352000000]: Shutdown costmap navigation server.
[ INFO] [1594657112.079789120, 609.352000000]: Cancel all goals for "get_path".
[ WARN] [1594657112.079809324, 609.352000000]: Cancel planning failed or is not supported by the plugin. Wait until the current planning finished!
[ INFO] [1594657112.079833577, 609.352000000]: Cancel all goals for "exe_path".
[ WARN] [1594657112.079848142, 609.352000000]: Cancel controlling failed. Wait until the current control cycle finished!
[ INFO] [1594657112.143313148, 609.418000000]: Action "exe_path" canceled
[ INFO] [1594657112.143426862, 609.418000000]: Cancel all goals for "recovery".
[ WARN] [1594657112.143483347, 609.418000000]: Cancel recovery behavior "clear_costmap3" failed or is not supported by the plugin. Wait until the current recovery behavior finished!
[ INFO] [1594657112.143545947, 609.418000000]: Stopping local and global costmap for shutdown
[ INFO] [1594657112.143602076, 609.418000000]: local_costmap/stvl was deactivated.
[ INFO] [1594657112.143651202, 609.418000000]: global_costmap/stvl was deactivated.
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
Warning: class_loader.ClassLoader: SEVERE WARNING!!! Attempting to unload library while objects created by this loader exist in the heap! You should delete your objects before attempting to unload the library or destroying the ClassLoader. The library will NOT be unloaded.
         at line 122 in /tmp/binarydeb/ros-melodic-class-loader-0.4.1/src/class_loader.cpp
shutting down processing monitor...
... shutting down processing monitor complete
done

Hi @kajep09, and thanks for your extensive issue report. I wish all where that detailed! I bet that your problem comes from not locking the global costmap. Try setting param planner_lock_costmap to false. You can see in costmap_planner_execution.cpp#L80 that the costmaps are not locked by default when calling the planner. This is a "feature" of MBF that we should have documented better, so I apologize for not having done so. The idea is that the planner himself locks the costmap ONLY when needed, and so blocks it the less possible time. With SBPL this is important because it's very slow, and locking the costmap for all the planning time, as in old-good move_base, delays a lot any update. Setting planner_lock_costmap to false will make MBF behave as move_base. If that works, I suggest you to modify SBPL to lock the map only while it gets copied (what is very fast), and possibly PR the change (enabled by a parameter set as false by default, as move_base will still lock the costmap).

Hi @corot

Thanks for the quick response, but unfortunately I cannot detect any change in behavior between setting that parameter true or false. By the way - am I misreading your comment on the code when you say you are not locking the costmaps by default? If the parameter is not set it defaults to true, and in the check in line 80 you get the mutex before calling the planner? So by default it locks right?

mmm... could be that clear costmaps recovery cleans the global costmap, and it doesn't get "repainted" fast enough. I added a parameter to the clear costmaps recovery behavior to prevent this (PR). Try to add force_updating equal true to the gloabal_costmap configuration. To verify if that's the reasion, you can also try to remove all calls to clear costmaps, by removing them from the loaded recovery behaviors list.

I had the same thought, but I've already tried a few variations to test that and I don't think that is (directly) the issue. If I'm using clear_costmaprecovery I've always had force_updating: true, and also tried with reset_distance: 0 and no valid layer_names. The reason I have multiple duplicate recoveries was to make sure that MBF didn't abort prematurely after exhausting just 1 recovery behavior, because the weird planner behavior doesn't always show up right away.

recovery_behaviors:
  - name: 'clear_costmap1'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap2'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap3'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap4'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap5'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'
  - name: 'clear_costmap6'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'

clear_costmap1:
  reset_distance: 0.0
  force_updating: true
  affected_maps: both
  layer_names: [stvl, npvl, obsl]
#
#.
#.
#.
#
clear_costmap6:
  reset_distance: 0.0 #6.0
  force_updating: true
  affected_maps: both
  layer_names: [stvl, npvl, obsl]

To make sure that it wasn't the clear costmap recoveries I tried to similarly have a list of just rotate recoveries, and the same planner behavior happens. So I don't think it's the clear costmap, but I still assume that overall it has to do with the costmaps because otherwise why would the planner disregard ALL layers of the costmap, even the static layer which was never part of the clear_costmap recovery in the first place.

recovery_behaviors:
  - name: 'rotate_recovery1'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery2'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery3'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery4'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery5'
    type: 'rotate_recovery/RotateRecovery'
  - name: 'rotate_recovery6'
    type: 'rotate_recovery/RotateRecovery'

Hi again @corot

I just tried something that surprised me a bit. I switched to GlobalPlanner instead of SBPL, but kept every other parameter the same and I am seeing the same issue. I didn't expect this, but I was previously testing out GlobalPlanner with replanning active and not this oneshot -> fail -> replan setup that I was trying with SBPL.

So this leads me to one of two temporary conclusions:

Either there is something wrong with the way MBF handles this use-case of one-shot planning and then repeated planning failures due to a trapped robot
Or there is something really wrong with my parameter setup that doesn't materialize in a normal move_base setup where I cannot replicate this behavior with neither GlobalPlanner or SBPL.

Just wanted to let you know so you don't go chasing down the SBPL route. Thanks for your help so far and hopefully we can find the issue together :)

I will see if I can't make a minimal replicable example so we can see the full parameter list and whether something is wrong.

Current planner setup that both can show the issue

planners:
  # - name: 'SBPLLatticePlanner'
  #   type: 'SBPLLatticePlanner'
  - name: 'GlobalPlanner'
    type: 'global_planner/GlobalPlanner'

GlobalPlanner:
  allow_unknown: false
  default_tolerance: 0.0
  visualize_potential: true
  use_dijkstra: true
  use_quadratic: false
  use_grid_path: false
  old_navfn_behaviour: false
  lethal_cost: 253
  neutral_cost: 66
  cost_factor: 0.55

# SBPLLatticePlanner:
#   environment_type: XYThetaLattice
#   planner_type: ARAPlanner
#   # planner_type: ADPlanner
#   allocated_time: 10.0 #30.0 #10
#   initial_epsilon: 15.0 #50.0 #3.0
#   forward_search: false 
#   nominalvel_mpersecs: 0.5 #0.8
#   timetoturn45degsinplace_secs: 3.0 #1.31 # = 0.6 rad/s
#   # lethal_obstacle: 20 #250

I think the problem doesn't happen on move_base cause it runs in a loop and waits a bit after every operation (e.g. clearing the costmaps): https://github.com/ros-planning/navigation/blob/melodic-devel/move_base/src/move_base.cpp#L779 But in MBF we call directly planning right after completing each recovery behavior (that's how I came with the need of the force_updating parameter for clear costmap). Things to try:

double check that move_base is not calling clear costmap recovery (e.g. rostopic echoing the goal and looking for the behavior field).
double check that MBF is actually locking the global costmap (e.g. print something if so)
what's the global planner update frequency? If low, please try increasing it

Hey @kajep09, did you manage to fix this? Can I close the issue?

Closing this as out of scope of MBF

magazino / move_base_flex