SmacPlannerHybrid creates kinematically infeasible path

tonynajjar commented 6 months ago

Bug report

Required Info:

Operating System:
- Ubuntu 22
ROS2 Version:
- Humble
Version or commit hash:
- It's a fork that tracks commit d509fbf0ee14c15a7692ac8fa852bf9d5821a3e2
DDS implementation:
- Fast-RTPS

Steps to reproduce issue

I tried it in my custom simulation but I reckon it should also happen on tb3 simulation. You could try to set the global footprint bigger than the local one and try navigating close to a lethal obstacle on the global costmap.

This is the planner/costmap config:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: False
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: True 
      track_unknown_space: False
      publish_frequency: 5.0
      update_frequency: 10.0
      resolution: 0.05
      width: 50 
      height: 50 
      lethal_cost_threshold: 99
      # footprint_padding: 0.5
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        cost_scaling_factor: 1.5
        inflation_radius: 0.7 
        inscribe: false 

planner_server:
  ros__parameters:
    use_sim_time: False
    expected_planner_frequency: 20.0
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      tolerance: 0.2 
      downsample_costmap: false 
      downsampling_factor: 1 
      allow_unknown: true 
      max_iterations: 1000000
      max_on_approach_iterations: 1000 
      max_planning_time: 1.5 
      motion_model_for_search: "REEDS_SHEPP" 
      cost_travel_multiplier: 2.0 
      angle_quantization_bins: 64 
      analytic_expansion_ratio: 3.5 
      minimum_turning_radius: 0.4 
      reverse_penalty: 2.0 
      change_penalty: 0.15
      non_straight_penalty: 1.50 
      cost_penalty: 5.5 
      lookup_table_size: 10.0 
      cache_obstacle_heuristic: False 
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10

Expected behavior

Create kinematically feasible path

Actual behavior

Resulting path makes planned footprints collide with global costmap

red thin rectangle = global and local footprint red thicker rectangle = TEB debug visualization of the infeasible projected footprint red path = SmacPlannerHybrid global plan

I also tried with a circular footprint, same result

Additional information

SteveMacenski commented 6 months ago

Questions:

Can you verify that the costmap being visualized is the global costmap and those footprints are from the global planner & that in the costmap color map that this really is marked as lethal?
What happens if you start further from the wall but have the goal near the wall?
Have you made any modifications to Smac locally or perhaps missed some syncs which has introduced a bug by branch management? Follow up would be have you tested Nav2 Humble & Rolling and see this in both?
Is there anything going on with a regular update to the footprint topic (for there to potentially be some race condition issues)?

I've not seen this ever in my testing and deployment of Smac so I don't have an immediate thesis of what's going on beyond asking probing follow up questions hoping one will yield some insightful result.

inscribe: false

What is this? Are you perhaps using a custom inflation layer?

Not that I think this is the issue, but what if you use the params from the Global Costmap from nav2_bringup + Smac configuration guide params? Perhaps there's some subtle problem that we see with your setup that could help us narrow into it

tonynajjar commented 6 months ago

Can you verify that the costmap being visualized is the global costmap and those footprints are from the global planner

This is for sure the global costmap and footprint.

& that in the costmap color map that this really is marked as lethal?

In another test, with circular footprint, I visualized the costmap with the "costmap color scheme" and it shows like this:

Do you have some way of getting the cost of a specific square of the costmap, by e,g, clicking on it to be even more sure?

What happens if you start further from the wall but have the goal near the wall?

In all cases SMAC ignores that the projected footprint collides. It just throws an error if the goal is exactly at the lethal obstacle

Have you made any modifications to Smac locally or perhaps missed some syncs which has introduced a bug by branch management? Follow up would be have you tested Nav2 Humble & Rolling and see this in both?

I do have a particular setup yes, it's a fork in between humble and rolling but with no changes to SMAC. Anyway I get it that we should remove this variable by testing on standard Nav2 Humble and/or Rolling. I hit some blockers when trying to do so:

I tried to setup rolling with the devcontainer but that's broken
I tried testing on Humble but for some reason the tb3 demo wouldn't launch properly.
I was thinking it would be hard to reproduce with TB3's circular footprint because the controller might block it from navigating first but I guess it should be feasible if I make the global footprint bigger.

What is this? Are you perhaps using a custom inflation layer?

oops, yeah artifact of the fork but I also reverted this to the default and have the same issue.

but what if you use the params from the Global Costmap from nav2_bringup + Smac configuration guide params?

Bottom line, I think this has to be done on Rolling to narrow down the issue. Let me know if it's a quick thing you can do if you have a working rolling setup already, otherwise I'll give it another shot soon

tonynajjar commented 6 months ago

One question, assuming this is a real bug, is there some kind of test that would detect this?

SteveMacenski commented 6 months ago

I visualized the costmap with the "costmap color scheme" and it shows like this

Is the inflation radius you set smaller than the robot's actual minimum radius? I see in the color scheme now that there is no inflation outside of the critically inscribed space. I'm not even sure in the greyscale visualization why that showed a gradient - it looks to be all 253's. Try increasing your inflation radius / decreasing your cost factor to create a potential field. If a 2+m radius (idk exact, just "enough" to test) does the job, I have a thesis on why but I would have expected that you'd see some logging at the warning/error level at least once about a problem from Smac.

Irregardless, you should have a smoother potential field, it makes Smac behavior better and actually runs faster.

I don't understand your image though, considering the footprint is in the middle completely disconnected from the path + why the path is orbiting so many times.

Bottom line, I think this has to be done on Rolling to narrow down the issue

Agreed, or at least to the extent that you can test it to check if its an issue there or not to know if its something in the stack itself or something going on on your managed fork. This should be a Planning Server specific issue, so you should be able to use the simple commander API + a launch/params file to bundle up a reproduction example for me to bring up only the planner server and execute the request the triggers this for working the problem.

One question, assuming this is a real bug, is there some kind of test that would detect this?

Almost certainly, but I think we need some better understanding of what's actually happening first.

tonynajjar commented 6 months ago

So, I spend a lot of time on trying to make it work on Rolling but I'm still not there yet. So I tried on Humble and I think I could reproduce, unfortunately without the planned_footprints as these are not on Humble (and I couldn't backport easily). We can see that a lethal part of the global costmap is in the footprint.

The screen recording is like 8GB so here is a screenshot instead. I made the global footprint a bit bigger than the local one. These are the changes from the default nav2 params:

SteveMacenski commented 6 months ago

Can you try what I asked? You still have the inflation way too small - there is no potential field established - and I want to know if that is causing the issue

tonynajjar commented 6 months ago

Oops I forgot that part. I'lll try it tomorrow, but assuming it works, we can't just call it a day right? I would think the planner should not generate kinematically infeasible path under any circumstances (throw an error instead). Or am I wrong?

SteveMacenski commented 6 months ago

If that's the case, I know the ~3 places I need to look and think about and I can come back with a better answer about "why" and "what" and what a solution would look like, if one is needed. If nothing else, some kind of warning or error or rejection of optimizations should occur in that situation but maybe I missed something. However we don't even know if that'll help, so one step at a time :smile: But no, that's not the end of it, it just gives me somewhere to start! Right now, if that isn't the cause, I don't have many other answers :laugh:

tonynajjar commented 6 months ago

I see in the color scheme now that there is no inflation outside of the critically inscribed space. I'm not even sure in the greyscale visualization why that showed a gradient - it looks to be all 253's

sorry for the confusion, this was actually a different configuration

Anyway I think I have the results of what you asked for. robot_radius is 0.3 and inflation_radius is 0.55 but it can still happen. With the default values I couldn't reproduce easily but I don't think it's impossible.

P.S: The localization of the TB3 demo is very jumpy, it's hard to tell for sure if the plan is unfeasible or if it just "jumped" into the lethal obstacle. I'm trying to get more convincing examples

I don't understand your image though, considering the footprint is in the middle completely disconnected from the path + why the path is orbiting so many times.

To answer this, the circles are not a path orbiting, the red circular line is the global footprint I set and the blue circles are the debug /planned_footprints. (the red rectangle is the local footprint, not important here I guess) In this use case the robot radius is much bigger than the inflation layer, which you disadvised, but I was just testing my assumption that "SmacPlannerHybrid should never plan a kinematically unfeasible path".

SteveMacenski commented 6 months ago

What's the black? Something seems a little odd. Note that when you use robot radius, it uses center point-checks, not the SE2 footprint. Please use a footprint so that we can have something analog to what you're seeing :-) They're checking different things if circular or non-circular (i.e. center point costs vs full footprint line segment iteration)

P.S: The localization of the TB3 demo is very jumpy, it's hard to tell for sure if the plan is unfeasible or if it just "jumped" into the lethal obstacle. I'm trying to get more convincing examples

Note that we clear the robot's current cell, since it has to be clear if the robot is cough there. So what you showed isn't odd assuming the very next point in the path isn't in collision. That's to prevent deadlocks so that if you're in collision or just up close to something you can never get out of it. What I'd want to see is collisions mid-path!

tonynajjar commented 6 months ago

What's the black?

Not sure :confused: but it's gone with my next configuration

Please use a footprint

Alright let's take this example I cooked up

It's a bit harder to cause a collision mid-path so I created this example which is a (potential) collision end-of-path - I think it points the the same thing right?. I positioned the robot in front of an obstacle and sent it a goal forward towards the obstacle. Result, the path is created but the controller, in this case TEB complains that the path is not feasible

tonynajjar commented 6 months ago

Note that we clear the robot's current cell, since it has to be clear if the robot is cough there.

hmm I'm assuming you're talking about footprint_clearing_enabled parameter? I see that's enabled for the obstacle and voxel layer but it doesn't seem to exists for the static layer. I ran a simple experiment to confirm this:

the cylindrical pillar is part of the static_layer whereas the rectangular one on the left is not (only part of the obstacle layer). You can see that only the obstacle layer is cleared by the footprint.

Maybe it would make sense to also have it for the static layer? Should I create an issue for it?

SteveMacenski commented 6 months ago

Lets simplify this, show this happening with your previous environment setup you originally posted with the new configuration - don't try to make it work on the TB3 example / some other situation. Lets focus on the situation you presented. You had showed it happening for many intermediate poses that was unquestionably problematic, show me that again with the new configs so we know if it helped or not. I don't want us to get bogged down in some first/last point details right now - we should be able to see this clear as day as you showed it before with setting A and gone with setting B if we found the problem.

I think some of what you're using as a proxy (the robot's current localized location) is not a good proxy for planning, so lets remove that estimated proxy and look at the exact things as much as possible

I just grabbed the Smac Hybrid params from the docs and ran it against the same 0.3m robot radius + 0.8 inflation radius on the default Nav2 configs. All seems good to me:

(not in the left photo that it looks like its colliding; its not, the marker array line thinkness is just set wide. Zooming in its fine.)

When I set the inflation radius to be less than the actual robot radius, it clearly occurs

Screenshot from 2024-04-23 15-26-17

Please show me that you see that too :wink:

SteveMacenski commented 6 months ago

So thinking about this logically: if you have a 20m wide robot (for example) but set the inflation radius to 0.1m, then yeah, that seems like a problem because the costmap doesn't even know to look for a collision because its all just 0 cost free space, including where you're literally in collision.

So, option:

Log some complaint that you set the inflation radius less than your actual robot's radius in the inflation layer. Lame, but at least exposes it
Force the inflation layer's radius to be at least the inscribed radius of the robot and also log the warning to users to tell them that they did it wrong and its not safe. I like this option, along probably others in addition to.
Insert some checks in the smac planner for this situation and log errors that are missing now for it (maybe good in addition to the bullet above)

But at its root, your configuration had set the inflation radius less than your robot's actual radius - so that's a fix for the problem, but not a fix to make sure its not possible. Point2 makes it no longer possible so that's my recommendation and should be only a few line change to check the 2 radii and take the maximum of the two. I can show that when they're equal, things work again fine (though gets very close to collision, as you'd expect without a potential field incentivizing against it; but still valid)

tonynajjar commented 6 months ago

So thinking about this logically: if you have a 20m wide robot (for example) but set the inflation radius to 0.1m, then yeah, that seems like a problem because the costmap doesn't even know to look for a collision because its all just 0 cost free space, including where you're literally in collision.

Ah I see, I thought collision checking works differently: I thought for every potential path, it checks if any of the planned_footprints is in collision (like polygon to polygon intersection), and rejects the path if so. Do I understand correctly that the way it actually works is by checking only the cost at the projected poses and reasons based on that? (Anywhere I could read more about it apart from code?)

If that's the case. I like your fix suggestions. We will come back to this first issue but I think, even with the inflation radius set correctly, SMAC can still plan for infeasible paths, when the starting pose of the robot is already close to an obstacle (which is what the original issue is about). Here are some examples:

Here is the global planner/costmap config for these screenshots:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: False
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: True
      track_unknown_space: True
      publish_frequency: 5.0
      update_frequency: 10.0
      resolution: 0.05
      width: 50 
      height: 50
      lethal_cost_threshold: 99
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'stvl_mixed_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      stvl_mixed_layer:
        PERFORM_INCLUDE: layers/stvl_cameras.yaml
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        cost_scaling_factor: 3.0
        inflation_radius: 1.0

planner_server:
  ros__parameters:
    use_sim_time: False
    expected_planner_frequency: 20.0
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      tolerance: 0.2
      downsample_costmap: false
      downsampling_factor: 1
      allow_unknown: true 
      max_iterations: 1000000 
      max_on_approach_iterations: 1000 
      max_planning_time: 1.5 
      motion_model_for_search: "REEDS_SHEPP"
      cost_travel_multiplier: 2.0
      angle_quantization_bins: 64 
      analytic_expansion_ratio: 3.5
      minimum_turning_radius: 0.4 
      reverse_penalty: 2.0 
      change_penalty: 0.15 
      non_straight_penalty: 1.50 
      cost_penalty: 5.5 
      lookup_table_size: 10.0 
      cache_obstacle_heuristic: False
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10

SteveMacenski commented 6 months ago

First, these are really pretty pictures! The footprint projections are really cool. But anyway -

What is the green? Curious if you know this wasn't an issue before some update or this is always been an issue. If it was a regression, fast forwarding changes until it happens could be a good way to see where it was introduced if new

This is quite hard for me to debug from afar. What I'm wondering is if there's an issue with collision checking in the analytic expansions. These paths are so trivial that its likely that these are totally formed up of only the analytic expansions without any search. Can you do this again with some more complex request so that search is strictly required at some point and its not trivial to a point that it could be just analytic expansions? That would help answer the following: Do you consistently see this at the start or end of the paths or both?

If it only happens at the end (i.e. when analytic expansions are used), that narrows things down. If it happens at the start (uniquely or combined) that also tells us something.

I cannot replicate this on my side so far, so I feel like this is still a configuration derived issue, but I don't discount a real bug either. However there are a bunch of companies with large non-circular robots using this and it hasn't been reported by them unless its a newly introduced bug.

  rolling_window: True

This is still suspicious to me, can we not do that for testing purposes or can you show that it happens with a normal global map without this? Again, collecting data. If we find that's the case, we can dig into why

    inflation_radius: 1.0 ... footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'

I'm reading this that your robot is 1.01 from the pivot point but your inflation radius is only 1.0. What if you make the inflation radius 2.0 (just sufficiently larger than 1.0)?

       cost_travel_multiplier: 2.0

This isn't a param for Hybrid-A*, this is making me suspicious that there is something really odd going on with your params. What if you use the params from https://navigation.ros.org/configuration/packages/smac/configuring-smac-hybrid.html#example exactly?

Note: When I say "start" and "end" I don't want to even look at the very first or very last points, just generally collisions are being recorded near the start/end. Lets not miss the forest for the trees.

tonynajjar commented 6 months ago

What is the green?

Green is TEB's path/control sequence

Curious if you know this wasn't an issue before some update or this is always been an issue.

I can't pinpoint a moment where that "started" happening, I think it was always there

Can you do this again with some more complex request so that search is strictly required at some point and its not trivial to a point that it could be just analytic expansions? That would help answer the following: Do you consistently see this at the start or end of the paths or both?

Since the bug mostly occurs when being already close to an obstacle (being in circumscribed area), I would say that it occurs mostly at the start or end of a path i.e. first/last 5 poses. It does not happen when going from free point A to free point B while avoiding an obstacle midway.

I cannot replicate this on my side so far

That's odd. Do you have all the information needed to try and reproduce it? Important factors:

Send many goals as close as the lethal region as possible
it doesn't always reproduce from the first time - maybe try reproducing exactly the start and end goal that you see in the screenshots
rectangular footprint
use the same configuration as mine (maybe replace STVL with obstacle layer) - I also tried with the default configuration and will try again
Are you testing on rolling or humble? I'm using Nav2 rolling but on Humble ROS system with minor changes just so that it builds (e.g QOS)

However there are a bunch of companies with large non-circular robots using this and it hasn't been reported by them It could be that their controllers workaround the infeasible plan

This is still suspicious to me, can we not do that for testing purposes or can you show that it happens with a normal global map without this?

Will do. I'll test again with as much "standard" config as possible.

What if you make the inflation radius 2.0 (just sufficiently larger than 1.0)?

will do

This isn't a param for Hybrid-A*, this is making me suspicious

that was an oversight, the default config was just copied as is and changed what is needed. Removed

Since you can't reproduce easily, do you have ideas what debug logs I could add inside the code that could give some insight?

tonynajjar commented 6 months ago

Let's start with more pictures.

First some successes:

the failures:

Default config used:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: True
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: False
      track_unknown_space: True
      publish_frequency: 1.0
      update_frequency: 1.0
      resolution: 0.05
      width: 50 
      height: 50
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'stvl_mixed_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      stvl_mixed_layer:
        PERFORM_INCLUDE: layers/stvl_cameras.yaml
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        inflation_radius: 2.0
        cost_scaling_factor: 0.7

planner_server:
  ros__parameters:
    use_sim_time: True
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      downsample_costmap: false           # whether or not to downsample the map
      downsampling_factor: 1              # multiplier for the resolution of the costmap layer (e.g. 2 on a 5cm costmap would be 10cm)
      tolerance: 0.25                     # dist-to-goal heuristic cost (distance) for valid tolerance endpoints if exact goal cannot be found.
      allow_unknown: true                 # allow traveling in unknown space
      max_iterations: 1000000             # maximum total iterations to search for before failing (in case unreachable), set to -1 to disable
      max_on_approach_iterations: 1000    # Maximum number of iterations after within tolerances to continue to try to find exact solution
      max_planning_time: 5.0              # max time in s for planner to plan, smooth
      motion_model_for_search: "DUBIN"    # Hybrid-A* Dubin, Redds-Shepp
      angle_quantization_bins: 72         # Number of angle bins for search
      analytic_expansion_ratio: 3.5       # The ratio to attempt analytic expansions during search for final approach.
      analytic_expansion_max_length: 3.0  # For Hybrid/Lattice nodes: The maximum length of the analytic expansion to be considered valid to prevent unsafe shortcutting
      analytic_expansion_max_cost: 200.0  # The maximum single cost for any part of an analytic expansion to contain and be valid, except when necessary on approach to goal
      analytic_expansion_max_cost_override: false  #  Whether or not to override the maximum cost setting if within critical distance to goal (ie probably required)
      minimum_turning_radius: 0.40        # minimum turning radius in m of path / vehicle
      reverse_penalty: 2.0                # Penalty to apply if motion is reversing, must be => 1
      change_penalty: 0.0                 # Penalty to apply if motion is changing directions (L to R), must be >= 0
      non_straight_penalty: 1.2           # Penalty to apply if motion is non-straight, must be => 1
      cost_penalty: 2.0                   # Penalty to apply to higher cost areas when adding into the obstacle map dynamic programming distance expansion heuristic. This drives the robot more towards the center of passages. A value between 1.3 - 3.5 is reasonable.
      retrospective_penalty: 0.015
      lookup_table_size: 20.0             # Size of the dubin/reeds-sheep distance window to cache, in meters.
      cache_obstacle_heuristic: false     # Cache the obstacle map dynamic programming distance expansion heuristic between subsiquent replannings of the same goal location. Dramatically speeds up replanning performance (40x) if costmap is largely static.
      debug_visualizations: false         # For Hybrid nodes: Whether to publish expansions on the /expansions topic as an array of poses (the orientation has no meaning) and the path's footprints on the /planned_footprints topic. WARNING: heavy to compute and to display, for debug only as it degrades the performance.
      use_quadratic_cost_penalty: False
      downsample_obstacle_heuristic: True
      allow_primitive_interpolation: False
      smooth_path: True                   # If true, does a simple and quick smoothing post-processing to the path
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10
        do_refinement: true
        refinement_num: 2

as a workaround, is there a config that specifies how far away the robot_base_frame or the footprint should stay away from lethal regions?

SteveMacenski commented 6 months ago

Hi,

Got it. Huh. Honestly not sure what to say about that off hand. I still can't seem to make that happen. I believe you that there is a problem if that's what you're seeing, but I'm at a loss if I can't reproduce it. Can you make a repository with the costmap, planner configs, launch file, world, and a script to call the server to go from some example A->B that you're using here so that I can try 1:1 what you're doing on my side and use that as a test case to look into?

If I can poke around, that would be ideal. Otherwise, I'd ask you to look into the collision checker object and log information that you think might help point to a particular issue.

I know, for instance, the last point could potentially be slightly in collision due to the exact goal pose added at the end of the path to have precise solution even though the plan is made in a binned space. But any intermediary point should never be possible.

A few of the failures (images 1,2,5) I look at and see that it appears to be the start/end point which is something we should deal with as a separate matter. Image 4 also has that but I see the 2nd to last pose could (?) be in collision. Its hard to tell given the thickness of the line. Image 3 looks less ambiguous but still not overly clear unlike some of your original images

tonynajjar commented 6 months ago

Yeah I get that it can be hard to debug if you can't reproduce. Instead of creating a repo from scratch, how about I use what I can from TB3 demo, change to a rectangular footprint + config used above + script and send you my branch?

SteveMacenski commented 6 months ago

See the note above I edited, you responded faster than I thought :wink:

I would like something standalone with the planner server / demo script so we can look at this in isolation so we're not messing with the full simulator with localization updates and whatnot that can make the waters a little more murky. Usually what you describe would be fine for me (even preferable so I don't have to change my usual workflows), but this looks to be subtle now unlike before, which was a clear and obvious problem, so I don't want to leave anything to interpretation

tonynajjar commented 6 months ago

I've been delaying creating the reproducible example because it seems like a lot of work. Instead I tried to debug it more myself and I think I have a good lead: by commenting out this check and consequently making the collision checker run all the time, I could not reproduce the issue anymore. So I'm guessing footprintcost or possible_collisioncost is wrong.

I'm finding it hard "proving" that one or the other is wrong but that's where I'm at right now, if you have any leads, shoot

tonynajjar commented 6 months ago

One more insight, by adding this log:

I sometimes get a footprint_cost_ of 0 when it's clearly not 0 for any point on the path:

Any idea how that could happen?

SteveMacenski commented 6 months ago

by commenting out this check and consequently making the collision checker run all the time, I could not reproduce the issue anymore. So I'm guessing footprintcost or possible_collisioncost is wrong.

Not shocked by that I suppose, but unclear. For the second image, there are clearly points in free space, so those 0.0s might be real. What processor are you running this on?

I don't have an immediate answer. Some costmap resource not being locked correctly in your fork (or main)? I forget if you're showing this based on your managed fork or Nav2 mainline

  update_frequency: 10.0

What if this is set to 1.0? Curious.

tonynajjar commented 6 months ago

Not shocked by that I suppose

It's already useful to know that the problem is likely not in the collision checking itself but in the fact that collision checking is not triggered

What if this is set to 1.0?

Same thing

What processor are you running this on?

AMD Ryzen 7 7840HS w/ Radeon 780M Graphics

there are clearly points in free space, so those 0.0s might be real.

with huge inflation layer still happens:

My next step would be to query manually again for the cost at these coordinates and see if it's like you said an issue with locking the costmap. Do you know of an easier way than to create a service that calls costmap_->getCost?

SteveMacenski commented 6 months ago

Not off hand, but the rviz color scheme seems clear that it is not actually 0. Hence suspecting threading / locking issues getting corrupted data perhaps

tonynajjar commented 6 months ago

Hence suspecting threading / locking issues getting corrupted data perhaps

Looks like it's the case, I plugged in the coordinates in a custom service that get the cost and it doesn't return 0. I'm really going down the rabbit hole with this bug... Sounds like it's not your first rodeo with /threading/locking issues, anything you could link at from the past that could help me?

SteveMacenski commented 6 months ago

Nothing off hand - just general programming experience that issues like data not getting results like they appear to be can be caused by simultaneous read/writes. I'd look at locking the costmap and seeing if that fixes problems, maybe there's an edge case that isn't being covered. Though, if you set the update rate for the global costmap stupid low (i.e. 0.000001 hz) and you still see it, then maybe it isn't that (since there aren't write operations occurring)

tonynajjar commented 6 months ago

if you set the update rate for the global costmap stupid low

I set it to 0.1 Hz and could still reproduce. Setting it to lower than that blocks the activation of the planner server for some reason

I also suspected footprint_clearing_enabled had something to do with it but deactivating it made no difference

marcusvinicius178 commented 6 months ago

Hi @tonynajjar did you find a solution or new depuring ideia? I am also trying to trick this collision behavior to avoid it, but any combination of parameters config worked https://robotics.stackexchange.com/questions/111025/how-to-configure-nav2s-smac-planner-to-maintain-a-safer-distance-from-obstacles/111044#111044

I have recognized that the inflation layer works better for circular footprint (the blue area inflates correctly, while the rectangle kind of scape from boundaries and seems that don't work). You have mentioned that the collision checker is not triggered right? Is this the main issue cause? I did not dived deep into logging this as you did, but I have ROS2 Galactic and ROS2 Iron installed. Let me know if I can help with some test.

For me it's strange that all nav2 planners I tested had the same issue while using the Ackermann motion model. Other observation is that months ago when I used a loaded static map, the collision avoidance seemed to work, and now using just rolling windows this avoidance is not working. I could not track what is generating this issue yet,that is frustrating since I need this working as soon as possible.

Do you think using some packages from Autoware in my nav2 forked repo to obstacle mapping and path Planning could work?

doisyg commented 5 months ago

I did not properly follow the discussion, but if that helps, here is what I use to debug and test the planners: https://github.com/botsandus/planner_playground

tonynajjar commented 5 months ago

Thanks a lot @doisyg, this saved me a lot of time.

@SteveMacenski can you try to reproduce:

1- git clone https://github.com/angsa-robotics/planner_playground -b custom-testing-angsa and build 2- ros2 launch planner_playground test_planner.launch.py 3- ros2 run planner_playground send_goal_manually

I get

My setup was using https://github.com/angsa-robotics/navigation2 main branch, which is somewhere in between iron and rolling (hoping to change that when Jazzy is out and stable). Can't test on humble because the planned footprints are not there and can't test on rolling because a ton of stuff is broken (even with https://github.com/ros-navigation/navigation2/pull/4298, the servers don't launch properly for some reason)

SteveMacenski commented 5 months ago

That doesn't appear to be the same collisions we were talking about - again - please - I want to focus on the main issue of collisions mid-path for a clear and unambiguous situation. The first / last poses have reasons separate of any potential major bug for they they have contact (including clear on start & exact solution requirements) that I don't want to get mixed up with for the main root of a more substantial problem. Since you're also testing on a private modified branch that adds a level of "I need a clear example" that I can work from on Rolling / main.

Please provide an example for that situation that you've provided many examples previously :-) I'm glad that Guillaume mentioned his package, that should make it easy to add in your map / start/goal poses!

tonynajjar commented 5 months ago

Please provide an example for that situation that you've provided many examples previously :-)

hmm, can you send again which examples you're talking about? All the screenshots I provided show a collision at the beginning or end of the path, which is the bug I'm trying to solve. I'm convinced that with a correctly-configured inflation layer, there will not be a collision mid-path (we established that a fix should be created to warn or block a badly configured inflation layer)

SteveMacenski commented 5 months ago

A couple where the non-first and non-last points are in collision

https://github.com/ros-navigation/navigation2/issues/4271#issuecomment-2096307857 - last image
https://github.com/ros-navigation/navigation2/issues/4271#issuecomment-2070326335 (though I think this was fixed once you made your inflation radius not smaller than your robot's minimum radius?)

Unless the first one isn't actually in collision and its just really close and the line thickness makes it look that way?

Or are you saying that after fixing the inflation radius issue you cannot reproduce the issue anymore and its only in the first or last path points - so that's the only remaining issue?

tonynajjar commented 5 months ago

(though I think this was fixed once you made your inflation radius not smaller than your robot's minimum radius?)

exactly

Unless the first one isn't actually in collision and its just really close and the line thickness makes it look that way?

I mean there is definitely a collision there, zooming in:

But yeah, I can't reproduce a similar case.

Or are you saying that after fixing the inflation radius issue you cannot reproduce the issue anymore and its only in the first or last path points - so that's the only remaining issue?

Exactly!

SteveMacenski commented 5 months ago

Oh, ok then. My apologies then, I thought we were still on the original issue. If its just first-last then what you showed me is perfect. This and the issue with the cost scaling factor are on my calendar for Monday to start looking into

SteveMacenski commented 5 months ago

Splendid. Not that I need to convince anyone "opensource = good" but another case where @doisyg's one-off testing thing ends up easy to modify and play with for other applications to accelerate them!

SteveMacenski commented 5 months ago

@tonynajjar chronicling a bit:

Part 1

First thing I did was throw out your Smac Planner parameters for the defaults in docs.nav2.org's configuration guide (minus adding REEDS_SHEPP since that was required to get it to work) --> still see the issue. This is a good sign to me.
Increased your inflation_radius from the rather exact value to 3.0 to generally just fill the space --> still happens. No surprise there.
I adjusted your footprint to be the same size but have the center of the polygon be the center of rotation --> still happens. I was kind of hoping that was the issue because I'd know exactly where to analyze in that case
I set the tolerance to 0.0 --> plan fails to generate This is what I expected actually w.r.t. the final pose in collision which you example provided is displaying. What you requested is an impossible path.

So, lets dig into that. I'm pretty sure we just tack on the goal at the end of paths somewhere which is what you're seeing there. That point isn't even being collision checked so its truly unique to the final & exactly requested goal pose. This is why I was trying to focus on the main-planning-in-collision issue, I think I'll make quick work of this one and wanted to focus on the more pressing issue first. Expect a fix by EOD :-)

Part 2: Coming to a GitHub thread near you

Was there an issue with the start point as well? This example doesn't show one. I don't think that should be an issue - or its with the clearing start footprint enabled kind of thing.

tonynajjar commented 5 months ago

Not on my computer currently but yes also happens at the start (or start + x) point

SteveMacenski commented 5 months ago

Please provide a reproduction like you did with the goal one and I can look into that.

SteveMacenski commented 5 months ago

@tonynajjar OK, I have a patch for this problem. I don't like how I had to do it, so I'm going to spend some time to make a more elegant solution, but you can count on the final-goal-pose-tacked-on-in-collision problem resolved & a PR will appear in the next few hours

tonynajjar commented 5 months ago

Splendid. Not that I need to convince anyone "opensource = good" but another case where @doisyg's one-off testing thing ends up easy to modify and play with for other applications to accelerate them!

I would suggest bringing this somewhere into Nav2 or the org for long-term maintenance. I wonder what other testbenches would be useful

SteveMacenski commented 5 months ago

@doisyg got any new engineers around? Might be a good intro-project to take that and put it into our /tools directory as a package for reproducing planning problems. Certainly you or I could do it, but seems like a nice good-first-issue style task if anyone came to mind

SteveMacenski commented 5 months ago

Part 2:

The key is this line: https://github.com/ros-navigation/navigation2/blob/main/nav2_smac_planner/src/analytic_expansion.cpp#L213 We start at i=1 since i=0 is the current node, and we go until num_intervals - 1, whereas num_intervals is the end point. So we don't actually evaluate the last point's validity to add to the analytic expansion. Then when we set the analytic path that is validated, we do so adding in:

  if (goal_node != prev) {
    goal_node->parent = prev;
    cleanNode(goal_node);
    goal_node->visited();
  }

which tacks on the final goal pose for exact search in case of some minor deviations due to floating point error somewhere in the expansion. That's how this usually works to give you exact paths and the final goal pose might actually be invalid. I don't think (?) we want to change this mechanic so that you can request a really close goal pose point.

I had a much, much less intelligent solution before while trying to keep analytic expansion working for non-exact valid goals, but I just realized that the analytic expansion really only should work when you have an exact match. Not checking the final pose and bypassing it seems silly. There is some convenience in having the last point be differently checked (i.e. we check analytic expansion if valid without the last point so you can have something really close to obstacles -- then separately check the last point to see if we should add it or not), but I don't think that's really the intention behind the analytic expansions in the first place.

So... I just changed it to a <= statement so we check the last point. Now planning will take a little bit longer in the case that the goal pose is just 1-point into collision, because now the analytic expansion just straight up gets rejected instead of providing a valid expansion except for the final point (that we I could just clip off). It will make those requests slower, buts its probably the right solution.

The alternative is that I can keep things as < and then I can check the final goal pose separately about whether to add it in (as part of the code block above) so that its can be faster and more close to the intended goal pose with tolerances in this situation. I think that is technically valid and safe, but less intuitive. So, I'm preferring the more strictly-correct method, but I can be easily swayed back if there is an objection. That would persevere the current behavior, only now we make the call about whether or not we actually set the goal_node to parent based on that collision check. Otherwise, just return prev. That's also a simple change as well, I just need to change how that's handled downstream in a few places.

Please review: https://github.com/ros-navigation/navigation2/pull/4353

doisyg commented 5 months ago

@doisyg got any new engineers around? Might be a good intro-project to take that and put it into our /tools directory as a package for reproducing planning problems. Certainly you or I could do it, but seems like a nice good-first-issue style task if anyone came to mind

I don't but if you want me to contribute back, ping me on an issue and I will happily PR

marcusvinicius178 commented 5 months ago

Hi @SteveMacenski I successfully achieved collision avoidance with the SMAC planner thanks to your recent commit and some adjustments written here: Params Adjustments

Avoidance: Ackerman Truck Failed to Avoid Obstacle

However, I seek for earlier collision avoidance to ensure a safer distance from obstacles. Could this be optimized through parameter settings (global or local) or would it require modifying the SMAC planner source code? Specifically, I need the transition to steering and collision checking to initiate sooner. Thank you for the recent commit, and I look forward to any guidance or suggestions on this matter!

tonynajjar commented 5 months ago

I tested the fix. First of all, no more colliding footprints with is great:

It looks like two projected footprints were removed and not just the one in collision. I don't know if it's a big deal but might hint to an issue in the fix (?)

before:

after:

EDIT: Looking at it again, I doubt it's an issue because it looks like a completely different path was selected, as opposed to clipping two final poses from the original path. Will leave my comment there anyway for you to be the judge.

Please provide a reproduction like you did with the goal one and I can look into that.

This is next up for me

tonynajjar commented 5 months ago

This is next up for me

It didn't take long, @SteveMacenski try it with: git clone https://github.com/angsa-robotics/planner_playground -b reproduce-collision-at-the-start

SteveMacenski commented 5 months ago

EDIT: Looking at it again, I doubt it's an issue because it looks like a completely different path was selected, as opposed to clipping two final poses from the original path. Will leave my comment there anyway for you to be the judge.

Correct. Because this is no longer finding the exact analytic solution, its finding another solution within your specified tolerance parameter. Its not just removing the last point (which was the second option I set in my write up above).

ros-navigation / navigation2