ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

Zig-zag due to unfavorable angle increments and starting angle #3172

Closed mangoschorle closed 1 year ago

mangoschorle commented 2 years ago

I am experiencing some interesting behavior when using SmacPlanner Hybrid. I am not quite sure whether this is a common mistake or behavior caused by incorrect usage, or whether it is a known limitation. I currently have no idea how to approach or solve this. If ideas pop up along this thread, I am more than willing to contribute.

So the issue is that depending on which angle-bin the start node falls into, I am getting either a straight line or a very messy zig-zag.

We have 64 angle bins. Here the start is in bin 32 (pi):

Screenshot from 2022-09-05 21-53-44

In this image the starting angle is in bin 31:

Screenshot from 2022-09-05 21-54-39

I also found that due to my choice of resolution (10 cm) and min_turning_radius (50 cm), the turning motions of the minimal primitive set (Dubins) cover four angle bins at a time, i.e., increments = 4 (see https://github.com/ros-planning/navigation2/blob/634d2e3d9b6bde5558c90fe6583d52b3ed66cf55/nav2_smac_planner/src/node_hybrid.cpp#L104).
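For reference, here is a minimal sketch of how that increment value arises, assuming the chord-length geometry used in node_hybrid.cpp (the function and variable names here are illustrative, not the actual nav2 code):

  #include <cmath>

  // Sketch: the primitive must travel at least sqrt(2) cells so that every
  // expansion is guaranteed to leave the current cell. The heading change
  // over that chord follows from the minimum turning radius (in cells), and
  // is then rounded up to a whole number of angle bins.
  int computeIncrements(float min_turning_radius_cells, int num_angle_bins)
  {
    const float chord = std::sqrt(2.0f);  // primitive length in cells
    const float angle =
      2.0f * std::asin(chord / (2.0f * min_turning_radius_cells));
    const float bin_size =
      2.0f * static_cast<float>(M_PI) / static_cast<float>(num_angle_bins);
    // Rounding up to a clean multiple of the quantization is what prevents
    // looped paths, but it is also what makes some bins unreachable.
    return static_cast<int>(std::ceil(angle / bin_size));
  }

With a 10 cm resolution and a 50 cm turning radius (5 cells), the heading change per primitive spans several bins at 64-bin quantization, which is exactly the skipping described above.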

This is actually quite bad, since one will never end up in odd angle bins, only in bins that are multiples of four. This also causes the zig-zagging, because the search just can't reach an angle bin that points directly in the right direction. Also, as can be seen in the image, the planner has to expand WAY more nodes to make progress.

This is a distribution of the angle bins. As you can see, we only reach every fourth angle bin.

Screenshot from 2022-08-04 16-42-16

I am collecting ideas on how we could remedy this zig-zagging issue. I know we could double the resolution or use a larger turning radius (>2 meters if we want to cover only a single bin), but we run into other issues if we do so. Maybe there are other ideas. I have tried adding additional turning primitives which turn much less than the min_turning_radius, but didn't get good results. Again, if I understand the idea / solution, I'd be more than happy to implement, test, and submit a PR.

SteveMacenski commented 2 years ago

Interesting

Have you tried more angular bins? I did notice generally in my testing that 64 was really not enough for "great" quality paths. The 72 I set as the default was really the smallest number that worked well, and it is also representative of a meaningful quantization of 5 degrees (many works on similar topics use that discretization; after trying coarser samplings and not getting good results, I didn't see a reason to argue with it). Though, I just noticed that 64 is in the readme, but 72 is the actual default value.

We've also changed quite a bit since the increments = ceil(angle / bin_size); was added for early testing results. This was added because looping was happening in the paths to get a slightly different approach angle to get through a passageway, e.g. it would go in a loop in a circle just to approach some area on a different vector / bin. It could be that we could remove that now and it would help? I do suspect though to some degree that we saw that happening in exactly the zig-zag situation where we over-expanded the search and created that looping behavior.

More thoughts / options:

mangoschorle commented 2 years ago

Some more insights:

It's not about the number of angle bins. You can increase it and still get awful results. It's more about (I have big trouble writing down what I mean) the special regularity of possible angles that emerges in some situations.

In the following image we have a starting angle of pi and bin = 32. Due to the turning radius and the num_angle_bins, we have increments = 4. So: the starting bin is 32 and we can hop 4 increments at a time. This is why we get 16 (64/4) possible directions of motion (indicated by the yellow star). If we just double num_angle_bins (128), we get exactly the same result. Here is why: turning radius and resolution remain the same; the only thing that changes is increments, which is doubled to 8. So on a 128-bin angle resolution we now hop 8 angle bins at a time, leaving us again with exactly 16 (128/8) possible directions. In the image the plan is very neat, and you can see that the first action chosen is a right turn.

Screenshot from 2022-09-06 20-54-33

Plans however get super ugly and require a long time in the following two situations:

  1. The motion set (the yellow star in the image) is not perfectly axis aligned.

If we choose a starting point with an odd bin number (for example bin 31 instead of 32), then there is no motion primitive that allows moving directly to the right (the yellow star has rotated a bit and nothing points to the right), and hence we get these zig-zag motions. Note that you can always choose a direction that doesn't fit one of the sixteen direction primitives and... you will still get zig-zag.

Screenshot from 2022-09-06 21-08-50

  2. The motion set (the yellow star in the image) is not regular and has no primitives along the axes. With an odd choice of angle bins (113) you might get irregularly shaped motion sets; planning times start to spike and the plans aren't really usable.

Screenshot from 2022-09-06 20-58-26

In total I have found that (at least for our use cases, which have many parallel roads and right-angle intersections) it suffices to make sure that the primitive set is regular. We then round the actual starting bin (e.g. 31) to the nearest of the regular ones (to the nearest yellow line of the star), and take it from there. The results are satisfying even though my heart is aching; in the end it is an awful hack.

Cheers for the suggestions.

SteveMacenski commented 2 years ago

So it sounds like just using more bins won't help, and neither will an odd number of bins (though an odd number of bins could help, but only by making the search pattern do a full turn around :laughing:).

We then round the actual starting bin (e.g. 31) to the nearest of the regular ones (to the nearest yellow line of the star), and take it from there.

Yeah, I'm not really liking that either. That only works for your situation because the map is exactly axially aligned. If the map were off at some odd angle, then it's possible that being offset is what would generate the better paths. Plus, it in no way resolves the issue that all 64 bins cannot be reached and instead uses a far downsampled version.

What about enforcing a constraint that the primitives must go through an odd number of angular quantizations? So that it's impossible for 4 to be an option; it has to be 3 or 5. That would increase the primitive lengths slightly (assuming we round up and not down).
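A minimal sketch of that constraint, under the same assumed geometry as above (hypothetical names, not the actual nav2 code):

  #include <cmath>

  // Sketch: round the bin jump up to the nearest odd number, then lengthen
  // the chord so the primitive still ends exactly on a bin boundary while
  // respecting the minimum turning radius.
  void makeIncrementsOdd(
    float min_turning_radius_cells, float bin_size,
    int & increments, float & chord)
  {
    const float natural_angle =
      2.0f * std::asin(std::sqrt(2.0f) / (2.0f * min_turning_radius_cells));
    increments = static_cast<int>(std::ceil(natural_angle / bin_size));
    if (increments % 2 == 0) {
      ++increments;  // force odd, rounding up so the radius constraint holds
    }
    // Chord that realizes this heading change at the minimum turning radius;
    // it is slightly longer than the original sqrt(2) primitive.
    chord = 2.0f * min_turning_radius_cells *
      std::sin(increments * bin_size / 2.0f);
  }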

The other thing I'm thinking about is:

We've also changed quite a bit since the increments = ceil(angle / bin_size); was added for early testing results. This was added because looping was happening in the paths to get a slightly different approach angle to get through a passageway, e.g. it would go in a loop in a circle just to approach some area on a different vector / bin. It could be that we could remove that now and it would help? I do suspect though to some degree that we saw that happening in exactly the zig-zag situation where we over-expanded the search and created that looping behavior.

Or maybe have a special case where the first expansion from the start pose uses the normal primitives, but also allows primitives spanning 1 (or 3, or something odd, maybe N-1?) angular quantization bins if that amount puts it into a new spatial bin.


The 2nd figure you show, though, looks like a straight-line path should have been largely possible? That's quite strange - there are a few expansion branches shown that basically just go straight and do a fine enough job at it. What if you remove the planning penalty functions? Maybe the non-straight / change penalties are creating this to some level.

mangoschorle commented 2 years ago

One way we could make sure more bins are visited is the following: we settle on a certain number of bins that gives us some regularity, such as 64.

Given the settings for min_turning_radius and resolution, we check what the number of increments is (e.g. 4).

We now compute the turning_radii (> min_turning_radius) which would achieve the intermediate bins and add the respective primitives to the set. All primitives have equal length (sqrt(2)).
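In math terms (my notation, not taken from the code): for a fixed chord length $c = \sqrt{2}$ cells and a target heading change of $i$ bins, the chord of a circular arc gives

$\theta_i = i \cdot \Delta\theta, \qquad R_i = \frac{c}{2\sin(\theta_i/2)}, \qquad \Delta x_i = R_i \sin\theta_i, \qquad \Delta y_i = R_i (1 - \cos\theta_i)$

Since $R_i$ grows as $\theta_i$ shrinks, the intermediate-bin primitives automatically satisfy $R_i >$ min_turning_radius.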

Here is an example (I used one of your maps, with bin_size=4 and all other parameters default).

This is the old case: 64 bins, increments = 4, therefore we can only reach 16 different orientations. The diagonal motions are zig-zaggy. In blue, you can actually see the distribution of bins in the graph; there are wide gaps between the blue lines since we don't reach every orientation bin.

Screenshot from 2022-09-07 09-33-23

This is the new case: 64 bins, increments = 4, and I add two additional primitives with increments +2 and -2 (I of course changed the TravelCost computation accordingly). Therefore we can now reach 32 different orientations. The diagonal motions are not zig-zaggy. In blue, you can actually see the distribution of bins covering every second bin.

Screenshot from 2022-09-07 09-34-07

This is the new case: 32 bins, increments = 2, and I add two additional primitives with increments +1 and -1. Therefore we can now reach all possible 32 different orientations. The diagonal motions are not zig-zaggy. In blue, you can actually see the distribution of bins covering every bin.

Screenshot from 2022-09-07 09-47-36

So to be honest I think we should settle for fewer but reachable bins (32) at the cost of having to add a bigger primitive set.

SteveMacenski commented 2 years ago

What about my suggestion to simply make the number of angular bins the trajectories represent non-exact? That seems like the lower-hanging fruit before we escalate to more primitives. Though, the idea in the next paragraph could be a good solution.

On the more primitives front - what if we only change the first expansion to use the +/- 1 and +/- 2 bin approach you describe, both? That would seed some expansion trees in the non-4 and non-even number distributions to expand from and in no way impact the run-time of the rest of the planner while still being entirely feasible. My assumption here is that the non-zig-zag paths are more optimal than the zig-zag paths so if we give those trees the ability to expand on the off-angle distributions, it should overtake the zig-zag ones. This seems like a good solution, actually. No fudging there since we're not just placing the robot at neighboring bins, we're just using less aggressive turning radii to generate still-feasible trajectories. It should have virtually the same impact.


What is the run-time impact of the additional primitives? We'd like to keep something like 64/72 bins (not 32) to give some really good resolution of paths at 5 deg.

In your last example with +/- 1 bin, is that in addition to the +/- 2 bins you did previously (so it's the normal set + 4 primitives, 1 bin apart and 2 bins apart) or in place of (so it's the normal set + 2 primitives, 1 bin apart)? Just for clarification.

So to be honest I think we should settle for fewer but reachable bins (32) at the cost of having to add a bigger primitive set.

I'd disagree; if the user says they want to use 64/128/whatever angular bins, we should be using them all. It doesn't mean they're all used uniformly, but it should be possible to achieve them all. But I suppose my opinion on that could be swayed in the other direction if we don't think there is a large overlap of settings of turning radius + resolution that yields this problem, so it's a relatively minor occurrence.

mangoschorle commented 2 years ago

What is the run-time impact of the additional primitives? We'd like to keep something like 64/72 bins (not 32) to give some really good resolution of paths at 5 deg.

On my end the additional primitives had the opposite runtime effect of what one would think. Once you use more primitives and all bins start to be reachable, it takes WAY fewer expansions to reach the goal. The additional cost of more primitives is negligible in comparison.

In your last example with +/- 1 bin, is that in addition to the +/- 2 bins you did previously (so it's the normal set + 4 primitives, 1 bin apart and 2 bins apart) or in place of (so it's the normal set + 2 primitives, 1 bin apart)? Just for clarification.

In the last example I reduced the angle bins to 32 (instead of 64). The increments variable is then 2, so I added primitives for +-1 and +-2 (the +-2 ones are those which represent the min_turning_radius, i.e. what is currently implemented). In the second-to-last image I had 64 bins (increments = 4 for min_turning_radius); there I had +-4 (for min_turning_radius) and +-2, so only every second bin is reachable. The solution is of course exactly identical to the last image.

I have tried the solution I proposed on several of our maps and I am really happy with the results for now. I will give what you proposed a try and let you know the results.

mangoschorle commented 2 years ago

Results for:

On the more primitives front - what if we only change the first expansion to use the +/- 1 and +/- 2 bin approach you describe, both? That would seed some expansion trees in the non-4 and non-even number distributions to expand from and in no way impact the run-time of the rest of the planner while still being entirely feasible. My assumption here is that the non-zig-zag paths are more optimal than the zig-zag paths so if we give those trees the ability to expand on the off-angle distributions, it should overtake the zig-zag ones. This seems like a good solution, actually. No fudging there since we're not just placing the robot at neighboring bins, we're just using less aggressive turning radii to generate still-feasible trajectories. It should have virtually the same impact.

As you can see in the close-up below, I am expanding all possible primitives (angle bins) only in the very first iteration. Yet, the results with 72 bins are very poor: most bins are never reached and the graph is large.

Screenshot from 2022-09-08 15-45-18

CloseUp (only first)

Screenshot from 2022-09-08 15-45-45

Results from the dense primitive set in every expansion:

Screenshot from 2022-09-08 15-47-19

CloseUp (All)

Screenshot from 2022-09-08 15-47-35

Just out of curiosity I also tried expanding the first 500 A* iterations with all primitives, and after that with the minimal set (3).

image

CloseUp (First 500)

image

SteveMacenski commented 2 years ago

Interesting. So it seems like continuously expanding with more primitives could be helpful for this situation.

From these new results - what do you suggest as the move now?

I'd be interested in some performance testing on whatever you suggest relative to what's there now for a handful of situations in both confined spaces (like what you show) and open-area maps. Either way, even if it comes with a performance hit, I think it's worth adding the option to expand in more orientation bins, but it is good to do so with eyes wide open about what the impact on memory/CPU is. For the zig-zag situation, you're right in that it might result in less compute time because we're not over-expanding the tree, but for the situations where zig-zagging didn't occur (when the starting bin was some even increment), I'd like to know how much worse it is compute-wise. We're going to be checking a whole lot more things for collision, which is not free!

This is probably worth adding a boolean parameterization to enable / disable, so we can get the same behavior as today for folks that have motion primitive lengths / costmap resolutions that don't have this problem. And add some description in the readme / navigation.ros.org about why it is added. You've made some really fantastic figures here as well; no reason to let those be lost to the void :wink:

mangoschorle commented 2 years ago

Before going any further into this I must ask the following: if it is actually the case (and I didn't mess up) that most angle bins aren't visited at all, what is it that you see as a downside of reducing the number of bins? I mean we could have an analytical expansion to the exact goal coordinate and not to a discretized one. So the planner would reach the goal with floating-point precision irrespective of the discretization. We would get rid of the zig-zagging due to the initial bin and no further primitives would have to be added.



SteveMacenski commented 2 years ago

most angle bins aren't visited at all, what is it that you see as a downside of reducing the number of bins

If you're OK with only using 16 bins or whatever number that resolves to - nothing. But you, like me I suspect, don't think it should be the role of special numbers in turning radii and costmap sizes to determine what fidelity a robot's behavior should be able to achieve. Further, it will cause problems in confined spaces where you need more fidelity to find a route through the space. This is kind of where that 72 bin number comes from - if you can't find a route through a space with 5 deg bins, any less probably isn't going to do it either. But the same cannot be said for 10+ degrees (and certainly not 22.5 with 16 bins).

I mean we could have an analytical expansion to the exact goal coordinate and not to a discretized one

That's only for the last few meters on approach - it's not used in full planning. The majority of the plan would still be restricted to those 16 bins, which would be problematic especially in narrow spaces where we require finer motion primitives to be able to find a route through. But also in situations where you have maps and start poses non-evenly aligned - the same reason why in this comment you added +/-2 primitives, so that you could make it through odd-oriented hallways without zig-zags. You need more bins.

~32 bins @ 11.25 deg (let's say approximately 10 deg for orders of magnitude) is probably entirely suitable for wider spaces without other things like people, robots, or boxes partially blocking aisleways. But the moment you have some really narrow spaces, even if only due to temporary obstacles, having more discretizations gives you the ability to try more granular directions to find a valid path through if the easy "straight" doesn't work. From the research I did on this during development, 5 deg (72 bins) was that sweet spot. But 10 deg would work as long as there's always a wide berth like in the map you show. Add some other robots or boxes in the way, and I think you'll see it starts to become somewhat problematic in isolated situations -- but when replanning at 1 Hz or when operating with hundreds of robots, isolated situations happen quite frequently.

SteveMacenski commented 2 years ago

Thoughts? :smile:

mangoschorle commented 2 years ago

From these new results - what do you suggest as the move now?

I'd be interested in some performance testing on whatever you suggest relative to what's there now for a handful of situations in both confined spaces (like what you show) and open-area maps.

I think one should really make sure that this is not just me looking at some isolated weird corner cases (I really think that is not the case, but...). Are there some test maps I could have a go at? Like the 3-planners one featured in the smac planner directory.

I guess the flag to switch the new behavior on/off is a good idea, especially since you could then directly compare the quality of the paths and the required runtime.

On my side, I only took a look at Dubins paths, but I guess the extension to RS should be straightforward. What I had to modify on my end:

If I only could make it to ROSCon ... A discussion on this topic could bring up so many fruitful cool ideas ...

SteveMacenski commented 2 years ago

Are there some test maps I could have a go at?

What's your email? I can send you a couple. I use the willow garage map for confined space testing and another warehouse map I found at some point for open-space testing. I can't find the links to them quickly to post here. Optionally, my email is on my github page if you want to email me first (if you don't want to post your contact info on github ha).

Mhm, that's a lot -- I'd really appreciate being able to see specifically what you did for these experiments. Any chance this could be open-sourced, or even just have a branch publicly available as the basis for me to continue working on this? That's a lot of reinventing the wheel for me, and time is generally limited. It'd be better for me to wait on the open-source process if you think it'd come through, especially if you're fixing other subtle bugs.

But I guess that still leaves the main question: what action do you recommend from your testing? Prims at every possible angular bin from a node? Just +/-1 or +/- 2 (or both)? Expanding dozens each node seems wasteful of memory consumption of the graph, but if that meaningfully and tangibly improves paths, that's worth me evaluating. If +/-1 and 2 bins does that same result more or less, that seems the most reasonable to me since it only expands the primitive set from 3 to 7, which is still palatable (for example, the lattice planner typically has 5 and has essentially identical runtime to Hybrid-A*, although its primitives are much longer). Or perhaps modes to support either one?

mangoschorle commented 2 years ago

Regarding the maps: I sent you a link in a PM.

Prims at every possible angular bin from a node? Just +/-1 or +/- 2 (or both)?

I’ll make some experiments soon and give an update.

mangoschorle commented 2 years ago

This is the path. It took way too long to plan (214 ms); I am not happy. It's a good example though, as it is not axis-aligned.

I don't want to dive into the next topic, but somehow I think that the obstacle heuristic is a bit of an issue. It is aligned with the coordinate system. The map however may not be. I have the feeling that this may introduce an unfair bias towards direct North South / West East expansions. It's more likely to end up in a cell with lower heuristic cost if a primitive goes directly up or directly to the right. Especially since the heuristics map is downsampled by a factor of 2. Does that make sense?

I've also noticed that the early stages of the tree expansion seem crucial. If it sets off at a skewed angle, it usually doesn't recover in the rest of the expansion phase, but expands towards the goal at the cost of zig-zagging.

image

SteveMacenski commented 2 years ago

That's unusually long compared to what I typically find: < 100 ms (between 30-80 ms depending on the specific nature of the request).

Let's stay focused on the first thing, the primitives using all the bins for zig-zags. The heuristic is a whole other subsystem and we should button up one issue before moving on to another.

It is aligned with the coordinate system. The map however may not be. I have the feeling that this may introduce an unfair bias towards direct North South / West East expansions.

I'm not sure that's entirely true, but some element of it could be. The actual primitives are not in any way related to the costmap 2D grid structure; it's just an exploring tree of curves. It will of course try to incentivize the expansions with the lowest costs - if this were just a pure grid-search algorithm visiting its neighboring cells in a wide hallway (but still a hallway), you might expect the planner to produce a path with many cell-by-cell staircase oscillations to try to stay in the center of the space where the heuristic is minimized (when the cost penalty is set very high). Since we sample with curves in Hybrid/Lattice, you could expect something similar with little waves. But the turning paths have penalty functions on them that can be tuned (or removed) to help incentivize a bit more straight motion or, when turning, committing to those turns. Or reduce the cost penalty a little so that we care more about minimal-length paths than staying in the center of aisles. Reducing the weighting on cost in the travel cost functions and heuristics would make it care less about the absolutely optimal dead-center aisleway and remove a great deal of those artifacts. Keep in mind my default tuning is supposed to give a good baseline for all applications, not to be optimal for any particular one (and not to say I perfectly tuned the system for the 4-5 cases I tuned based on).

Also keep in mind that just because the heuristic is represented by a 2D grid, that doesn't mean that our primitives hit every cell (since their length is $>\sqrt{2}$), nor that the search neighborhoods are N/E/S/W. There's more analysis we could do here, but I don't believe that changing the heuristic to use, say, 8-connected neighborhoods vs 4-connected neighborhoods would actually make a difference. But it really wouldn't be super hard to test; I think you'd have to change only about 6 lines in this block https://github.com/ros-planning/navigation2/blob/main/nav2_smac_planner/src/node_hybrid.cpp#L473-L520 to make the neighborhood and travel cost use 8-connected neighborhoods in expansion rather than 4. If you find that helps and doesn't impact compute too much, it seems like an easy improvement.
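A rough sketch of what that change might amount to (hypothetical; the actual block in node_hybrid.cpp is structured differently, this just illustrates the neighborhood and travel costs):

  #include <vector>

  // Sketch: grid moves for the obstacle heuristic's expansion. The first
  // four entries mirror a 4-connected N/E/S/W neighborhood at unit travel
  // cost; the last four add diagonals at sqrt(2) travel cost.
  struct GridMove
  {
    int dx;
    int dy;
    float travel_cost;
  };

  static const std::vector<GridMove> kNeighborhood8 = {
    {1, 0, 1.0f}, {-1, 0, 1.0f}, {0, 1, 1.0f}, {0, -1, 1.0f},
    {1, 1, 1.41421f}, {1, -1, 1.41421f}, {-1, 1, 1.41421f}, {-1, -1, 1.41421f},
  };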

Especially since the heuristics map is downsampled by a factor of 2.

You are right, though, that the downsampling does create a level of approximation - that is a real effect. I found, though, that the simple smoothing algorithm the planner plugin uses removes those artifacts pretty well (the small waves, so to speak). The penalty functions also help here a bit, but if you increase those costs for non-straight/turning/etc. you can create very rigid zig-zaggy behavior (since if you make changing direction so difficult, it won't until it absolutely has to). This is why those penalty functions are set super low (like 1%).

I've also noticed that the early stages of the tree expansion seem crucial. If it sets off at a skewed angle, it usually doesn't recover in the rest of the expansion phase, but expands towards the goal at the cost of zig-zagging.

I think that's a bit of a misnomer; it's not that the early expansion matters. The expansion is deterministic, based only on the boundary conditions (e.g. location of the robot, environment, etc.), and it will always expand in the way the heuristics and the specific available primitive functions enable. I thought we covered that the zig-zagging was largely due to not using the bins properly, as we've been discussing above? If you're starting at an "odd" angle and the map is at a "normal" angle, then the restricted primitive set you have available can't go straight down spaces, causing the zig-zagging. That seems sensible. It wouldn't surprise me if the heuristic contributed a little bit in the localized optimal zones, but it wouldn't be all over the map like what you've shown. Those would be little-bitty ones.

SteveMacenski commented 2 years ago

Hey, any updates here? I'd love to be able to work on this sooner than later to have it resolved! I think I was looking for your suggestion about what we should do here to maximize the use of the primitive space (e.g. add +/- 1 bin prims, and/or +/- 2, and/or ...) given your extensive experimentation.

mangoschorle commented 2 years ago

Nothing new here. I should be able to invest more time over the next couple of weeks. I am still not 100% happy, especially with maps that are not well aligned. I am on it and will reach out ASAP.

stevedanomodolor commented 1 year ago

Hey, I have been working on the problem and trying the different solutions proposed here, with some input from @SteveMacenski. I used the following parameters for testing:

planner_server:
  ros__parameters:
    expected_planner_frequency: 20.0
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      downsample_costmap: false           # whether or not to downsample the map
      downsampling_factor: 1              # multiplier for the resolution of the costmap layer (e.g. 2 on a 5cm costmap would be 10cm)
      tolerance: 0.25                     # dist-to-goal heuristic cost (distance) for valid tolerance endpoints if exact goal cannot be found.
      allow_unknown: true                 # allow traveling in unknown space
      max_iterations: 1000000             # maximum total iterations to search for before failing (in case unreachable), set to -1 to disable
      max_on_approach_iterations: 1000    # Maximum number of iterations after within tolerances to continue to try to find exact solution
      max_planning_time: 5.0              # max time in s for planner to plan, smooth
      motion_model_for_search: "DUBIN"    # Hybrid-A* Dubin, Reeds-Shepp
      angle_quantization_bins: 64         # Number of angle bins for search
      analytic_expansion_ratio: 3.5       # The ratio to attempt analytic expansions during search for final approach.
      analytic_expansion_max_length: 3.0  # For Hybrid/Lattice nodes: The maximum length of the analytic expansion to be considered valid to prevent unsafe shortcutting
      minimum_turning_radius: 0.50        # minimum turning radius in m of path / vehicle
      # maximum_turning_radius: 1.0
      reverse_penalty: 2.0                # Penalty to apply if motion is reversing, must be => 1
      change_penalty: 0.0                 # Penalty to apply if motion is changing directions (L to R), must be >= 0
      non_straight_penalty: 1.2           # Penalty to apply if motion is non-straight, must be => 1
      cost_penalty: 2.0                   # Penalty to apply to higher cost areas when adding into the obstacle map dynamic programming distance expansion heuristic. This drives the robot more towards the center of passages. A value between 1.3 - 3.5 is reasonable.
      retrospective_penalty: 0.015
      lookup_table_size: 20.0             # Size of the Dubin/Reeds-Shepp distance window to cache, in meters.
      cache_obstacle_heuristic: false     # Cache the obstacle map dynamic programming distance expansion heuristic between subsequent replannings of the same goal location. Dramatically speeds up replanning performance (40x) if costmap is largely static.
      smooth_path: True                   # If true, does a simple and quick smoothing post-processing to the path

I was able to replicate the issue that during planning some of the bins are skipped. Also, out of curiosity, I would love to know how the turning radius 0.5 was converted to 10?

[planner_server-2] bin_size: 0.0981748
[planner_server-2] angle: 0.19635
[planner_server-2] min_turning_radius: 10
[planner_server-2] num_angle_quantization: 64
[planner_server-2] increments: 2
[planner_server]: Planner loop missed its desired rate of 20.0000 Hz. Current loop rate is 18.2578 Hz

Figure_1_case_study_1

The path generated during this planning:

path_1_case_study_1

In order to tackle the problem, I tried generating more projections. In this case the increment was +-2, so I generated projections for +-1. Based on the increment, I recomputed the angle. I also recomputed the turning radius using a fixed chord length of sqrt(2). Although the bin-skipping problem was reduced, the computation time was way higher. There might be some optimization techniques that can be done to reduce this.

[planner_server-2] bin_size: 0.0981748
[planner_server-2] angle: 0.19635
[planner_server-2] min_turning_radius: 10
[planner_server-2] num_angle_quantization: 64
[planner_server-2] increments: 2
[planner_server-2] delta_x: 1.41251
[planner_server-2] delta_y: 0.0693922
[planner_server-2] angle_increment: 0.0981748
[planner_server-2] turn_radius: 14.4108
[planner_server-2] i: 1
[planner_server-2] delta_x: 1.4074
[planner_server-2] delta_y: 0.138618
[planner_server-2] angle_increment: 0.19635
[planner_server-2] turn_radius: 7.21411
[planner_server-2] i: 2
[planner_server-2] [WARN] [1679774605.351331509] [planner_server]: Planner loop missed its desired rate of 20.0000 Hz. Current loop rate is 2.3250 Hz

Figure_2_case_study_2 path_2_case_study_2

For the last case, I tried using non-exact increments but the skipping behavior still persisted. It even took a bit longer to compute.

[planner_server-2] bin_size: 0.0981748
[planner_server-2] angle: 0.141539
[planner_server-2] min_turning_radius: 10
[planner_server-2] num_angle_quantization: 64
[planner_server-2] increments: 1.44171
[planner_server-2] [WARN] [1679775659.963246199] [planner_server]: Planner loop missed its desired rate of 20.0000 Hz. Current loop rate is 14.5725 Hz

Figure_3_case_study_2 path_3_case_study_3

Also, the link provided in the code to the theory is not working (http://planning.cs.uiuc.edu/node821.html).

SteveMacenski commented 1 year ago

Hi! Great analysis starting point here!

I would love to know how the turning radius 0.5 was converted to 10?

I assume it's because the turning radius is 0.5 m at a 5 cm resolution, which would be 10 grid cells. You could verify this by changing the resolution to, say, 10 cm, and then the radius should be 5.

Although the bin-skipping problem was reduced, the computation time was way higher

That is... odd. I'd have to see what you did. It's probably that you computed something incorrectly, so the heuristics or traversal costs are poorly estimated. Unless you're computing those primitives in a planning loop (which I doubt you did, if you computed them where we compute the other primitives).

I tried using non-exact increments but the skipping behavior still persisted.

That doesn't overly surprise me to some extent. I'm suspicious of the numbers on the plot you showed (why are they all ending in 0.12 if the angles aren't exact?). You should just be able to run that experiment by removing the ceil here. I'd like to understand what you did here in more detail to make sure it's reasonable, but I think the conclusion you came to is generally in line with my expectations.


So the next questions would be: how do we find the number and increments of the primitives we need to add to fully leverage the bins specified by the user? For the particular example you show, every 2nd bin was not usable. But you could easily set parameters such that it's every 3rd or 4th or 5th. We need to be able to find that out when we're generating the primitives and create a set of primitives to more fully explore the space available to us. Though, perhaps first things first is to find the source of that additional planning time so we can scale it up!


Also, an update from @mangoschorle would be fabulous!

stevedanomodolor commented 1 year ago

That doesn't overly surprise me to some extent. I'm suspicious of the numbers on the plot you showed (why are they all ending in 0.12 if the angles aren't exact?). You should just be able to run that experiment by removing the ceil here. I'd like to understand what you did here in more detail to make sure it's reasonable, but I think the conclusion you came to is generally in line with my expectations.

Yes, I did it like this: for the non-exact bins I just removed the ceil, while to add more bins I did the following:

  // Assumed context: inside the Dubins primitive initialization, after
  // increments and bin_size have been computed.
  float angle_increment, turn_radius, delta_x = 0.0f, delta_y = 0.0f;

  for (int i = 1; i <= int(increments); i++) {
    // Heading change of i bins, and the (larger) turning radius that
    // realizes it over a fixed chord length of sqrt(2)
    angle_increment = i * bin_size;
    turn_radius = sqrt(2.0f) / (2.0f * sin(angle_increment / 2.0f));
    delta_x = turn_radius * sin(angle_increment);
    delta_y = turn_radius - (turn_radius * cos(angle_increment));
    projections.emplace_back(delta_x, delta_y, i);    // Left
    projections.emplace_back(delta_x, -delta_y, -i);  // Right
    std::cout << "delta_x: " << delta_x << std::endl;
    std::cout << "delta_y: " << delta_y << std::endl;
    std::cout << "angle_increment: " << angle_increment << std::endl;
    std::cout << "turn_radius: " << turn_radius << std::endl;
    std::cout << "i: " << i << std::endl;
  }
  // Forward motion with the same length as the turning chords (= sqrt(2))
  projections.emplace_back(hypotf(delta_x, delta_y), 0.0, 0.0);

That is... odd. I'd have to see what you did. It's probably that you computed something incorrectly, so the heuristics or traversal costs are poorly estimated. Unless you're computing those primitives in a planning loop (which I doubt you did, if you computed them where we compute the other primitives).

Actually, I am not computing it in a loop but during the initialization step.

So the next questions would be: how do we find the number and increments of the primitives we need to add to fully leverage the bins specified by the user? For the particular example you show, every 2nd bin was not usable. But you could easily set parameters such that it's every 3rd or 4th or 5th. We need to be able to find that out when we're generating the primitives and create a set of primitives to more fully explore the space available to us. Though, perhaps first things first is to find the source of that additional planning time so we can scale it up!

Could the problem be that I did not adjust other parameters affecting the cost?

SteveMacenski commented 1 year ago

OK on the non-exact bins. But then why do they all end with X.12? If it's not exactly 1.0, they should be more distributed (e.g. a 1.4 increment would give you 0, 1.4, 2.8, 4.2, ...)? That seems to point to an error. The topic of non-exact bin sizes is independent from adding multiple primitives; make sure you're not mixing these together.

For the primitives, you need to make sure each of the primitives is the same length so the traversal cost between them is the same (or adapt the traversal cost function to account for the fact that they aren't all the same length). See the thread above in this ticket with @mangoschorle and the State Lattice planner. What is increments? Do these primitives look reasonable? While my head isn't in this problem in particular at the moment, this seems overly simplistic and not accounting for practical constraints.

For instance, https://github.com/ros-planning/navigation2/blob/main/nav2_smac_planner/src/node_hybrid.cpp#L91 has the minimum turning radius; you can't willy-nilly change that. That's a real user constraint: it has to be higher than that, never lower. You use angle_increments, which at first glance seems nonsensical, making me think these primitives are wacky.


The ideal requirements for the new primitives:

  1. They must be at or above the minimum turning radius of the vehicle.
  2. They should be the same length, and if not, the traversal cost function needs to be updated to correct for that.
  3. They must end in different angular bin increments than the full-turning-radius, minimum-distance maneuvers (the default ones).

If you set the lengths as fixed based on the default primitives, then that should allow you to back out the turning radius, strictly greater than the minimum radius, which ends in the desired angular bin increment. Some engineering drawings may help you visualize this. If you have a fixed length, a target angular bin change, and a minimum radius, there should be a closed-form geometric solution for a primitive with a less extreme turn.

If you know the default primitives have angular bin jumps of 3 (e.g. 3, 6, 9, 12, ...), then you know that the default set will do -3, 0, 3. This means you need to fill in the 1, 2, -1, -2 angular bin primitives. Once you know how to generate the closed-form geometric equations, you can plug in these values to generate the appropriate primitives to fill in the search space.

stevedanomodolor commented 1 year ago

OK on the non-exact bins. But then why do they all end with X.12? If it's not exactly 1.0, they should be more distributed (e.g. a 1.4 increment would give you 0, 1.4, 2.8, 4.2, ...)? That seems to point to an error. The topic of non-exact bin sizes is independent from adding multiple primitives; make sure you're not mixing these together.

You are right, it should give increments of 0, 1.4 and so on. The problem was the way I plotted the histogram, specifically the way I specified the bins.

Figure_1

With respect to the second point, I will try it and let you know.

stevedanomodolor commented 1 year ago

Let me know if my reasoning is wrong, but I think the result is quite promising. In order to generate the primitives, following your advice, I computed the chord length based on the default primitives (max left, max right and forward), which were computed from the min turning radius and angular bins (-2, 0, 2). With that, I modified the travel cost based on this new chord length, which is >= sqrt(2) depending on the increments value. After this, I computed the other primitives based on the remaining angular bins, in this case -1, 1.

response_primitive
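For reference, in my notation (not from the code): the chord of the default primitives for a jump of $N$ bins at the minimum radius, and the larger radii of the fill-in primitives that reuse it, are

$c = 2 R_{\min} \sin\left(\frac{N\,\Delta\theta}{2}\right) \;\geq\; \sqrt{2}, \qquad R_i = \frac{c}{2\sin(i\,\Delta\theta/2)} \;>\; R_{\min} \quad (i < N)$

which is presumably why the travel cost needed the matching update to the new chord length.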

More info:

[planner_server-2] bin_size: 0.0872665
[planner_server-2] angle: 0.174533
[planner_server-2] min_turning_radius: 10
[planner_server-2] num_angle_quantization: 72
[planner_server-2] increments: 2
[planner_server-2] Increments[ 1 ] Radius: 19.981
[planner_server-2] projections[0]: 1.74311, 0 ]
[planner_server-2] projections[1]: 1.73648, 0.151922 ]
[planner_server-2] projections[2]: 1.73648, -0.151922 ]
[planner_server-2] projections[3]: 1.74146, 0.0760326 ]
[planner_server-2] projections[4]: 1.74146, -0.0760326 ]

The generated path and the histogram:

response

response_graph

Generally speaking, it does cover the remaining angular bins, and jumps only occur at the extremes of the histogram. For comparison, here is the case where we just use the default motion primitives:

[planner_server-2] bin_size: 0.0872665
[planner_server-2] angle: 0.174533
[planner_server-2] min_turning_radius: 10
[planner_server-2] num_angle_quantization: 72
[planner_server-2] increments: 2
[planner_server-2] projections[0]: 1.74311, 0 ]
[planner_server-2] projections[1]: 1.73648, 0.151922 ]
[planner_server-2] projections[2]: 1.73648, -0.151922 ]

response_default

response_primitive_default

response_graph_default

It would be nice if I could have a map where I could reproduce the zig-zag behavior exactly as @mangoschorle did, to really test whether this works.

SteveMacenski commented 1 year ago

Promising indeed!

With that, I modified the travel cost based on this new chord length,

If you used the chord length from the default primitives, why did you need to change the travel cost if they're all the same length? Or are they not the same length...?

After this, I computed the other primitives based on the remaining angular bins

You show the case with +/-1; does the method you use work for arbitrary ones (e.g. if we do every 3rd, will it generate +/- 1 and 2?)? Thinking about it, I think it could be +/- 2, 3, 5, 7, ... Though, realistic settings are probably mostly in the 2/3/5 range.

Are the run time issues fixed now? @mangoschorle even mentioned that his changes decreased planning times in some situations due to having more granular options to follow the guiding heuristics.

That path looks much improved!

jumps only occur at the extremes of the histogram

Those aren't jumps anymore; those are just where it was never advantageous for the system to explore solutions at those headings (e.g. the "going right" or "going up" directions when the route is down / left). If you see a reasonable band of dense results filling in the formerly skipped bins, then we know it can explore them all (since the primitives are the same for all orientations).

It would be nice if I could have a map where I could reproduce the zig-zag behavior exactly

You can on that map, it's just finicky to get the right initial / goal headings to cause it. Try requesting a starting heading that is off-axis by a little bit.

This is really great progress!

stevedanomodolor commented 1 year ago

Sorry for the late response, I was quite busy these weeks with work. I will try to push this forward asap.

You show the case with +/-1; does the method you use work for arbitrary ones (e.g. if we do every 3rd, will it generate +/- 1 and 2?)? Thinking about it, I think it could be +/- 2, 3, 5, 7, ... Though, realistic settings are probably mostly in the 2/3/5 range.

Yes, the idea of the implementation is that it includes all increments lower than the one computed. For example, for +-4, we compute 0 and +-1, 2, 3. You can see the implementation here: https://github.com/stevedanomodolor/navigation2/tree/zig_zag_issue

Are the run time issues fixed now? @mangoschorle even mentioned that his changes decreased planning times in some situations due to having more granular options to follow the guiding heuristics.

Based on my implementation, the processing time is always higher because we have a larger search space. Maybe there is something I am not considering.

You can on that map, it's just finicky to get the right initial / goal headings to cause it. Try requesting a starting heading that is off-axis by a little bit.

Some bad news: my method only works when the starting angle is aligned with the next motion of the robot, like in the picture I sent you earlier. But when the starting angle varies, the behavior still remains. For example, using the starting angle below:

Screenshot from 2023-05-02 20-09-26

I will try changing the travel cost based on the arc length of the motion primitive, as @mangoschorle did, to see the effect.

SteveMacenski commented 1 year ago

Great! Thanks for the update!

SteveMacenski commented 1 year ago

Just a note, I'm starting to work on this now and have evaluated our options in detail. I want to detail it here, even though I know some of this is redundant to previous elements of the conversation.


We have a problem where, when the angular quantization bin jumps are multiple increments (e.g. 0, 2, 4, 6 or 0, 3, 6, 9), we skip some parts of the user-requested resolution. This is caused by the relationship between the costmap resolution and the minimum turning radius needed to ensure we leave a cell.

Our two options are to either (A) make primitives that have their angular bin changes be exactly 1 bin or (B) create more primitives that cover the range of skipped bins.

(A) If we do this, then the angular bin constraint might be met, but we will have spatial overlap requiring bin deconfliction. In this situation, we can have multiple short primitives that land in the same bin that we need to deal with. For turning motions, they'll be +/- 1 angular bin from the origin, but the straight motions will exactly overlap. We can deconflict them, but then we'll proportionately increase the number of iterations and thus collision checks depending on the primitive length. If we skipped every second orientation bin, we'd need to reduce the spatial distance of each primitive by ~2. So we'd have more collision checks, proportionate to 3x the skipping ratio (for the 3 new primitives with double the expansion).

(B) We add additional primitives that fully cover the skipped space, which means adding 2(N-1) new primitives, where N is the angular quantization jump. We can directly compute these primitives using the same equations used for the minimum turning radius curve, but this time with a known angle and unknown turning radius (with constant chord). We can then use that turning radius to find the x, y, yaw for the primitive pretty straightforwardly. This however adds 2(N-1) new collision checks. I have the math done here as a proof in my notebook, but suffice it to say it aligns with the other code blobs posted here (just validated), so no need to repeat that again.

Either way, I think that broadly makes the added compute costs of collision checking equal: (A) is doing N * 3 (number of primitives) and (B) is doing 2(N-1). For N = 2, 3, ..., 7, (B) is better, and honestly preferable logically so we don't deal with annoying deconfliction logic, which also has the negative of making the path's poses no longer evenly spaced without other interpolation logic. This is also what other implementations (like MATLAB) do, for this reason as well.


An interesting experiment came to mind: try adding only the +/- 1 primitive searches. For the case of even jumps every 7 bins, it would then take at most 5 expansions to be able to reach any given part of the skipped portion. While it wouldn't be possible everywhere, you'd be able to follow the heuristic functions within a relatively short period, and it opens up the full exploration space, somewhat. I'm not sure it's a great or best solution, but I'm interested in exploring it briefly.


@mangoschorle you mention some rounding issues in https://github.com/ros-planning/navigation2/issues/3172#issuecomment-1249118410.

Modified getProjections. The way you currently use your precomputation of delta_xs and delta_ys causes rounding issues. I realized that once I wrote an interface which visualizes the graph. These rounding issues cause the primitives to frequently not expand as expected; instead you get these asymmetrical primitive expansions. I do think that the round-off errors in getProjections are the only reason you sometimes fall into odd bins.

Can you expand on that a bit? The angular bins are always exact valued, so there's no partial fraction of a bin to cause us to go into another subset of bins. The expansions use the bin numbers (e.g. 1, 2, 3, ...), not the angle, so there's really no chance (?) of that kind of error. The raw angle is really only used for collision checking - though I only did a quick scan through, so my memory could be missing something that is worth you pointing out if there's an issue.

We would have to also modify the travel cost computation. Maybe we should move this into the nodes rather than have it in astar, where we have to rely on indices and the order in which we emplaced the primitives.

Our motion movement should still be sqrt(2) on the circle if we keep the sqrt(2) chord length constant. That's the distance of transit, so I don't think the travel cost function needs to be updated at all? Keep in mind these are distances on the curve (e.g. chord), not necessarily the actual spatial hypot distance. Along the curve is the "actual" travel length.
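For reference (my summary of the geometry, not a claim about the code): with a fixed chord $c$ and heading change $\theta$,

$c = 2R\sin(\theta/2), \qquad s = R\,\theta = c\,\frac{\theta/2}{\sin(\theta/2)} \;\geq\; c$

so primitives sharing the same chord have arc lengths $s$ that grow slightly with $\theta$; whether that small difference matters for the travel cost is exactly the open question here.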


In progress branch: https://github.com/ros-planning/navigation2/tree/3172

mangoschorle commented 1 year ago

This was a long time ago, so it is hard to wrap my head around this. I remember that I tried a lot of approaches, including the two you mentioned, to come up with a generic solution for the aforementioned issues.

The result was the following: I was able to get very good paths for scenarios, where the corridors and hallways are nicely aligned with the cell grid. The core problem that remained and which I finally gave up on was that there seemed to be a strong "bias" for the planner to prefer motions into the W N S E cells. For maps that are arbitrarily oriented and don't align nicely this will always result in sawtooth/stepping paths along a diagonal hallway.

Ideally, we would want the planner to be invariant to rotations of the map. One could nicely create a unit test for that and follow a test-driven implementation. The paths should coincide up to a rotation and translation around the start of the path.

Regarding the projections: if I remember correctly, the issue of unsymmetrical expansions had to do with the way you compute the deltas, which is based on a quantized angle.

You add new_heading to the projection list, but you compute the delta_x and delta_y based on quantized headings, which should rather be done based on the actual heading in new_heading. The center primitive lies along the continuous heading new_heading; the left and right primitives are non-symmetrical if new_heading isn't perfectly aligned with the angular bins, which it never is. The expansion currently builds the new pose from (delta_xs[i][node_heading] + node->pose.x, delta_ys[i][node_heading] + node->pose.y, new_heading).

I suggest something like this:

  float new_x = node->xcell + projections[0]._x * cos_angs.at(static_cast<unsigned int>(new_heading));
  float new_y = node->ycell + projections[0]._x * sin_angs.at(static_cast<unsigned int>(new_heading));

If you don't believe me plot your tree. You will see that the forward primitive isn't always centered between the left and right primitive expansion.

Another thing that I realized: the planner time is heavily dominated by the number of expansions it takes and not by the number of primitives. I wouldn't worry about the number of primitives so much. In my experiments, as soon as stepping/zig-zagging was avoided, the number of expansions required to find the goal dropped by orders of magnitude. If it requires 11 primitives to avoid zig-zagging, you'll still be faster than using 3 primitives.

SteveMacenski commented 1 year ago

I'm trying to parse this to make sure I understand your point, so please bear with me:

You add new_heading to the projection list, but you compute the delta_x and delta_y based on quantized headings, which should rather be done based on the actual heading in new_heading. The center primitive lies along the continuous heading new_heading; the left and right primitives are non-symmetrical if new_heading isn't perfectly aligned with the angular bins, which it never is.

I'm trying to figure out what you mean by this. Do you mean in the primitive construction in initDubin or in its live use in getProjections? I think you mean getProjections so I'm going to talk w.r.t. that.

In the Hybrid-A* algorithm as written, each primitive's ending is exactly aligned with an angular bin; that's the role of the following code when we compute the original delta_x/y. The spatial X/Y can land anywhere, but the angles always end up being in terms of the bins. In search, we store and use the angular quantization bins (e.g. 1, 2, 3, 4, ...), not the angles, when we expand (which is why you see projections.emplace_back(delta_x, delta_y, increments);, where increments is in bin numbers).

  if (angle < bin_size) {
    increments = 1.0f;
  } else {
    // Search dimensions are clean multiples of quantization - this prevents
    // paths with loops in them
    increments = ceil(angle / bin_size);
  }

When we get to getProjections, the line I think you're referencing, float new_heading = node_heading + motion_model._theta;, should produce only whole-valued bin numbers, represented as floats. If you're saying that that isn't true and that we have some outputs of new_heading that are non-whole numbers, I'd 100% agree we have an issue.

If that's what you're saying, please let me know!

I tried printing the new_heading (std::cout << new_heading << std::endl;) in getProjections and I don't see any non-whole valued numbers, however. Which makes me think perhaps my interpretation of this comment was incorrect.

I suggest something like this: float new_x = node->xcell + projections[0]._x * cos_angs.at(static_cast<unsigned int>(new_heading));

That would quash the new_heading discretization error down to a whole value, which would indeed handle the issue. Would you agree, though, that the core issue to resolve is that new_heading (as you see it) isn't remaining the whole-valued number it should be, rather than introducing measures to squash its effects? Though, pointing out where you think there are non-exact bin searches would be good, since where I looked above didn't yield it.

I don't argue that you may have found an issue; I'm just not seeing what I think you mean, so please feel free to berate me if I'm barking up the wrong tree :smile:

> You will see that the forward primitive isn't always centered between the left and right primitive expansion.

I don't think it should be. Recall that we're traversing the chord of a curve, not straight-line paths. So the center primitive going straight should be "further" from the origin point because its distance is a perfect line, while the turning primitives traverse the curve of the minimum-turning-radius circle, so their total spatial displacement is less. The three primitives should not exactly align; the straight one should be slightly out ahead.

Unless you mean like aligned symmetrically between right and left, that would indeed point to a problem.
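
To make the chord-vs-straight point concrete, here's a minimal numeric sketch with assumed values (not the planner's actual numbers):

    // Minimal numeric sketch (assumed r and s, not taken from the planner):
    // for the same traveled distance s, a straight primitive displaces the
    // robot by s, while a turning primitive at radius r only displaces it by
    // the chord 2 * r * sin(s / (2 * r)) of the turning circle.
    #include <cmath>
    #include <cstdio>

    int main()
    {
      const double r = 0.5;   // assumed minimum turning radius [m]
      const double s = 0.14;  // assumed primitive travel distance [m]
      const double chord = 2.0 * r * std::sin(s / (2.0 * r));
      std::printf("straight displacement: %.4f m, turning chord: %.4f m\n", s, chord);
      return 0;  // chord < s: the turning endpoints sit slightly behind
    }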


Responses to other items

> The planner time is heavily dominated by the number of expansions it takes, not by the number of primitives. I wouldn't worry about the number of primitives so much. In my experiments, as soon as stepping/zig-zagging was avoided, the number of expansions required to find the goal dropped by orders of magnitude.

That generally follows for me too. Are you using a circular robot model? I'm wondering because more primitives do involve more collision checking, which is the most expensive part of planning, since we collision check on creating potential neighbors, not on visitation. Though the cost-aware heuristic drives away from obstacles, so that ratio may still be small enough not to be super important. Just wondering.

> The core problem that remained and which I finally gave up on was that there seemed to be a strong "bias" for the planner to prefer motions into the W N S E cells. For maps that are arbitrarily oriented and don't align nicely this will always result in sawtooth/stepping paths along a diagonal hallway.

It should also have diagonal biases given how the heuristics are computed. On this front, I don't think there's much to be done other than (A) adjust the obstacle heuristic neighborhood to be 16-connected rather than 8, or (B) do what many users do and use a path smoother after this algorithm -- which is what the common formulation of Hybrid-A* does.
mangoschorle commented 1 year ago

Are you testing with a random start orientation? Or with something like 0? Also try an odd number of angle quantization bins. Are you still only seeing whole numbers?

SteveMacenski commented 1 year ago

> Are you testing with a random start orientation?

Yeah, I just tried with a few different random headings; that print is still giving me only whole-valued numbers, even as floats. For example:

[component_container_isolated-5] 31
[component_container_isolated-5] 25
[component_container_isolated-5] 1
[component_container_isolated-5] 4
[component_container_isolated-5] 70
[component_container_isolated-5] 28
[component_container_isolated-5] 31
[component_container_isolated-5] 25
[component_container_isolated-5] 1
[component_container_isolated-5] 4
[component_container_isolated-5] 70
[component_container_isolated-5] 13
[component_container_isolated-5] 16
[component_container_isolated-5] 10
[component_container_isolated-5] 31
[component_container_isolated-5] 34
[component_container_isolated-5] 28
[component_container_isolated-5] 31
[component_container_isolated-5] 34
...

from

    const float & node_heading = node->pose.theta;
    float new_heading = node_heading + motion_model._theta;

    // Wrap the heading back into the valid bin range [0, num_angle_quantization):
    if (new_heading < 0.0) {
      new_heading += num_angle_quantization_float;
    }

    if (new_heading >= num_angle_quantization_float) {
      new_heading -= num_angle_quantization_float;
    }

    std::cout << new_heading << std::endl;  // temporary debug print

I also tried using 71 bins instead of 72 with a few arbitrary headings and still got the same output. None of the headings I selected were aligned with the map.

mangoschorle commented 1 year ago

I might also be wrong regarding the rounding error. Maybe I was using a messed-up version of the planner. If you only see perfectly symmetrical expansions, then I wouldn't spend any more time on it. Could be that I am just wrong.

SteveMacenski commented 1 year ago

I'm going to do some visualizations as part of the additional-primitives work, so I'll make sure to give some eagle-eyed attention to current behavior as well to make sure everything lines up visually. It might be that as part of your adjustments something got messed up :shrug: But I'll make sure there are no skeletons hanging around that I missed as well.

SteveMacenski commented 1 year ago

https://github.com/ros-planning/navigation2/pull/3752 is now open. It primarily needs testing (which @stevedanomodolor and I will each do) and then hopefully it's good to go.

SteveMacenski commented 1 year ago

@mangoschorle I very much appreciate your time and energy on debugging and prototyping for this ticket. Sorry it took me so long to eventually get to the implementation but your hard work here is admirable and I very much appreciate the dialog!

mangoschorle commented 1 year ago

getTraversalCost could compute garbage if you add new projections, because you now have new indices.

if (getMotionPrimitiveIndex() == child->getMotionPrimitiveIndex()) { is no longer the only case in which you keep turning!

My bet is modulo 2.
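
A heavily hedged reading of that last remark, assuming a hypothetical primitive ordering of {0: straight, 1: left, 2: right, 3: gentle left, 4: gentle right} -- not necessarily the PR's actual ordering: with interpolated primitives, two different indices can both continue the same turning direction, which a plain index-equality check misses, and under that assumed ordering the direction families fall out of the index parity.

    // Hypothetical sketch only: detecting "still turning the same way" when
    // indices {1, 3} are assumed left turns and {2, 4} assumed right turns.
    bool sameTurningDirection(unsigned int a, unsigned int b)
    {
      if (a == 0 || b == 0) {
        return a == b;  // straight only continues straight
      }
      return (a % 2) == (b % 2);  // assumed: odd = left family, even = right family
    }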


mangoschorle commented 1 year ago

> I'm going to do some visualizations as part of the additional-primitives work, so I'll make sure to give some eagle-eyed attention to current behavior as well to make sure everything lines up visually. It might be that as part of your adjustments something got messed up :shrug: But I'll make sure there are no skeletons hanging around that I missed as well.

I messed up. I just checked again.

On a side note: if we are sure that it is always a whole number, float is probably not the right data type here. I am talking about code clarity, not performance.

SteveMacenski commented 1 year ago

> getTraversalCost could compute garbage if you add new projections, because you now have new indices.

Thanks! I made a note of that on my TODO list!

> I messed up. I just checked again.

:+1: Sounds good!

> float is probably not the right data type here

I don't disagree; it wasn't always promised that these would be exact angular quantizations. It used to allow for partial bins (this is way back during the original development) but that had various problems, so I made them exact bins and adjusted how the primitives were computed to guarantee that. I'll take a look; a float might still make sense based on the conversions and math it's used in, but it also might not.

SteveMacenski commented 1 year ago

@mangoschorle w.r.t. your comment;

> The planner time is heavily dominated by the number of expansions it takes, not by the number of primitives. I wouldn't worry about the number of primitives so much. In my experiments, as soon as stepping/zig-zagging was avoided, the number of expansions required to find the goal dropped by orders of magnitude.

I agree, though this is not the behavior I'm seeing right now with my PR branch #3752. I'm finding that more primitives take more time to compute paths than without. I've been looking into it without finding a clear solution, but if you have a chance to take a look, I'd appreciate the external review to see if there's something obvious I'm doing wrong. I've updated the traversal cost function to consider the additional primitives. Your diagrams in this comment also convince me that this should probably be faster, not slower, so I'm thinking something is wrong - but I'm not 100% sure what.

When you noted that reduction in expansions, was that with a different setting for the number of angular bins (even though they were previously not fully used)? Considering that 72 bins with a 0.4m turning radius previously used only every 3rd bin, that is effectively 24 bins, all of which the new method can actually reach. Those times, I suppose, approximately line up: old-72 vs new-24. But I see substantial differences when using the higher resolution of bins needed to create smoother paths -- 64/72 as previously set. With slight tweaking of non_straight_penalty and 32 bins, though, I can see similar levels of run-time performance (which might be similar to what you ended up doing?).

I'm thinking it might have to do with retuning the defaults for the different penalty functions (non-straight in particular), but if other thoughts crossed your mind that aren't just tuning but actual potential issues, I'd really love to know! Or if you tuned other settings to get that performance from what we currently have as defaults, that would be illuminating (or if you landed on a reduced setting like 32 instead of 64/72 because the higher settings did have a higher compute cost like I'm seeing).

I'm hoping to have this merged in before the end of the month (ideally end of this week if I can figure out this one last thing :-) )! Hopefully you can use the mainline version again which automatically computes this generically!

SteveMacenski commented 1 year ago

I have this data in a not-so-easy to share format but the trends I'm seeing:

Most of the parameters won't make a real difference in the planning iterations except the penalties. Since we have more primitives, it might be worth reconsidering their defaults, especially the non-straight and change penalties (reverse and retrospective shouldn't really need to change, I don't think).

Beyond (A) retuning and (B) potentially reworking the traversal function computation, I think the only other things under our control to reduce the number of iterations are that the primitives themselves (C) aren't necessarily the best of ideas (which I think is probably not true), or (D) are not implemented properly, so they slow things down instead of speeding them up, or (E) something else in the planner was not yet accounted for that needed to change, like heuristic admissibility.

Example: left is with the setting off and right is with the setting on, with 72 bins and all settings equal - we have clearly more expansions in the general path area while finding the solution. These are only the expanded nodes (not just visited), so I'm surprised to see so many being expanded from each batch - possibly because they have virtually the same cost as really close cousins?

Screenshot from 2023-08-15 13-24-51

Edit:

Decreasing the non-straight penalty until it isn't applied does reduce the number of iterations, but there are still many, many more than before, and it reduces path quality.

This may, to a major degree, just have to do with tuning for a particular application (e.g. cost awareness, speed, costmap inflation settings, etc.) with this new setting activated. Though in an apples-to-apples comparison of the same settings, I definitely see this being a multiple of, or an order of magnitude (depending on length) more than, the previous settings. I can get it down by messing with params, but the trend still holds that the original method is faster even when I can get the "on" mode to be acceptable.

It's still slower / more iterations, which is different from @mangoschorle's trend analysis - but I think that somewhat has to do with the nature of the map / tests. I'm doing it on an open map of random obstacles, which doesn't have as much of a "clearly follow this line" character as hallway-like situations. Even without changing the penalty / parameter values in this kind of environment, I see the runtimes be equal in terms of iterations, or even slightly less -- and indeed without zig-zagging! But it's not consistently a benefit across all real maps I'm testing against. Others still show an order of magnitude more iterations, with most (if not all) primitives from a visited node's batch being expanded (presumably due to similar costs?).

Whether this actually helps might just come down to the type of environment you're in. But - I'd still really like a review on this work to make sure I'm not rationalizing an actual error that is resolvable! If you're open to testing this branch too, @mangoschorle, I'd very much appreciate feedback on whether there are notable performance differences!

SteveMacenski commented 1 year ago

I'm playing around with the computation of the travel cost for the curves. Right now, we have a static cost of travel_distance_cost for all primitives, since they're all spatially moving by that much. However, that makes turning and straight moves have the same cost, relying on the fact that turns just result in longer paths to prune those branches. Since the robot is supposed to be following the turning radius' arc, not the circle's chord, we can increase this cost to additionally penalize turning in a way that is representative of the actual distance traveled. I'm finding that this minor update helps a lot with minor wobbling in the solution space - especially when paired with the upsampled primitives we're discussing in this ticket.

This seems to increase path quality a little, though the behavior is a tad different. It also seems to stay within the same order of magnitude of iterations for most cases, and occasionally reduces it by one. I'm not yet integrating all the penalty functions, but I will tomorrow.
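
A minimal sketch of the arc-length idea above -- not the PR's actual code; the function name and signature are illustrative: keep the chord length as the straight primitive's cost, and charge turning primitives the arc length r * theta they actually drive, which always exceeds the chord between the endpoints.

    // Illustrative sketch (names assumed, not the nav2_smac_planner API):
    // straight moves cost their chord; turning moves cost their arc length.
    float primitiveTravelCost(bool is_turning, float turning_radius,
                              float swept_angle, float chord_length)
    {
      if (!is_turning) {
        return chord_length;  // straight: chord equals distance driven
      }
      // Arc length r * theta exceeds the chord 2 * r * sin(theta / 2), so
      // turning is penalized by exactly the extra distance traveled.
      return turning_radius * swept_angle;
    }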

mangoschorle commented 1 year ago

Hmmm but isn’t this equivalent to increasing the non-straight penalty?

SteveMacenski commented 1 year ago

> Hmmm but isn't this equivalent to increasing the non-straight penalty?

In a way, but it's a real physical quantity, not just a tuned parameter. I've been getting those results in testing with all the penalty functions turned off. In some cases it resulted in many fewer iterations, since a couple of primitives turning back and forth are now longer than just a straight line, so the quality is also higher and less wobbly. Now the wobbling is mostly due to the heuristics, in that the primitive set isn't granular enough to march down the heuristic exactly (e.g. it's moving at an angle the current primitive set cannot model exactly).

SteveMacenski commented 1 year ago

Another update:

I updated the heuristic computation to also use 16-connected neighborhoods instead of 8, and I notice that the heuristic is now well under 1% error w.r.t. the actual path costs -- before it was ~1% or more on longer paths. This is now a much better estimator. It slows things down a bit more, for reasons I think I understand (the potential field having many similar costs), but I don't know that for certain enough to propose it here as fact. When I add a 1% margin to the obstacle heuristic to make sure it stays admissible, I get a speed-up again, but that's not something I want to rely on permanently (it is analogous to the retrospective penalty, set to 1.5% by default in the traversal cost function -- which I've temporarily disabled during my tests of the underlying systems, so I'll probably just reintroduce it for those who want it).
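
For reference, a 16-connected neighborhood in grid search is typically the 8-connected moves plus the eight knight-like moves; this is a generic illustration, not the PR's code:

    // Generic 16-connected neighborhood offsets (illustrative, not PR code):
    // 4-connected + diagonals + knight moves, which lets the 2D heuristic's
    // distance field propagate along 16 directions instead of 8.
    const int kNeighbors16[16][2] = {
      {1, 0}, {-1, 0}, {0, 1}, {0, -1},      // 4-connected
      {1, 1}, {1, -1}, {-1, 1}, {-1, -1},    // diagonals (completes 8-connected)
      {2, 1}, {1, 2}, {-1, 2}, {-2, 1},      // knight moves (completes 16-connected)
      {-2, -1}, {-1, -2}, {1, -2}, {2, -1},
    };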

Now I need to work on steering the search more clearly toward the heuristic gradient and pruning branches more aggressively, to cut down on the iterations without adding margins. I think that's the last thing before this PR is mergeable. When there are lots of search iterations, you can see them pile up; I should be able to find a way to differentiate costs better so that we naturally prune more of them -- whether via the heuristic creating a deeper canyon or via the traversal cost functions having the ability to create a larger spread of costs.

I've also un-downsampled the heuristic costmap, so I'll plan to make that a parameter. With the non-downsampled costmap and the 16-connected neighborhood, you should see less reliance on the map's main axes, @mangoschorle, in exchange for longer compute times, since the cost-aware heuristic will take more time to precompute on each call if you don't cache it.

SteveMacenski commented 1 year ago

Woah - I think I just found the solution. Setting the cost to a quadratic function really improves things... a lot. It's fast, really high quality, and involves no hacks or penalty functions.

I will need to investigate this more next week, but I see a solid path forward here and it really improves the performance and consistency of the planner greatly, which I think @mangoschorle would appreciate :-)

My branch is currently a huge mess of R&D but I'll push something hopefully by EOD Tuesday for you to take a look at, I think you'll be pleasantly surprised.

mangoschorle commented 1 year ago

I am on holiday currently. I'll share my thoughts in more detail once I am back. I am looking forward to testing your findings on our benchmark set.

stevedanomodolor commented 1 year ago

I will also try it during the weekend; it seems promising.

SteveMacenski commented 1 year ago

Quick update: I've decided to add an option for the non-downsampled obstacle heuristic + quadratic-form cost handling. I'd just make them the defaults, but they manifestly change the performance and tuning of the planner, so simply replacing the current behavior would throw people's systems out of whack.

I've decided to ditch the 16-connected neighborhood within the obstacle heuristic. It does better align with the actual traversal costs, but it comes at a big computational cost. I don't think the marginal improvements in modeling justify the additional compute time, and from testing over a large number of plans, I don't see substantive differences worth it either (after the light post-smoothing in the planner).

So, adding options for interpolated primitives, quadratic cost functions, and the non-downsampled costmap is what I'm thinking currently. I need to do the glue to make these options configurable and precompute the values we can. After that, document + characterize the changes.

Using the interpolated primitives will end up being more computationally expensive, which is not what @mangoschorle found in his work -- so I'd really like his views on what is currently in the PR when he's back from vacation, in case there's anything he can enlighten us about that @stevedanomodolor and I have both missed. I see similar-ish performance when I use 72 bins without interpolation (so virtually 24, due to skipping every 3rd) and when I use 24 bins outright, but that's to be expected because they're exactly the same. 72 bins interpolated to reach all 72 bins ends up taking about ~2.5x as long.

However, these new options do give you the ability to trade compute time for better path quality or slightly different behavior, which I think is really nice and more "smooth" than what the Smac Planner Hybrid-A* produces today.

The costmap downsampling / quadratic options were things I explored during this work (while trying to debug why the changes this PR was aiming at were more expensive, and experimenting with things we could change), but they aren't necessarily 100% related to the primitive interpolation of bins. They can be used independently.

mangoschorle commented 1 year ago

Interpolate Primitives

I think that what you did in the PR is fine and a really good contribution in terms of quality. Having 72 bins but always skipping three at a time is just not very sensible; it's better to just use 24 bins.

Now here is the thing regarding the interpolated primitives:

  1. Let us assume we have 72 bins but always skip three at a time.
  2. Let us assume we have a situation where the hallways are exactly along the principal axes. This means that to drive right you need to be in bin 0, to drive up you need to be in bin 18, and so on.
  3. Let us assume our initial angle is in bin 1 and our goal is shifted only in x, to the east of our starting position.

The 2D heuristic will pull us straight to the goal (eastwards). To drive eastwards directly, we would have to end up in bin 0. Since we started in bin 1 and only hop three bins at a time, we can never end up in bin 0. Plowing straight to the right, rapidly expanding along the heuristic's guidance, is not possible.

We will instead end up expanding through heading bins 1 -> 1 -> -2 -> 1 -> 1 -> -2 -> 1 (zig-zagging so that the average heading works out to eastwards). This causes many more A* expansions; the 2D heuristic isn't a perfect guide anymore. This is exactly what we see in several of the images that I posted before.

If we add the interpolated bins, we can directly jump into bin 0 and, from there on, rapidly expand towards the goal. This is where I found that having more primitives both increased path quality and lowered the compute time.
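
A tiny self-contained check of that reachability argument, with assumed numbers matching the example above (not planner code): starting from bin 1 with heading changes of {-3, 0, +3} over 72 bins, every reachable bin stays congruent to 1 modulo 3, so bin 0 is never reached.

    // Enumerate the heading bins reachable from bin 1 when each primitive
    // changes the heading by -3, 0, or +3 bins (72 bins assumed).
    #include <cstdio>
    #include <set>

    int main()
    {
      const int num_bins = 72;
      std::set<int> reachable = {1};
      bool grew = true;
      while (grew) {
        grew = false;
        for (int bin : std::set<int>(reachable)) {
          for (int delta : {-3, 0, 3}) {
            const int next = ((bin + delta) % num_bins + num_bins) % num_bins;
            grew |= reachable.insert(next).second;
          }
        }
      }
      std::printf("reachable bins: %zu, bin 0 reachable: %s\n",
                  reachable.size(), reachable.count(0) ? "yes" : "no");  // 24, no
      return 0;
    }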

Min turning angle

Regarding this topic, I would like to address something.

  if (angle < bin_size) {
    increments = 1.0f;
  } else {
    // Search dimensions are clean multiples of quantization - this prevents
    // paths with loops in them
    increments = ceil(angle / bin_size);
  }
  angle = increments * bin_size; 

Bigger curve radii, with resolutions of 10cm and a not excessive number of angle bins, will quickly lead to angle < bin_size. The round-up to 1 essentially means that we cannot drive at the desired turning radius; the actual turning radius will be smaller. I am sure there are quite a lot of users out there unknowingly driving at smaller radii than they think they are. So if you are already changing those files, could you add a warning here?
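
A minimal sketch of such a warning -- the function, logger name, and message are assumptions, not actual nav2_smac_planner code; it follows the reasoning above that rounding the swept angle up to a full bin tightens the effectively driven radius:

    // Hypothetical warning, conceptually placed right after `angle` is
    // computed and before the quantization above; names are illustrative.
    #include "rclcpp/rclcpp.hpp"

    void warnOnAngleQuantization(float angle, float bin_size)
    {
      if (angle < bin_size) {
        RCLCPP_WARN(
          rclcpp::get_logger("SmacPlannerHybrid"),
          "Minimum-radius primitive sweeps %.4f rad, less than the angular bin "
          "size %.4f rad; it will be rounded up to one full bin, so the turning "
          "radius actually driven is tighter than min_turning_radius.",
          angle, bin_size);
      }
    }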

> option for non-downsampled obstacle heuristic + quadratic form cost handling

I cannot really say anything about these topics until I see the code. I am more than happy to test it. The non-downsampled obstacle heuristic sounds like a good plan; an optional downsampling for speedup is also good!

Now, regarding the quadratic cost: as I said, I need to see what you are doing. My initial gut feeling was surprise. I imagine an obstacle-free world where the cost is only distance. Whether A* optimizes the sum of squares of the distances between nodes or the sum of the distances should make no difference. This makes me think that if it does make a difference once we have cost factors for turning etc., then we are essentially just mimicking a different cost-factor setting with this approach.

Then I thought maybe you are squaring the summed distance in the 2D heuristic while the A* accumulates the squared distances between cells. That would make the 2D heuristic inadmissible, as the square of the sum is greater than or equal to the sum of the squares. You would therefore have a variant of weighted A*, sacrificing solution quality by rapidly exploring along the inadmissible heuristic.
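
Spelled out, for nonnegative step lengths d_1, ..., d_n:

    (d_1 + ... + d_n)^2 = sum_i d_i^2 + 2 * sum_{i<j} d_i * d_j >= sum_i d_i^2

so squaring after summing always dominates summing the squares, and a heuristic computed the first way can overestimate a cost accumulated the second way -- exactly the overestimate that breaks admissibility.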

Best, Florian