ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

Assess parallelism opportunities in Nav2 #2042

Closed SteveMacenski closed 2 years ago

SteveMacenski commented 3 years ago

From GPU, OpenMP, TBB, etc

simutisernestas commented 3 years ago

I've taken a look into costmap layer updates out of curiosity. I'm not quite sure if this is a reasonable assessment, so here is a short description of exactly what I did. As I understand it, the heaviest work in the layers is done in ObstacleLayer::updateBounds, and my idea was to exploit OpenMP's "parallel for" directive.

So I've modified the first loop in the LayeredCostmap::updateMap function to support parallel updates. I've also stacked more layers onto the costmap to make the difference clearer. Changes are available here: https://github.com/simutisernestas/navigation2/commit/f0630c672d2db0c38ad852ac09d26b9216ddf718.

I've observed that map average update time (tb3 simulation, Intel® Core™ i7-3770 CPU @ 3.40GHz×8):

At a 5 Hz update rate, the 6 ms difference adds up to a 30 ms gain per second. I suppose that processing real-world data could increase layer update time, so the effect seen here would be much more visible (for example, with a longer lidar range or a bigger point cloud).

Would be cool to hear a second opinion. :)

SteveMacenski commented 3 years ago

I'm not sure that updateBounds is the best place for this, because all of the layers will be updating those max/min i/j pointers. You'd need to make sure you handle those resources carefully so that they don't get corrupted. I'd think updateCosts would be a good target too (but similarly, you'd need to be careful with the master_grid). I think OMP has some options like SHARED or something similar to deal with these cases. I'd try both updates.
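A minimal sketch of the shared-bounds concern above, not taken from the linked commit. `ToyLayer`, `Bounds`, and `mergedBounds` are hypothetical stand-ins rather than Nav2 types, and the sketch assumes each layer's bounds computation does not depend on the other layers. Instead of letting every thread write the same min/max variables, OpenMP's `reduction` clause gives each thread a private copy and merges the copies when the loop ends:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for a costmap layer; this is NOT the Nav2 Layer API.
struct ToyLayer {
  double x0, y0, x1, y1;  // region this layer wants updated

  // Expands the given bounds so they also cover this layer's region.
  void updateBounds(double & min_x, double & min_y,
                    double & max_x, double & max_y) const {
    min_x = std::min(min_x, x0);
    min_y = std::min(min_y, y0);
    max_x = std::max(max_x, x1);
    max_y = std::max(max_y, y1);
  }
};

struct Bounds { double min_x, min_y, max_x, max_y; };

// Merges the bounds of all layers. Each thread works on private copies of the
// four bounds; OpenMP combines them with min/max after the loop, so no two
// threads ever write the same variable.
Bounds mergedBounds(const std::vector<ToyLayer> & layers) {
  double min_x = std::numeric_limits<double>::max();
  double min_y = std::numeric_limits<double>::max();
  double max_x = std::numeric_limits<double>::lowest();
  double max_y = std::numeric_limits<double>::lowest();

  const int n = static_cast<int>(layers.size());
  #pragma omp parallel for reduction(min : min_x, min_y) reduction(max : max_x, max_y)
  for (int i = 0; i < n; ++i) {
    layers[i].updateBounds(min_x, min_y, max_x, max_y);
  }
  return {min_x, min_y, max_x, max_y};
}
```

Built without OpenMP support, the pragma is ignored and the loop runs serially with the same result. The pattern does not carry over directly to layers that read the bounds accumulated by earlier layers, which is why the independence assumption matters.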

I was also thinking of parallelizing the marking / clearing operations within the obstacle / voxel layers, since those are independent measurements and there are many of them (a QVA sensor is thousands of iterations).

30 ms is nothing to sniff at. A lot of significant performance gains can be had by nickel-and-diming the system. 30 ms here... 30 ms there... all of a sudden you're 2 or 3x faster. 6 ms on a 24 ms process is still a 25% improvement; that's a lot for such a small amount of work!

abylikhsanov commented 3 years ago

Regarding our previous chat on this issue: https://github.com/ros-planning/navigation2/issues/2190

You mentioned that you would start from the "outer loop" first. Can you please elaborate on what you meant?

SteveMacenski commented 3 years ago

You mentioned an outer and an inner loop to try to parallelize with DWB. Just start with the outer one only, then benchmark and open a PR. I think you'll find one level will do most of the heavy lifting you require (e.g. DWB has an N-critics-over-M-trajectories structure that could be parallelized at 2 levels). I forget which is the outermost for loop in DWB, but I think it's the trajectory generator (e.g. M) that generates the M trajectories of vx * vy samples; parallelize that first.
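A rough sketch of the "outer loop only" idea, under stated assumptions. `ToyTrajectory`, `ToyCritic`, and `bestTrajectory` are hypothetical names, not DWB's API, and which loop is actually outermost in DWB is not confirmed in this thread. The sketch parallelizes over the M trajectories and leaves the N critics serial inside each iteration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration only; these are NOT the DWB interfaces.
struct ToyTrajectory { double vx; double vy; };
using ToyCritic = std::function<double(const ToyTrajectory &)>;

// Returns the index of the lowest-cost trajectory.
// Assumes trajs is non-empty and that every critic is safe to call from
// several threads at once (no shared mutable state).
std::size_t bestTrajectory(const std::vector<ToyTrajectory> & trajs,
                           const std::vector<ToyCritic> & critics) {
  std::vector<double> scores(trajs.size(), 0.0);

  const int m = static_cast<int>(trajs.size());
  // Outer level: one iteration per trajectory. Each iteration writes only
  // its own slot in scores, so no synchronization is needed.
  #pragma omp parallel for
  for (int i = 0; i < m; ++i) {
    double total = 0.0;
    // Inner level: critics stay serial.
    for (const auto & critic : critics) {
      total += critic(trajs[i]);
    }
    scores[i] = total;
  }

  // Serial argmin over the scores; cheap compared with the scoring itself.
  std::size_t best = 0;
  for (std::size_t i = 1; i < scores.size(); ++i) {
    if (scores[i] < scores[best]) {
      best = i;
    }
  }
  return best;
}
```

Keeping the inner loop serial avoids nested parallel regions, and benchmarking this single level first shows whether a second level is worth the added complexity.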

SteveMacenski commented 3 years ago

@simutisernestas is there a reason we couldn't merge that OpenMP solution into costmap 2d?

simutisernestas commented 3 years ago

If you're up for it, I would be happy to make a PR.

SteveMacenski commented 3 years ago

Sure, it's a starting point! It only does updateBounds (not costs), but you've shown some compelling speedups on just that by itself!

SteveMacenski commented 3 years ago

@abylikhsanov any progress to share?

Parv-Maheshwari commented 3 years ago

> * Anything in planning? Not sure because search based. Maybe a sampling planner could better utilize

Hi @SteveMacenski. I have worked on a sampling-based local planner in the Frenet frame for ROS 1, in which I used OpenMP. It showed a fivefold increase in frequency while using just 8 threads.

So I wanted to know whether it would be possible to include our local planner as a controller plugin for Nav2. We would of course add or change functionality according to Nav2's requirements.

I would also love to read your thoughts about this and what we should/can do.

P.S. My team and I are open to porting our planner to ROS 2.

SteveMacenski commented 3 years ago

Hi @Parv-Maheshwari, thanks for reaching out! I think that might be a good discussion to have in this ticket https://github.com/ros-planning/navigation2/issues/1710 instead. Can you continue the discussion there explaining specifically what the technique is you've implemented that you'd be interested in contributing (and potentially a link if already open sourced)?

SteveMacenski commented 2 years ago

Closing for now -- I've recently done some experiments on an NVIDIA Jetson and was surprised how little CPU Nav2 was using with the full system running while processing 2 depth sensors. It looks like Nav2 is good enough as-is for embedded use, so we don't need to speed it up a whole lot more for it to be perfectly suitable. DWB is the big area that can use the most help, since it is the thing causing problems, and we have another ticket open to handle that: https://github.com/ros-planning/navigation2/issues/2045