Controller server aborts the action server handle

abylikhsanov commented 3 years ago

Bug report

Required Info:

Operating System:
- Ubuntu 20.04
ROS2 Version:
- Foxy binaries
Version or commit hash:
- From source latest navigation2 release (0.4.5)
DDS implementation:
- Fast-RTPS

Steps to reproduce issue

Launch the Gazebo simulation (turtlebot3 spawn) on a stationary PC and launch a sample navigation2 launch file on a less powerful ARM machine (for example, an RPi4). The idea is to run all heavy algorithms on the ARM machine while the sensor simulation comes from Gazebo (a simple hardware-in-the-loop setup).

As this issue comes from the controller node, use these FollowPath parameters (you can omit the Holonomic critic):

FollowPath:
      GoalAlign.forward_point_distance: 0.1
      GoalAlign.scale: 24.0
      GoalDist.scale: 24.0
      ObstacleFootprintCritic.scale: 1.0
      PathAlign.forward_point_distance: 1.0
      PathAlign.scale: 32.0
      PathDist.scale: 32.0
      RotateToGoal.lookahead_time: 2.0
      RotateToGoal.scale: 32.0
      RotateToGoal.slowing_factor: 5.0
      Twirling.scale: 10.0
      Holonomic.scale: 40.0
      acc_lim_theta: 1.0
      acc_lim_x: 0.5
      acc_lim_y: 0.3
      angular_granularity: 0.025
      critics:
        - RotateToGoal
        - Oscillation
        - ObstacleFootprintCritic
        - GoalAlign
        - PathAlign
        - PathDist
        - GoalDist
        - Twirling
        - Holonomic
      debug_trajectory_details: false
      decel_lim_theta: -1.0
      decel_lim_x: -0.5
      decel_lim_y: -0.5
      linear_granularity: 0.05
      max_speed_xy: 0.7
      max_vel_theta: 1.0
      max_vel_x: 0.7
      max_vel_y: 0.2
      min_speed_theta: 0.0
      min_speed_xy: -0.7
      min_vel_x: -0.7
      min_vel_y: -0.2
      plugin: dwb_core::DWBLocalPlanner
      short_circuit_trajectory_evaluation: true
      sim_time: 2.0
      stateful: true
      trans_stopped_velocity: 0.25
      transform_tolerance: 0.2
      vtheta_samples: 40
      vx_samples: 20
      vy_samples: 20
      xy_goal_tolerance: 0.1

Expected behavior

I tested this on my stationary PC first; it should run normally, with the robot navigating and avoiding obstacles. Nothing extraordinary.

Actual behavior

Controller server constantly aborts the handle and the loop itself seems to be slow (on my local PC, I regularly get a message that the control loop missed its desired rate of 5 Hz, but the frequency of those messages is much higher than on a lower-end ARM machine).

I first thought that the DWB configuration I have is so heavy that my ARM machine simply cannot handle it, but interestingly, the CPU load is below average. The controller server only consumes one full core (103% usage out of 400%).

Therefore, I am a bit confused. Does the controller have a limit on the number of cores it can use?

abylikhsanov commented 3 years ago

Actually, it seems that after I remove the ObstacleFootprintCritic it works without any problems. Strange, I can't figure out what could be so computationally expensive in that critic after reading its source code...

SteveMacenski commented 3 years ago

BaseObstacleCritic is much lighter weight because it assumes your robot is circular, so it uses the cost at the center of the robot. The ObstacleFootprintCritic looks at your footprint polygon and interpolates between the points at costmap cell resolution to make sure it's not in collision. For N vx samples, M vy samples, and X samples for each of the M*N combinations, you can see how doing this would be expensive (N*M*X times).
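To make the difference between the two checks concrete, here is a minimal sketch. It is not the Nav2 implementation: the function names, the list-of-lists costmap, the 0.05 m resolution, and the rectangular footprint are all assumptions made for illustration. It only counts how many costmap cells each style of check has to read for a single pose.

```python
import math

def center_lookups(costmap, resolution, x, y):
    """Circular-robot style check: one lookup at the robot center."""
    col, row = int(x / resolution), int(y / resolution)
    return costmap[row][col], 1  # (cost, number of cell lookups)

def footprint_lookups(costmap, resolution, footprint, x, y, theta):
    """Polygon style check: walk every footprint edge in steps of about one
    cell and keep the worst cost seen along the perimeter."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Transform the footprint from the robot frame into the map frame.
    world = [(x + px * cos_t - py * sin_t, y + px * sin_t + py * cos_t)
             for px, py in footprint]
    worst, lookups = 0, 0
    for (x0, y0), (x1, y1) in zip(world, world[1:] + world[:1]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Rounding guards against floating-point noise before taking the ceiling.
        steps = max(1, math.ceil(round(length / resolution, 6)))
        for i in range(steps):
            t = i / steps
            col = int((x0 + t * (x1 - x0)) / resolution)
            row = int((y0 + t * (y1 - y0)) / resolution)
            worst = max(worst, costmap[row][col])
            lookups += 1
    return worst, lookups

# Hypothetical values: 0.05 m cells, empty 5 m x 5 m map, 0.6 m x 0.4 m robot.
resolution = 0.05
costmap = [[0] * 100 for _ in range(100)]
footprint = [(0.3, 0.2), (-0.3, 0.2), (-0.3, -0.2), (0.3, -0.2)]

_, n_center = center_lookups(costmap, resolution, 2.5, 2.5)
_, n_polygon = footprint_lookups(costmap, resolution, footprint, 2.5, 2.5, 0.0)
print(n_center, n_polygon)
```

With these made-up numbers the polygon check reads roughly 40 cells per pose where the center check reads one, and that factor is paid for every pose the critic evaluates.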


      vtheta_samples: 40
      vx_samples: 20
      vy_samples: 20

Are you holonomic? 20 vy samples is crazy if you're not :laughing:.
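As a rough worked example of the N*M*X estimate, the arithmetic below uses the sample counts quoted above and the 5 Hz rate from the report. Reading X as vtheta_samples is an interpretation based on the three values quoted, and the reduced configuration is purely hypothetical. The result is an upper bound on work per cycle, not a measurement.

```python
# Values taken from the FollowPath parameters and the 5 Hz rate in this thread.
vx_samples, vy_samples, vtheta_samples = 20, 20, 40
rate_hz = 5.0

def trajectories_per_cycle(vx, vy, vtheta):
    """Number of velocity combinations to score each control cycle (N*M*X)."""
    return vx * vy * vtheta

def budget_per_trajectory_us(n_trajectories, rate_hz):
    """Average time available per trajectory, in microseconds, if a single
    thread must score all of them within one control period."""
    return (1.0 / rate_hz) / n_trajectories * 1e6

full = trajectories_per_cycle(vx_samples, vy_samples, vtheta_samples)
# Hypothetical reduced configuration, for comparison only.
reduced = trajectories_per_cycle(20, 5, 20)

print(full, budget_per_trajectory_us(full, rate_hz))
print(reduced, budget_per_trajectory_us(reduced, rate_hz))
```

That is 16,000 combinations per cycle with the posted settings, leaving about 12.5 microseconds each on one thread at 5 Hz, versus 2,000 combinations and about 100 microseconds each in the hypothetical reduced case.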

We don't have anything that specifically limits the servers to a single core, though DWB is single-threaded, so there's not much use it can make of multiple cores. For DWB specifically, https://github.com/ros-planning/navigation2/issues/2042 covers the fact that we could very easily use multi-threading in DWB, but we don't at the moment. All those N*M*X evaluations could be handed out to different threads in a pool pretty easily, which should speed things up greatly.

Controller server constantly aborts the handle and the loop itself seems to be slow (on my local PC, I regularly get a message that the control loop missed its desired rate of 5 Hz, but the frequency of those messages is much higher than on a lower-end ARM machine).

Weird / bad things happen when you saturate your cores, so it doesn't surprise me that DDS in the backend might abort a handle because it missed some messages while the CPU was completely saturated. The slow loop is explained by what we talked about above. The usual rule of thumb is that you shouldn't load your cores beyond 80% at steady state, so there's room for spikes.

But it sounds like what you need is some mixture of:

- reconfiguring to reduce load / samples / collision checking
- possible work on adding multi-thread support to DWB (would love to see it!)
- more compute to not saturate your cores

In any case, we can continue to talk about this; I'd like to work with you to find a reasonable solution. But it seems like there's no bug here beyond the ticket already filed: https://github.com/ros-planning/navigation2/issues/2042

I'd also generally like to know your experiences working with Nav2 on small ARM platforms. All of my robots have beefy i5's at minimum. So it would be good to know your experiences and sensors you can run on that small of a platform and what kind of resource utilization it has. Might make for a good blog post or Nav2 documentation post ;-)

abylikhsanov commented 3 years ago

Hi,

Yes, I am using a mecanum robot, and you are probably right, 20 vy samples is too much :)

I think in this case the best scenario would be to implement multi-thread support, as my other cores are literally not used (they are loaded by only 10-20%, so there is plenty of room). The reason we use ARM is of course cost-effectiveness and energy consumption, and we are actually using it in a real industrial environment, competing with machines that have desktop CPUs.

I need to dig more into the DWB local planner to see how it really works but in general I really do not like multithreading in C++ (we use Rust) as we need to be very very careful :)

In the issue #2042, you have mentioned:

DWB has these N critics over M trajectories structure that could be parallelized at 2 levels

So I will focus on that first, seems reasonable.

SteveMacenski commented 3 years ago

I would focus on the outer-most loop first and see what that does before going too crazy. You'll get the most benefit there, since those threads will each do more "work", so the overhead of spinning up a new thread is amortized over more of it. I wouldn't even try multithreading both sets of loops at once; do one at a time, or you might find yourself in a weird crashing situation that's hard to recover your work from (speaking from experience).
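A minimal sketch of what "outer-most loop first" could look like: the trajectory list is split into one coarse chunk per worker, and each worker runs the full set of critics on its chunk. This is not DWB code. The critics, the tuple representation of a trajectory, and the function names are hypothetical, and in CPython the threads will not speed up pure-Python scoring because of the GIL, so this illustrates only how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical critics: each maps a trajectory (here just a (vx, vy, vtheta)
# tuple) to a non-negative cost. Real critics would inspect poses and costmaps.
def prefer_forward(traj):
    return abs(0.5 - traj[0])

def penalize_strafe(traj):
    return abs(traj[1])

def penalize_spin(traj):
    return 0.1 * abs(traj[2])

CRITICS = [prefer_forward, penalize_strafe, penalize_spin]

def score(traj):
    """Inner loop: run every critic on one trajectory (kept serial)."""
    return sum(critic(traj) for critic in CRITICS)

def best_in_chunk(chunk):
    """Work item for one thread: score a whole chunk, return (cost, traj)."""
    return min(((score(t), t) for t in chunk), key=lambda pair: pair[0])

def best_trajectory(trajectories, workers=4):
    """Outer loop, split into one coarse chunk per worker."""
    if not trajectories:
        raise ValueError("no trajectories to score")
    size = -(-len(trajectories) // workers)  # ceiling division
    chunks = [trajectories[i:i + size]
              for i in range(0, len(trajectories), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return min(pool.map(best_in_chunk, chunks), key=lambda pair: pair[0])

# Small made-up sample grid, just to exercise the code.
trajectories = [(vx / 10, vy / 10, vt / 10)
                for vx in range(-7, 8)
                for vy in range(-2, 3)
                for vt in range(-10, 11)]
best_cost, best = best_trajectory(trajectories)
print(best_cost, best)
```

Because each work item covers many trajectories, the per-thread overhead is paid once per chunk rather than once per critic call, and the only shared state is the final reduction over the per-chunk results.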

Should we close this ticket and migrate our discussions to #2042?

abylikhsanov commented 3 years ago

Yes, let's do this
