RobertWilbrandt commented 1 week ago

Describe the bug

When executing very long trajectories (order: >100.000 points), the time the JTC spends in update() rises over the course of execution. This gradually decreases the time spent sleeping in the control node until the control loop can no longer keep up and the control frequency drops.

After being pointed to the Trajectory::sample() function by @fmauch, I was able to confirm that it is the cause of the observed issue. In every control cycle, the trajectory is traversed linearly from the beginning until the trajectory points relevant for sampling at the current time step are found.
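
To illustrate why this gets slower over time, here is a minimal sketch of a per-cycle linear search. It is not the actual Trajectory::sample() code: the struct and function names are made up for illustration, and the trajectory is reduced to a sorted vector of time stamps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type, for illustration only.
struct LinearScanResult
{
  std::size_t segment_start;  // index i with times[i] <= t < times[i + 1]
  std::size_t comparisons;    // number of points inspected to find it
};

// Walks the trajectory from the beginning on every call.
// Assumes `times` is sorted ascending and non-empty.
inline LinearScanResult find_segment_linear(const std::vector<double> & times, double t)
{
  assert(!times.empty());
  LinearScanResult result{0, 0};
  for (std::size_t i = 0; i + 1 < times.size(); ++i)
  {
    ++result.comparisons;
    if (t < times[i + 1])
    {
      result.segment_start = i;
      return result;
    }
  }
  result.segment_start = times.size() - 1;  // at or past the last point
  return result;
}
```

The number of comparisons grows with the index of the current segment, so each call gets more expensive the further execution has progressed.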

To Reproduce

Steps to reproduce the behavior:

1. Take any system running a JTC and send a very long trajectory to it.
2. Monitor the time spent in the JTC update() function.
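
If no tracing setup is at hand, one rough way to do the monitoring step is to wrap the call in a small timing helper. This is only a sketch: `time_call_us` is a hypothetical helper, not part of ros2_control, and it measures wall-clock time with std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed
// wall-clock time in microseconds.
template <typename Fn>
double time_call_us(Fn && fn)
{
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(end - start).count();
}
```

In a control loop the returned values should go into a preallocated buffer and be written out after the run, since logging inside the loop would distort the measurement.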

Expected behavior

The time spent in update() should stay at a constant level from the start to the end of trajectory execution.

Screenshots

The problem became evident while I was working on a new hardware interface and executing a trajectory of ~220.000 points over ~900 seconds. While the execution starts correctly, after some time the update rate drops to a level at which the communication thread can no longer get a command from the hardware interface in every one of its control cycles, leading to excessive vibrations.

Using LTTng, I traced the calls to the hardware interface write() functions and could plot the following progression:

[Figure: jtc_timing_before]

The large spike seen at ~120s is the start of the trajectory. At ~290s, all headroom in the control loop is exhausted and the control frequency starts dropping. At ~375s, I had to stop the robot as I did not want to damage any joints.

Environment (please complete the following information):

OS: Ubuntu 24.04 noble
Version: Current master branch, built from source

Additional context

I will post a PR that uses binary search for finding the currently relevant trajectory points and could solve this problem for me. Since other options could have lower overhead than that (e.g. keeping track of the current index in the trajectory during execution), I went ahead and created this issue first.
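
For reference, the binary search idea can be sketched with std::upper_bound. This is a simplified illustration and not the code of the PR: the function name is hypothetical, and the trajectory is again reduced to a sorted vector of time stamps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns index i with times[i] <= t < times[i + 1], clamped to the valid range.
// Assumes `times` is sorted ascending and non-empty.
inline std::size_t find_segment_binary(const std::vector<double> & times, double t)
{
  assert(!times.empty());
  // Iterator to the first element strictly greater than t, found in O(log n).
  const auto it = std::upper_bound(times.begin(), times.end(), t);
  if (it == times.begin())
  {
    return 0;  // t lies before the first point
  }
  return static_cast<std::size_t>(std::distance(times.begin(), it)) - 1;
}
```

With this, the per-cycle cost depends only logarithmically on the trajectory length and no longer on how far execution has progressed.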

I am aware that this is probably not a common use case for the JTC, but this could also pose problems on more resource-constrained systems or when running multiple different controllers side by side.

gavanderhoorn commented 1 week ago

474?

RobertWilbrandt commented 1 week ago

Thanks, I only looked at issues before and didn't see that PR.

As discussed in the WG meeting today, it makes more sense to go with a solution like that, as there is never a need to access a trajectory in anything other than monotonically increasing time order.
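
A minimal sketch of what such an index-tracking approach could look like, assuming sample times never decrease between calls. `SegmentCursor` is a hypothetical class for illustration and not the implementation from that PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor that remembers the last segment between calls.
class SegmentCursor
{
public:
  explicit SegmentCursor(const std::vector<double> & times) : times_(times)
  {
    assert(!times_.empty());
  }

  // Returns index i with times[i] <= t < times[i + 1], clamped at the last point.
  // Assumes successive calls never pass a smaller t than before.
  std::size_t advance_to(double t)
  {
    while (index_ + 1 < times_.size() && times_[index_ + 1] <= t)
    {
      ++index_;
    }
    return index_;
  }

  // Rewinds to the start, e.g. when execution restarts.
  void reset() { index_ = 0; }

private:
  const std::vector<double> & times_;  // caller keeps the vector alive
  std::size_t index_ = 0;
};
```

Each call only steps over the points passed since the previous call, so the work per control cycle stays roughly constant regardless of trajectory length.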

jodle001 commented 1 week ago

I can confirm this issue is happening for a trajectory that I am running. CPU usage climbs as a very long trajectory is executing.

ros-controls / ros2_controllers

[jtc] Required time for controller update increases over the course of trajectory #1293
