Logic in ros2_control_node for timing

tylerjw commented 2 years ago

Here is my current understanding of how the logic in ros2_control_node works:

ros2_control_loop drawio

There are several issues I see with this current design:

The period value that is sent to read/update/write is always 0 because the current_time and previous_time values are always the same.
The time value sent to read/update/write is the time between the call to read but before update.
The period variable is shadowed inside the loop which makes it hard to understand at first where the value is coming from.
This code is untestable because it is in a main function.

While the current design does not cause drift, the logic around these time and duration values is really confusing and hard to understand.

Proposed changes

I propose to refactor the logic of this loop into a higher-order function (a function that takes functions as arguments) that can be tested. I will work out a diagram of how this new loop works and include unit tests of it.

Here is my proposed change to the loop structure:

ros2_control_loop_update drawio (1)

Here is some pseudocode of how it will work:

template<typename NowF, typename OkF, typename WorkF, typename SleepF>
void ControlLoop(nanoseconds period, NowF now, OkF ok, WorkF do_work, SleepF sleep_until)
{
  auto current_time = now();
  auto measured_period = Duration{0};
  auto next_time = current_time;

  while(ok()) {
    auto const last_time = current_time;
    current_time = now();
    measured_period = current_time - last_time;
    next_time = next_time + period;

    do_work(current_time, measured_period);
    sleep_until(next_time);
  }
}

gavanderhoorn commented 2 years ago

As someone who has written many of these kinds of loops: whether to execute read(..) before write(..) (with some update(..) in the middle) or the other way around is largely a choice.

Your second diagram orders operations such that the exact point in time where write(..) is called will depend on how long update(..) takes. That's not ideal. Systems which do not have lower-level (sub)systems with fixed sampling periods (which your diagram sort-of suggests you have) could see large variance in when they get their control inputs.

I've seen write(..); update(..); read(..); sleep(..) and write(..); read(..); update(..); sleep(..) to mitigate this.

There is no single correct order.

tylerjw commented 2 years ago

I'm happy to re-order them in my PR. You will note though that the current design also has the issue you point out along with the other issues I pointed out. In the current design, write is always called after sleep/read/update.

gavanderhoorn commented 2 years ago

You will note though that the current design also has the issue you point out along with the other issues I pointed it.

I wasn't claiming it doesn't.

I merely wanted to make sure we agree here that while it seems write(..) should follow read(..) and update(..), that's not necessarily the case, and depends on the structure and configuration of the rest of the (underlying) system.

tylerjw commented 2 years ago

I pushed a change to make it write, update, read, sleep. I hadn't thought of what you pointed out but you are right that the hardware I'm interacting with is sensitive to write being called at a steady rate.

gavanderhoorn commented 2 years ago

Now we wait until someone comes along with a good argument for why it shouldn't be the order you just made the default ..

(such as when using the external motion interface of an industrial robot, where you have to make sure you're sending new references before the expiration of the current period and you're using the external interface to pace your own control loop. If you wait with write(..) till the "next period", you're too late)

tylerjw commented 2 years ago

The only reason I can think of is if there are some assumptions about the order of read and update. There might be a controller that depends on the fact that the values it is reading from the robot are the most recent ones (and not the ones from the last iteration of the loop).

tylerjw commented 2 years ago

My primary concern here was to fix what I saw as broken logic in how the loop was written with respect to the time and duration values being passed to the controllers. I'm happy to also fix other issues here while I'm at it though.

When I asked for feedback on my issue I posted here one thing I heard internally here at PickNik my generic design for ControlLoop feels too complex when I could just fix the logic in the main function. My counter to that is fixing the logic in the main function alone makes it much much harder for me to test the change I want to make here in any sort of robust or minimal way.

tylerjw commented 2 years ago

Another thing to point out is I'm using templates for the function arguments instead of std::function. That is a deliberate choice with the only pitfall being potentially harder-to-read compile errors if this function was called with arguments that couldn't be made to compile.

The upside is that having the function arguments be templates is much lighter weight than std::function. Ideally, I'd have C++20 concepts and I could constrain these template arguments in a way that would produce nice error messages.

gavanderhoorn commented 2 years ago

My primary concern here was to fix what I saw as broken logic in how the loop was written with respect to the time and duration values being passed to the controllers.

I'm not a maintainer nor developer here, but then my suggestion would be to make this PR about those fixes.

Don't change anything else.

When I asked for feedback on my issue I posted here one thing I heard internally here at PickNik my generic design for ControlLoop feels too complex when I could just fix the logic in the main function. My counter to that is fixing the logic in the main function alone makes it much much harder for me to test the change I want to make here in any sort of robust or minimal way.

ask 100 developers their opinion on some implementation and you'll get a 1000.

Whether something is complex depends on level of experience and skill.

Personally I don't believe it's too complex, but it's undeniable it's more complex than what used to be there.

However, limitations of the implementation language and the additional requirements you've stated (dependency injection design to improve testability) will require some 'tricks', which unfortunately would lead to more perceived complexity.

kshfire commented 2 years ago

I'm not sure but I think it is clear that there are problems in the cm_thread codes in ros2_control_node.cpp.

while (rclcpp::ok())
{
  // wait until we hit the end of the period
  end_period += period;
  std::this_thread::sleep_for(std::chrono::nanoseconds((end_period - cm->now()).nanoseconds()));

  // execute update loop
  auto period = current_time - previous_time;  // (Problem here :  current_time and previous_time are same => period is always zero !!!)
  cm->read(current_time, period);
  current_time = cm->now();
  cm->update(current_time, period);
  previous_time = current_time;
  cm->write(current_time, period);
}

The problems are : current_time and previous_time are same so, period is always zero....;;

when using velocity controller of the example 2 (Robots with multiple interfaces) is not working because the period argument of the read in the rrbot_ssytem_multi_interface.cpp is alway zero.

I'm not sure whether it is a proper solution or not, but I just modified it as follows and it looks ok maybe.

  while (rclcpp::ok())
  {
    // wait until we hit the end of the period
    end_period += period;  // cur_time + interval
    std::this_thread::sleep_for(
      std::chrono::nanoseconds((end_period - cm->now()).nanoseconds()));

    // execute update loop
    current_time = cm->now();  // read cur. time (Modify here : current_time must be fixed to avoid zero period value !!!!!)

    auto period = current_time - previous_time;

    cm->read(current_time, period);
    //current_time = cm->now();  // read cur. time (comment !!!)
    cm->update(current_time, period);
    //previous_time = current_time;  (comment !!!)
    cm->write(current_time, period);

    previous_time = current_time;  (modifiy here but it is not important!!!!)
  }

Result : Example 2: "Robots with multiple interfaces"
Since the update rate is 100 Hz, (which I added for debug) period_seconds : about 0.01 [s] seems to be ok.

[ros2_control_node-1] [INFO]: period_seconds: 0.01001, hw_slowdown: 50.00000! [ros2_control_node-1] [INFO]: Got the commands pos: 0.00000, vel: 5.00000, acc: 0.00000 for joint 0, control_lvl:2 [ros2_control_node-1] [INFO]: Got the commands pos: 0.00000, vel: 5.00000, acc: 0.00000 for joint 1, control_lvl:2 [ros2_control_node-1] [INFO]: Got pos: 23.46400, vel: 5.00000, acc: 0.00000 for joint 0! [ros2_control_node-1] [INFO]: period_seconds: 0.00999, hw_slowdown: 50.00000! [ros2_control_node-1] [INFO]: Got pos: 23.46400, vel: 5.00000, acc: 0.00000 for joint 1! [ros2_control_node-1] [INFO]: period_seconds: 0.00999, hw_slowdown: 50.00000! [ros2_control_node-1] [INFO]: Got the commands pos: 0.00000, vel: 5.00000, acc: 0.00000 for joint 0, control_lvl:2 [ros2_control_node-1] [INFO]: Got the commands pos: 0.00000, vel: 5.00000, acc: 0.00000 for joint 1, control_lvl:2 [ros2_control_node-1] [INFO]: Got pos: 23.46500, vel: 5.00000, acc: 0.00000 for joint 0!

Studying ROS 2 is not easy to me. Anyway, the while loop of cm_thead in ros2_control_node.cpp looks like it needs to be fixed. I hope it helps to improve the Ros 2 control (Normal & stable verion corresponding to the Humble is helpful for ROS beginner :) )

tylerjw commented 2 years ago

@kshfire can you try my pr here to see if it fixes those issues you observed? https://github.com/ros-controls/ros2_control/pull/728

ros-controls / ros2_control

Logic in ros2_control_node for timing #726

Proposed changes