ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

max_duration not respected in spin_some for StaticSingleThreadedExecutor and EventsExecutor #2462

Open wjwwood opened 6 months ago

wjwwood commented 6 months ago

Bug report

Required Info:

Steps to reproduce issue

Remove the logic to skip the tests for spin_some_max_duration for the StaticSingleThreadedExecutor and EventsExecutor:

https://github.com/ros2/rclcpp/blob/a85382aa36d6c9bb9b3cfebfc096f1a368b47473/rclcpp/test/rclcpp/executors/test_executors.cpp#L574-L582

Run the test and see that they fail.

Expected behavior

The tests should pass as they are testing expected and documented behavior for spin_some.

Actual behavior

Tests fail.

Additional information

See also: https://github.com/ros2/rclcpp/pull/2460

wjwwood commented 6 months ago

@alsora FYI we found this while working on some other things. I didn't have much time to investigate it, but it's possible the StaticSingleThreadedExecutor will be fixed with the changes in https://github.com/ros2/rclcpp/pull/2142, but I haven't had time to retry that. But that pr doesn't change the EventsExecutor, and they fail in different ways anyway:

[----------] 1 test from TestExecutors/StaticSingleThreadedExecutor, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ RUN      ] TestExecutors/StaticSingleThreadedExecutor.spin_some_max_duration
/home/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:616: Failure
Expected: (to_nanoseconds_helper(spin_some_run_time)) < (to_nanoseconds_helper(waitable_callback_duration * 2)), actual: 4004709489 vs 4000000000
spin_some() took longer than expected to execute by a significant margin, but this could be a false positive on a very slow computer
/home/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:624: Failure
Expected equality of these values:
  number_of_waitables_executed
    Which is: 2
  1u
    Which is: 1
expected exactly one of the two waitables to be executed, but my_waitable1->get_count(): 1 and my_waitable2->get_count(): 1
[  FAILED  ] TestExecutors/StaticSingleThreadedExecutor.spin_some_max_duration, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor (4042 ms)
[----------] 1 test from TestExecutors/StaticSingleThreadedExecutor (4042 ms total)

[----------] 1 test from TestExecutors/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
[ RUN      ] TestExecutors/EventsExecutor.spin_some_max_duration
/home/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:610: Failure
Expected: (to_nanoseconds_helper(spin_some_run_time)) > (to_nanoseconds_helper(waitable_callback_duration / 2)), actual: 55376 vs 1000000000
spin_some() took less than half the expected time to execute a single waitable, which implies it did not actually execute one when it was expected to
/home/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:624: Failure
Expected equality of these values:
  number_of_waitables_executed
    Which is: 0
  1u
    Which is: 1
expected exactly one of the two waitables to be executed, but my_waitable1->get_count(): 0 and my_waitable2->get_count(): 0
[  FAILED  ] TestExecutors/EventsExecutor.spin_some_max_duration, where TypeParam = rclcpp::experimental::executors::EventsExecutor (22 ms)
[----------] 1 test from TestExecutors/EventsExecutor (22 ms total)
alsora commented 6 months ago

@wjwwood I opened a PR to fix the unit-test for the events executor: https://github.com/ros2/rclcpp/pull/2465

However, I'm confused by the expected behavior here. The test was not triggering the waitables (this was the reason why the events executor log above showed 0 executed entities), so why were the other executors "executing" them? This to me seems a bug in all the waitset-based executors implementations, and the events executor was actually doing the correct behavior.

alsora commented 6 months ago

Ok, I think that there was also a problem in the is_ready method of the TestWaitable

  bool
  is_ready(rcl_wait_set_t * wait_set) override
  {
    (void)wait_set;
    return true;
  }

This shouldn't blindly return "ready", but it should check whether the guard condition was triggered (i.e. if it was part of the waitset). The events executor doesn't rely on this method, that's why it has a different behavior.

So the method should probably become something like this

bool
is_ready(rcl_wait_set_t * wait_set)
{
  for (size_t i = 0; i < wait_set->size_of_guard_conditions; ++i) {
    auto rcl_guard_condition = wait_set->guard_conditions[i];
    if (&_gc->get_rcl_guard_condition() == rcl_guard_condition) {
      return true;
    }

  return false;
}
alsora commented 6 months ago

Reopening this, since the test is still disabled for the StaticSingleThreadedExecutor, although it could be fixed by https://github.com/ros2/rclcpp/pull/2142