ros2 / rclcpp

rclcpp (ROS Client Library for C++)
Apache License 2.0
536 stars 417 forks source link

relax the test simulation rate for timer canceling tests. #2453

Closed fujitatomoya closed 6 months ago

fujitatomoya commented 6 months ago

address https://github.com/ros2/rclcpp/issues/2452

fujitatomoya commented 6 months ago

CI:

Crola1702 commented 6 months ago

Just running a repeated build to see if it reduces the flakiness: Build Status

Crola1702 commented 6 months ago

Yeah, it seems right! Thanks for pushing this!

Crola1702 commented 6 months ago

Hold on. Just noticed that the CI builds are using sequential-executor, these kind of tests are mostly failing with parallel executor, so the issue is not completely fixed:

fujitatomoya commented 6 months ago

@Crola1702 really appreciate for checking the CI.

@clalancette @alsora

i believe i misunderstood the current test behavior.

i was expecting that

https://github.com/ros2/rclcpp/blob/1c350d0d7fb9c7158e0a39057112486ddbd38e9a/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp#L332

uses simulation clock to mitigate the racy system clock condition especially for windows to avoid the problem such as https://github.com/ros2/rclcpp/issues/2452

but it is not implemented as designed, since it uses system clock for the entire test. (that said nobody uses simulation clock.)

https://github.com/ros2/rclcpp/blob/1c350d0d7fb9c7158e0a39057112486ddbd38e9a/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp#L39-L48

https://github.com/ros2/rclcpp/blob/1c350d0d7fb9c7158e0a39057112486ddbd38e9a/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp#L202-L203

so actually what i have done here is making the situation worse, to give more time window for rasy timer condition?

i guess we already have made the mistake in https://github.com/ros2/rclcpp/pull/2375, precisely this specific commit https://github.com/ros2/rclcpp/pull/2375/commits/666e9d5cce3dfd7ddee4f7bc0ffd02b30b9ebfbb?

i would want to do the following after the confirmation of above my understanding.

fujitatomoya commented 6 months ago

@alsora @mauropasse

change use_sim_time into true.

does TimerManager support simulation time?

once we enable sim time true,

https://github.com/ros2/rclcpp/blob/1c350d0d7fb9c7158e0a39057112486ddbd38e9a/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp#L202-L203

only EventExecutor is failing,

    [----------] 6 tests from TestTimerCancelBehavior/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
    [ RUN      ] TestTimerCancelBehavior/EventsExecutor.testTimer1CancelledWithExecutorSpin
    /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp:264: Failure
    Expected: (t1_runs) != (t2_runs), actual: 1 vs 1

    /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp:266: Failure
    Expected: (t1_runs + 50) < (t2_runs), actual: 51 vs 1

    [  FAILED  ] TestTimerCancelBehavior/EventsExecutor.testTimer1CancelledWithExecutorSpin, where TypeParam = rclcpp::experimental::executors::EventsExecutor (870 ms)
    [ RUN      ] TestTimerCancelBehavior/EventsExecutor.testTimer2CancelledWithExecutorSpin
    /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp:282: Failure
    Expected: (t1_runs) != (t2_runs), actual: 1 vs 1

    /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp:284: Failure
    Expected: (t2_runs + 50) < (t1_runs), actual: 51 vs 1

    [  FAILED  ] TestTimerCancelBehavior/EventsExecutor.testTimer2CancelledWithExecutorSpin, where TypeParam = rclcpp::experimental::executors::EventsExecutor (852 ms)
    [ RUN      ] TestTimerCancelBehavior/EventsExecutor.testHeadTimerCancelThenResetBehavior
    /root/ros2_ws/colcon_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors_timer_cancel_behavior.cpp:311: Failure
    Expected: (t1_runs_initial + 50) < (t2_runs_initial), actual: 51 vs 1
...    
alsora commented 6 months ago

@fujitatomoya good catch; the timers manager in rclcpp doesn't support ROS sim_time as it uses the std library wait primitives.

timers_cv_.wait_for(lock, time_to_sleep.value(), [this]() {return timers_updated_;});

Our internal implementation of ROS overrides the stdlibrary with the ROS sim time, so we didn't catch the problem.

I would suggest for now to disable the events executor from this test while we look for a solution.

P.S. does ROS provide a "sim-time aware" condition variable?

fujitatomoya commented 6 months ago

P.S. does ROS provide a "sim-time aware" condition variable?

as far as i know, i do not think we have that... @clalancette @mjcarroll @wjwwood

i will summarize

@Crola1702 again, thanks for catching this, i will get back to you soon.