Flaky joint_trajectory_controller test

bmagyar commented 3 years ago

It causes sporadic failures on the CI

Function 'run_target_test' returned with code '0' after 0 min 27 sec
build/joint_trajectory_controller/Testing/20201223-1353/Test.xml: 10 tests, 0 errors, 1 failure, 0 skipped
- test_trajectory_actions
  <<< failure message
    -- run_test.py: invoking following command in '/root/target_ws/build/joint_trajectory_controller':
     - /root/target_ws/build/joint_trajectory_controller/test_trajectory_actions --gtest_output=xml:/root/target_ws/build/joint_trajectory_controller/test_results/joint_trajectory_controller/test_trajectory_actions.gtest.xml
    Running main() from /opt/ros/foxy/src/gtest_vendor/src/gtest_main.cc
    [==========] Running 6 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 6 tests from TestTrajectoryActions
    [ RUN      ] TestTrajectoryActions.test_success_single_point_sendgoal
    [INFO] [1608731594.223464761] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731594.224467156] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731594.528069895] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731594.528172294] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731595.028759186] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_success_single_point_sendgoal (2025 ms)
    [ RUN      ] TestTrajectoryActions.test_success_multi_point_sendgoal
    [INFO] [1608731596.241568953] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731596.241845852] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731596.544673796] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731596.545481792] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731596.846115746] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_success_multi_point_sendgoal (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_goal_tolerances_single_point_success
    [INFO] [1608731598.257190764] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731598.257477463] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731598.559851910] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731598.559918010] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731599.060530204] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_goal_tolerances_single_point_success (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_goal_tolerances_multi_point_success
    [INFO] [1608731600.271895386] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731600.272183685] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731600.574736832] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731600.574794632] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731600.874930391] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_goal_tolerances_multi_point_success (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_state_tolerances_fail
    [INFO] [1608731602.287111013] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731602.287414111] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731602.590335958] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731602.590960355] [test_joint_trajectory_controller]: Accepted new action goal
    [WARN] [1608731602.591742151] [test_joint_trajectory_controller]: Aborted due to state tolerance violation
    [       OK ] TestTrajectoryActions.test_state_tolerances_fail (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_cancel_hold_position
    [INFO] [1608731604.301218551] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731604.301502750] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731604.603876600] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731604.603946000] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731605.103925102] [test_joint_trajectory_controller]: Got request to cancel goal
    -- run_test.py: return code -11
    -- run_test.py: generate result file '/root/target_ws/build/joint_trajectory_controller/test_results/joint_trajectory_controller/test_trajectory_actions.gtest.xml' with failed test
    -- run_test.py: verify result file '/root/target_ws/build/joint_trajectory_controller/test_results/joint_trajectory_controller/test_trajectory_actions.gtest.xml'
  >>>
build/joint_trajectory_controller/test_results/joint_trajectory_controller/test_trajectory_actions.gtest.xml: 1 test, 1 error, 0 failures, 0 skipped
- joint_trajectory_controller test_trajectory_actions.gtest.missing_result
  <<< error message
    The test did not generate a result file:

    Running main() from /opt/ros/foxy/src/gtest_vendor/src/gtest_main.cc
    [==========] Running 6 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 6 tests from TestTrajectoryActions
    [ RUN      ] TestTrajectoryActions.test_success_single_point_sendgoal
    [INFO] [1608731594.223464761] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731594.224467156] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731594.528069895] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731594.528172294] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731595.028759186] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_success_single_point_sendgoal (2025 ms)
    [ RUN      ] TestTrajectoryActions.test_success_multi_point_sendgoal
    [INFO] [1608731596.241568953] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731596.241845852] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731596.544673796] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731596.545481792] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731596.846115746] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_success_multi_point_sendgoal (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_goal_tolerances_single_point_success
    [INFO] [1608731598.257190764] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731598.257477463] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731598.559851910] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731598.559918010] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731599.060530204] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_goal_tolerances_single_point_success (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_goal_tolerances_multi_point_success
    [INFO] [1608731600.271895386] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731600.272183685] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731600.574736832] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731600.574794632] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731600.874930391] [test_joint_trajectory_controller]: Goal reached, success!
    [       OK ] TestTrajectoryActions.test_goal_tolerances_multi_point_success (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_state_tolerances_fail
    [INFO] [1608731602.287111013] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731602.287414111] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731602.590335958] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731602.590960355] [test_joint_trajectory_controller]: Accepted new action goal
    [WARN] [1608731602.591742151] [test_joint_trajectory_controller]: Aborted due to state tolerance violation
    [       OK ] TestTrajectoryActions.test_state_tolerances_fail (2015 ms)
    [ RUN      ] TestTrajectoryActions.test_cancel_hold_position
    [INFO] [1608731604.301218551] [test_joint_trajectory_controller]: Controller state will be published at 50Hz.
    [INFO] [1608731604.301502750] [test_joint_trajectory_controller]: Action status changes will be monitored at 20Hz.
    [INFO] [1608731604.603876600] [test_joint_trajectory_controller]: Received new action goal
    [INFO] [1608731604.603946000] [test_joint_trajectory_controller]: Accepted new action goal
    [INFO] [1608731605.103925102] [test_joint_trajectory_controller]: Got request to cancel goal
  >>>

Summary: 175 tests, 1 error, 1 failure, 0 skipped
Error: Process completed with exit code 1.

destogl commented 3 years ago

@matthew-reynolds could you look at this? You resolved a similar issue in ros_control repo

destogl commented 3 years ago

The tests are still flaky (see #154 builds). But I found out why! We have timing issues. Probably is the runner CPU on maximal load and therefore limits/thresholds are not hold each time. For now, I adjusted the threshold a bit, but this is not a long-term solution. I added comments about that in the test file: https://github.com/ros-controls/ros2_controllers/blob/b89f5054a643d63c608186ffa42dd02c0f1005c5/joint_trajectory_controller/test/test_trajectory_controller.cpp#L458 and header file: https://github.com/ros-controls/ros2_controllers/blob/b89f5054a643d63c608186ffa42dd02c0f1005c5/joint_trajectory_controller/test/test_trajectory_controller_utils.hpp#L31

A possible (right) solution would be to measure real wait_time in the updateController method . And based on this value the hard-coded values would be adjusted (increased) https://github.com/ros-controls/ros2_controllers/blob/b89f5054a643d63c608186ffa42dd02c0f1005c5/joint_trajectory_controller/test/test_trajectory_controller.cpp#L462-L464

This should be done in all places the same testing approach is used or this could happen.

Regarding the flakiness of the actions tests, we should use custom trajectory constraints. (Or maybe rerun tests multiple times?!?)

christophfroehlich commented 3 months ago

The code evolved in the last years and it seems that the mentioned errors are not happening any more.

ros-controls / ros2_controllers

Flaky joint_trajectory_controller test #132