Today it appeared without an exception, just the test failure, see: https://ci.ros2.org/view/nightly/job/nightly_linux_debug/2254/testReport/junit/launch_testing_ros.test.examples/wait_for_topic_launch_test/wait_for_topic_launch_test/
I'm able to reproduce both failures (`assert wait_for_node_object_2.wait()` and `AssertionError: assert set() == {'chatter_0',..., 'chatter_4'}`), but only when running `stress --cpu N`. With an idle system I can't reproduce them at all.
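If `stress` isn't available, a rough Python stand-in for `stress --cpu N` is to pin the cores with busy-loop processes while the test runs in another terminal (e.g. via `colcon test --packages-select launch_testing_ros`). This is just a reproduction-aid sketch, not part of the test suite:

```python
# Rough stand-in for `stress --cpu N`: saturate all cores with busy loops
# while the flaky test is run in another terminal. Reproduction aid only.
import multiprocessing
import os


def burn_cpu():
    # Tight loop keeps one core at ~100% until the process is terminated.
    while True:
        pass


if __name__ == '__main__':
    n = os.cpu_count() or 4
    workers = [multiprocessing.Process(target=burn_cpu, daemon=True)
               for _ in range(n)]
    for w in workers:
        w.start()
    print(f'Loading {n} cores; Ctrl+C to stop.')
    try:
        for w in workers:
            w.join()
    except KeyboardInterrupt:
        pass
```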
@sloretz Do you think this test failure might be related to this issue? See: https://ci.ros2.org/job/nightly_linux_coverage/1584/testReport/junit/launch_testing_ros.test.examples/wait_for_topic_launch_test/wait_for_topic_launch_test/
Today it appeared in the humble coverage job: https://ci.ros2.org/job/nightly_linux_humble_coverage/16/testReport/junit/launch_testing_ros.test.examples/wait_for_topic_launch_test/wait_for_topic_launch_test/
It has also been appearing often in the linux_coverage jobs: https://ci.ros2.org/job/nightly_linux_coverage/1604/testReport/junit/launch_testing_ros.test.examples/wait_for_topic_launch_test/wait_for_topic_launch_test/
:man_farmer: This test fails flakily at a rate of roughly 7%.
References from last month: https://build.ros2.org/view/Rci/job/Rci__nightly-debug_ubuntu_jammy_amd64/121/ https://build.ros2.org/view/Rci/job/Rci__nightly-debug_ubuntu_jammy_amd64/136/ https://ci.ros2.org/view/nightly/job/nightly_linux_debug/2333/
Happened today in aarch64-debug (build 2197):
```
FAIL: wait_for_topic_launch_test.TestFixture.test_topics_unsuccessful
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-agent/workspace/nightly_linux-aarch64_debug/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/wait_for_topic_launch_test.py", line 90, in test_topics_unsuccessful
    assert wait_for_node_object.topics_received() == expected_topics
AssertionError: assert {'chatter_1',..., 'chatter_4'} == {'chatter_0',..., 'chatter_4'}
Extra items in the right set:
'chatter_0'
```
This issue's appearance on the ros2 buildfarm increased considerably during the last week.
| Job | Last time seen | Last 30 builds | Last 7 builds |
| --- | --- | --- | --- |
| Rci__nightly-debug_ubuntu_jammy_amd64 | 2023-02-21 | 3 failed (10%) | 3 failed (42.85%) |
| nightly_linux_coverage | 2023-02-20 | 7 failed (23.33%) | 4 failed (57.14%) |
| Rci__nightly-release_ubuntu_jammy_amd64 | 2023-02-20 | 2 failed (6.66%) | 2 failed (28.57%) |
| nightly_linux-rhel_debug | 2023-02-19 | 3 failed (10%) | 1 failed (14.28%) |
| nightly_linux_debug | 2023-02-18 | 1 failed (3.33%) | 1 failed (14.28%) |
| nightly_linux-rhel_repeated | 2023-02-17 23:00:01.423000 | 2 failed (6.66%) | 1 failed (14.28%) |
| nightly_linux_humble_coverage | 2023-02-17 | 4 failed (13.33%) | 2 failed (28.57%) |
| nightly_linux_repeated | 2023-02-17 | 1 failed (3.33%) | 1 failed (14.28%) |
| nightly_linux-aarch64_debug | 2023-02-10 | 1 failed (3.33%) | 0 failed (0%) |
> This issue's appearance on the ros2 buildfarm increased considerably during the last week.
I think the reason for that is that there were 2 weeks in there where we basically weren't running any of the rclpy-based tests (this was due to a bug in colcon that has since been fixed). So it may not be something that actually increased; we just really weren't testing it much over the last month.
Can you go back and compare with historical data? Like, what was the percentage failure in November and December 2022?
> what was the percentage failure in November and December 2022?
- Rci__nightly-debug_ubuntu_jammy_amd64: 6.56%
- nightly_linux_coverage: 37.1%
- nightly_linux-rhel_debug: 12.9%
- nightly_linux_debug: 11.48%
- nightly_linux-rhel_repeated: 11.29%
- nightly_linux_humble_coverage: 12.9%
- nightly_linux_repeated: 4.92%
- nightly_linux-aarch64_debug: 9.09%
- nightly_linux-aarch64_repeated: 1.37%
- Rci__nightly-fastrtps_ubuntu_jammy_amd64: 1.64%
In both November and December?
Yes, I think they are about the same as before.
| job_name | last_time | first_time | build_count | failure_count | failure_percentage |
| --- | --- | --- | --- | --- | --- |
| nightly_linux-rhel_repeated | 2023-03-26 | 2021-12-09 | 30 | 14 | 46.67 |
| nightly_linux_coverage | 2023-03-26 | 2021-10-12 | 30 | 14 | 46.67 |
| Rci__nightly-debug_ubuntu_jammy_amd64 | 2023-03-27 | 2022-03-01 | 31 | 9 | 29.03 |
| Rci__nightly-fastrtps_ubuntu_jammy_amd64 | 2023-03-27 | 2022-03-01 | 33 | 9 | 27.27 |
| nightly_linux_debug | 2023-03-27 | 2021-10-31 | 31 | 7 | 22.58 |
| nightly_linux-aarch64_debug | 2023-03-26 | 2021-07-23 | 34 | 6 | 17.65 |
| nightly_linux-rhel_debug | 2023-03-26 | 2021-12-10 | 30 | 5 | 16.67 |
| nightly_linux-aarch64_repeated | 2023-03-26 | 2021-07-23 | 32 | 3 | 9.38 |
| nightly_linux-rhel_release | 2023-03-26 | 2021-12-30 | 30 | 2 | 6.67 |
| Rci__nightly-release_ubuntu_jammy_amd64 | 2023-03-27 | 2022-03-01 | 33 | 2 | 6.06 |
| nightly_linux_humble_coverage | 2023-03-27 | 2022-08-08 | 31 | 1 | 3.23 |
I think these percentages are higher than ever. Some jobs almost reached 50% flakiness.
@clalancette Do you think we should disable this test?
> @clalancette Do you think we should disable this test?
No, I don't think so. The ones that we actually mostly care about (nightly_linux_debug and below) are at 22%. Also, we will hopefully have some time to look at this later in April.
With #360 merged, I'm going to call this "fixed". But please reopen if this starts occurring again.
A similar problem is happening in Humble Coverage at a roughly 10% flaky rate:
Reference build: https://ci.ros2.org/job/nightly_linux_humble_coverage/594/
wait_for_topic_launch_test.TestFixture.test_topics_successful failed:

```
FAIL: wait_for_topic_launch_test.TestFixture.test_topics_successful
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/wait_for_topic_launch_test.py", line 62, in test_topics_successful
    with WaitForTopics(topic_list, timeout=2.0) as wait_for_node_object_1:
  File "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/src/ros2/launch_ros/launch_testing_ros/launch_testing_ros/wait_for_topics.py", line 96, in __enter__
    raise RuntimeError('Did not receive messages on these topics: ',
RuntimeError: ('Did not receive messages on these topics: ', {'chatter_1'})
```
I'm not sure if this is the same problem, or if it is a different but similar one.
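For reference, the call that fails in that traceback is the WaitForTopics helper defined in launch_testing_ros/wait_for_topics.py. Below is a minimal sketch of the usage pattern, assuming a sourced ROS 2 environment, active publishers on the listed topics, and that the (topic name, message type) pair format I use here matches the real test's topic_list:

```python
# Sketch of the pattern that fails above; topic names/types here are
# placeholders, not copied from the real wait_for_topic_launch_test.py.
from launch_testing_ros.wait_for_topics import WaitForTopics
from std_msgs.msg import String

# Assumed format: a list of (topic_name, message_type) pairs.
topic_list = [('chatter_0', String), ('chatter_1', String)]

# __enter__ blocks for up to `timeout` seconds waiting for at least one
# message on every listed topic and raises RuntimeError otherwise, which is
# the exact failure in the log above (the test uses timeout=2.0).
with WaitForTopics(topic_list, timeout=10.0) as waiter:
    print('Received on:', waiter.topics_received())
```

Under heavy CPU load a 2.0 s budget may simply be too tight for every talker to publish its first message, so relaxing the timeout in the test would be one possible mitigation.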
This seems to be happening consistently in humble coverage jobs now, see: https://ci.ros2.org/job/nightly_linux_humble_coverage/944/
This is still occurring as a flaky failure.
Flakiness for the last 15 days:

| job_name | last_fail | first_fail | build_count | failure_count | failure_percentage |
| --- | --- | --- | --- | --- | --- |
| nightly_linux_humble_coverage | 2024-10-07 | 2024-09-23 | 14 | 7 | 50.0 |
| Ici__nightly-debug_ubuntu_jammy_amd64 | 2024-10-04 | 2024-10-04 | 5 | 1 | 20.0 |
| nightly_linux_iron_coverage | 2024-10-02 | 2024-09-22 | 13 | 3 | 23.08 |
| nightly_linux-rhel_repeated | 2024-09-24 | 2024-09-23 | 15 | 2 | 13.33 |
Last failures:
This issue didn't happen again.
Bug report
Required Info:
- Operating System: jammy
- Client library: rclpy
Steps to reproduce issue
Run a buildfarm job on nightly_linux_debug.
Expected behavior
All tests pass.
Actual behavior
Various launch_testing_ros tests fail and throw exceptions. See: https://ci.ros2.org/view/nightly/job/nightly_linux_debug/2251/

Additional information
wait_for_topic_launch_test.TestFixture.test_topics_unsuccessful shows:

Error log
```
Traceback (most recent call last):
  File "/home/jenkins-agent/workspace/nightly_linux_debug/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/wait_for_topic_launch_test.py", line 89, in test_topics_unsuccessful
    assert wait_for_node_object.topics_received() == expected_topics
AssertionError: assert {'chatter_0',..., 'chatter_4'} == {'chatter_0',..., 'chatter_4'}
Extra items in the right set:
'chatter_2'
'chatter_1'
Use -v to get more diff
...
[INFO] [python-7]: sending signal 'SIGINT' to process[python-7]
[INFO] [python-6]: sending signal 'SIGINT' to process[python-6]
[python-6] Traceback (most recent call last):
[python-6]   File "/home/jenkins-agent/workspace/nightly_linux_debug/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/talker.py", line 51, in
```
wait_for_topic_launch_test.TestFixture.test_topics_successful shows:

Error log
```
Traceback (most recent call last):
  File "/home/jenkins-agent/workspace/nightly_linux_debug/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/wait_for_topic_launch_test.py", line 61, in test_topics_successful
    with WaitForTopics(topic_list, timeout=2.0) as wait_for_node_object_1:
  File "/home/jenkins-agent/workspace/nightly_linux_debug/ws/src/ros2/launch_ros/launch_testing_ros/launch_testing_ros/wait_for_topics.py", line 96, in __enter__
    raise RuntimeError('Did not receive messages on these topics: ',
RuntimeError: ('Did not receive messages on these topics: ', {'chatter_4'})
...
[python-12] Traceback (most recent call last):
[python-12]   File "/home/jenkins-agent/workspace/nightly_linux_debug/ws/src/ros2/launch_ros/launch_testing_ros/test/examples/talker.py", line 51, in
```