ros2 / examples

Example packages for ROS 2
Apache License 2.0
681 stars 308 forks source link

`check_multiple_nodes_launch_test` test flaky on Windows #340

Closed Blast545 closed 1 year ago

Blast545 commented 2 years ago

Bug report

Required Info:

Steps to reproduce issue

This is a flaky test. It would fail sometimes when running the buildfarm jobs. https://ci.ros2.org/view/nightly/job/nightly_win_deb/2331/testReport/junit/launch_testing_examples.launch_testing_examples/check_multiple_nodes_launch_test/launch_testing_examples_check_multiple_nodes_launch_test/

https://github.com/ros2/examples/blob/master/launch_testing/launch_testing_examples/launch_testing_examples/check_multiple_nodes_launch_test.py

There's a "TO-DO" that may require being addressed: https://github.com/ros2/examples/blob/7d6f14a55dd3d6498fd7a8d5a90cd482ec71970c/launch_testing/launch_testing_examples/launch_testing_examples/check_multiple_nodes_launch_test.py#L83

Expected behavior

Test would pass.

Actual behavior

Test fails.

Additional information

This error failed for the first time in the nightly_win_deb farm: https://ci.ros2.org/view/nightly/job/nightly_win_deb/2272/ and has failed 6times counting that.

This is the first time the error showed in the repeated jobs: https://ci.ros2.org/view/nightly/job/nightly_win_rep/2505 and has failed 46 times since then. Pretty reliable in the repeated jobs.

sloretz commented 2 years ago

I'm unable to reproduce this issue on Windows.

Crola1702 commented 1 year ago

It appeared again today: https://ci.ros2.org/view/nightly/job/nightly_win_deb/2525/

Flakiness ratio:

Log output:

``` FAIL: launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_node_does_not_exist ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Traceback (most recent call last): File "C:\ci\ws\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py", line 73, in test_node_does_not_exist assert wait_for_nodes_1.get_nodes_not_found() == {'invalid_node'} AssertionError: assert {'demo_node_0...invalid_node'} == {'invalid_node'} Extra items in the left set: 'demo_node_0' Use -v to get more diff ```
clalancette commented 1 year ago

@Crola1702 Given that I believe that this is fixed, I'm going to close this out. If it happens again, please feel free to reopen and I'll take another look.

Blast545 commented 1 year ago

@Crola1702 Is this the one related to a hanging job in the affected nodes? would you please paste here the internal issue tracking that?

Crola1702 commented 1 year ago

Well... It happened today again in Nightly Windows Release 2489

Error message

launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_nodes_successful failed

Stacktrace

FAIL: launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_nodes_successful
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\ci\ws\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py", line 57, in test_nodes_successful
    assert wait_for_nodes_1.wait()
AssertionError: assert False
 +  where False = <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x000001F3A9939CD0>>()
 +    where <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x000001F3A9939CD0>> = <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x000001F3A9939CD0>.wait

Stderr call:

---------------------------- Captured stderr call -----------------------------
test_node_does_not_exist (launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched)
Insert a invalid node name that should not exist. ... ok
test_nodes_successful (launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched)
Check if all the nodes were launched correctly. ... FAIL

Last 10 times it happened:

Link Datetime
https://ci.ros2.org/view/nightly/job/nightly_win_rel/2489 2022-11-10 23:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2796 2022-11-05 23:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2795 2022-11-04 23:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2793 2022-11-02 23:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2791 2022-11-01 04:00:01
https://build.ros2.org/view/Hci/job/Hci__nightly-connext_ubuntu_jammy_amd64/59/ 2022-10-30 06:15:02
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2788 2022-10-29 04:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2786 2022-10-27 04:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_deb/2525 2022-10-25 04:00:01
https://ci.ros2.org/view/nightly/job/nightly_win_rep/2784 2022-10-25 04:00:01

FYI: @clalancette

Blast545 commented 1 year ago

In the case of https://ci.ros2.org/view/nightly/job/nightly_win_deb/2525, this error appeared because there was a hanging node in the agent causing this problem to appear.

So, it seems this errors shows up once/twice every two weeks in the nightly_win_rel / nightly_win_deb jobs. However, it's showing up frequently in the repeated jobs.

Crola1702 commented 1 year ago

This happened today in Windows Debug

Line of code failing: check_multiple_nodes_launch_test.py#L57

Reference build: https://ci.ros2.org/view/nightly/job/nightly_win_deb/2603/

Log output:

FAIL: launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_nodes_successful
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\ci\ws\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py", line 57, in test_nodes_successful
    assert wait_for_nodes_1.wait()
AssertionError: assert False
 +  where False = <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000010A9E433780>>()
 +    where <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000010A9E433780>> = <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000010A9E433780>.wait
Crola1702 commented 1 year ago

Today we had a different output for Windows debug:

Reference build: https://ci.ros2.org/view/nightly/job/nightly_win_deb/2617/

================================================================================================================================================================================================================================================================
FAIL: launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_node_does_not_exist
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\ci\ws\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py", line 73, in test_node_does_not_exist
    assert wait_for_nodes_1.get_nodes_not_found() == {'invalid_node'}
AssertionError: assert {'demo_node_2...invalid_node'} == {'invalid_node'}
  Extra items in the left set:
  'demo_node_2'
  Use -v to get more diff

================================================================================================================================================================================================================================================================
FAIL: launch_testing_examples.check_multiple_nodes_launch_test.CheckMultipleNodesLaunched.test_nodes_successful
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\ci\ws\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py", line 57, in test_nodes_successful
    assert wait_for_nodes_1.wait()
AssertionError: assert False
 +  where False = <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000021A479A51E0>>()
 +    where <bound method WaitForNodes.wait of <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000021A479A51E0>> = <launch_testing_examples.check_multiple_nodes_launch_test.WaitForNodes object at 0x0000021A479A51E0>.wait
Blast545 commented 1 year ago

Isn't the extra demo_node_2 related to a problem with the host, the problem where a node remains open after an error in a job?

clalancette commented 1 year ago

This was theoretically solved by #355 . I'm going to close this out, but if this reoccurs please feel free to reopen.