ros2 / sros2

tools to generate and distribute keys for SROS 2
Apache License 2.0

👨‍🌾 test_generate_policy and test_generate_service flaky on macOS on foxy #251

Closed azeey closed 2 years ago

azeey commented 3 years ago

Bug report

Required Info:

Steps to reproduce issue

test_generate_policy or test_generate_service tests fail consistently if the ros2 daemon is running. On CI machines, the tests are flaky but fail most of the time (e.g. history on Focal).

Expected behavior

All tests pass.

Actual behavior

test_generate_policy_topics fails with

test/sros2/commands/security/verbs/test_generate_policy.py:44: in test_generate_policy_topics
    assert cli.main(
E   AssertionError: assert 1 == 0
E    +  where 1 = <function main at 0x7f2914956a60>(argv=['security', 'generate_policy', '/tmp/tmpaxpo04wh/test-policy.xml'])
E    +    where <function main at 0x7f2914956a60> = cli.main
----------------------------- Captured stderr call -----------------------------
No nodes detected in the ROS graph. No policy file was generated.

and test_generate_policy_services fails with

test/sros2/commands/security/verbs/test_generate_policy.py:96: in test_generate_policy_services
    assert cli.main(
E   AssertionError: assert 1 == 0
E    +  where 1 = <function main at 0x7f2914956a60>(argv=['security', 'generate_policy', '/tmp/tmp2k4qonwa/test-policy.xml'])
E    +    where <function main at 0x7f2914956a60> = cli.main
----------------------------- Captured stderr call -----------------------------
No nodes detected in the ROS graph. No policy file was generated.

Additional information

Adding a small sleep right before calling cli.main in both tests fixes the tests for me locally. Alternatively, passing --no-daemon to generate_policy fixes the issue as well, but I'm not sure which one, if either, is a good option.
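For reference, a minimal sketch of a polling alternative to a fixed sleep. It assumes the failure is a discovery race, i.e. that cli.main runs before the test nodes are visible in the ROS graph. `wait_for` is a hypothetical helper, not part of sros2 or ros2cli, and the predicate is left to the caller.

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 10.0,
             interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration only. Returns True as soon as the
    predicate succeeds and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test, the predicate could be a check that the expected node name shows up in the graph (for example through an rclpy node's `get_node_names()`), with cli.main called only once `wait_for` returns True. Unlike a fixed sleep, this returns as soon as the condition holds and fails explicitly on timeout.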

mikaelarguedas commented 3 years ago

Thanks for reporting. The test only fails in Foxy and not in Nightly or Rolling jobs, is that correct?

There was an overhaul of the test structure to remove the use of the daemon and make these tests less flaky: https://github.com/ros2/sros2/pull/214. It was not backported to Foxy because it was not considered a critical bug, as this had no impact on the resulting debs.

nuclearsandwich commented 3 years ago

> Thanks for reporting. The test only fails in Foxy and not in Nightly or Rolling jobs, is that correct?

It seems like this issue may have combined two separate sets of failures. The issue described is present in Foxy on Linux. The recent nightly and rolling builds don't seem to exhibit this failure.

> It was not backported to Foxy because it was not considered a critical bug, as this had no impact on the resulting debs.

If it's a non-breaking change that can be backported, doing so would increase confidence when testing and producing Foxy patch releases.


Separately, on macOS with rmw_connext_cpp there has been an sros2 test failure in two of the last three nightlies.

I can open a separate issue for that if preferred.

nuclearsandwich commented 3 years ago

sros2.test.sros2.commands.security.verbs.test_generate_policy_no_nodes.test_generate_policy_no_nodes has also failed on 3 out of 4 of the most recent Windows Debug nightlies.

tfoote commented 3 years ago

This continues to be an issue on Windows debug: https://ci.ros2.org/job/nightly_win_deb/1921/testReport/sros2.test.sros2.commands.security.verbs/test_generate_policy_no_nodes/test_generate_policy_no_nodes/history/

[Image: Screenshot from 2021-03-09 17-48-55]

And it's also occurring on Linux foxy CI: https://build.ros2.org/job/Fci__nightly-fastrtps_ubuntu_focal_amd64/256/testReport/sros2.test.sros2.commands.security.verbs/test_generate_policy/test_generate_policy_topics/history/

[Image: Screenshot from 2021-03-09 17-49-59]

mikaelarguedas commented 3 years ago

Foxy flakiness:

#254 is a backport of the testing overhaul to the foxy branch. Could you let us know if this addresses the issues noticed on Foxy?


> I can open a separate issue for that if preferred.

Yes please, let's open another issue for the nightly flakiness so that we don't mix the code versions to modify.

> Separately, on macOS with rmw_connext_cpp there has been an sros2 test failure in two of the last three nightlies.

https://ci.ros2.org/view/nightly/job/nightly_osx_release/1981/testReport/junit/ros2param.ros2param.test/test_verb_list/test_verb_list/
https://ci.ros2.org/view/nightly/job/nightly_osx_release/1983/testReport/junit/ros2param.ros2param.test/test_verb_list/test_verb_list/

Wrong links? These seem to be for ros2param.

These tests have been causing issues for a long time, as they rely on the ROS graph being clean. This seemed to improve after #214, but from what I read here there still seems to be an issue on Windows Debug. Are there processes not terminating properly on these jobs?
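As an illustration of what a "clean graph" requirement can look like in practice, here is a minimal sketch that runs a command under a dedicated ROS_DOMAIN_ID so that nodes from unrelated processes on other domains are not discovered. This is an assumption about one possible mitigation, not a description of what #214 does; `isolated_ros_env` is a hypothetical helper, and the 0 to 101 range is used as a conservative choice.

```python
import os
from typing import Dict, Optional


def isolated_ros_env(domain_id: int,
                     base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with ROS_DOMAIN_ID set.

    Hypothetical helper for illustration only. The caller's environment
    (or `base`, if given) is copied and never modified in place.
    """
    if not 0 <= domain_id <= 101:
        raise ValueError('domain_id must be between 0 and 101')
    env = dict(os.environ if base is None else base)
    env['ROS_DOMAIN_ID'] = str(domain_id)
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run`. It does not help when leftover processes from a previous test run are still alive on the same domain, which is the situation asked about above.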

> This continues to be an issue on Windows debug

These look like hanging issues. From https://github.com/ros2/sros2/pull/214#issuecomment-644382734 and looking at the job output, it seems that most (or all?) ros2cli-related tests are disabled on Windows. So maybe it's an issue in the tooling below the code in this repository? Should all ros2cli-based tests just be disabled for Windows?

mikaelarguedas commented 3 years ago

@azeey did you have a chance to confirm whether #254 fixes this issue?

ivanpauno commented 3 years ago

There's a similar issue with the new rmw_connextdds, though the root cause might be different:

https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/2249/testReport/sros2.test.sros2.commands.security.verbs/test_generate_policy/test_generate_policy/

hidmic commented 3 years ago

On that note, https://github.com/ros2/sros2/pull/260 may help. Launch-related hangs may also have been solved in Galactic since https://github.com/ros2/launch/pull/476. We haven't re-enabled tests on Windows though (still too flaky), so take it with a grain of salt.

jacobperron commented 2 years ago

I've merged #254. But also, macOS is no longer a supported platform for Foxy. So either way, I think we can close this ticket.