Open glarsennordic opened 3 months ago
Describe the bug
The following twister command fails approximately 3% of the time when executed on my machine (detailed below):
./scripts/twister --ninja --inline-logs --overflow-as-errors -T tests/net/conn_mgr_monitor
When it fails, the failing platform is always qemu_cortex_a9.
Bafflingly, if we restrict the twister command to just this platform, the 3% error rate disappears:
./scripts/twister --ninja --inline-logs --overflow-as-errors -p qemu_cortex_a9 -T tests/net/conn_mgr_monitor
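A rough way to judge whether the restricted run really has a lower failure rate is to ask how many clean runs are needed before "no failures seen" becomes meaningful. The sketch below assumes runs are independent and that the failure probability is a fixed p, neither of which the report confirms; `runs_needed` is a hypothetical helper name. It solves (1 - p)^n <= alpha for n.

```shell
# runs_needed P ALPHA
# Smallest n such that (1 - P)^n <= ALPHA, i.e. the number of consecutive
# passing runs after which a true failure rate of P would have shown at
# least one failure with probability >= 1 - ALPHA.
# Assumption: runs are independent with a constant failure probability.
runs_needed() {
    awk -v p="$1" -v alpha="$2" 'BEGIN {
        n = log(alpha) / log(1 - p)   # real-valued solution
        c = int(n)
        if (c < n) c++                # round up to a whole number of runs
        print c
    }'
}

runs_needed 0.03 0.05
```

Under these assumptions, about 99 consecutive passes with -p qemu_cortex_a9 would be needed before a 3% failure rate could be ruled out at the 5% level, so a handful of clean restricted runs is not yet strong evidence.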
The error experienced 3% of the time is usually due to either too few or too many NET Management events being generated.
(NOTE: The logs here indicate an older Zephyr commit hash than the latest, because they were copied from runs where I tried older commits. However, this appears to affect all Zephyr commits since I introduced this test suite.)
This strongly suggests some kind of bug with our QEMU simulation environment, but to be frank I'm at a loss as to what that could possibly be. I've tried maximizing the delays between event triggers in these tests and the event verifiers to give events maximal chances of settling, but to no avail.
I cannot fathom why, 3% of the time, unexpected events get triggered or expected events are not triggered, regardless of delay, but ONLY if I also execute tests for other platforms. I suspect that network state from prior QEMU test executions might be affecting the initial network state for qemu_cortex_a9.
To Reproduce
Clone and west update the latest Zephyr, or use 76559f27fd6e9219516c9ee7deebbdf5b3116105 for my exact environment.
From the zephyr root directory, execute the following command (on Linux):
for i in {1..100}; do rm -r twister-out*; ./scripts/twister --ninja --inline-logs --overflow-as-errors -T tests/net/conn_mgr_monitor; done
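To turn the loop above into a measured failure rate, the runs can be counted rather than inspected by eye. This is a minimal sketch, not part of the original report: `run_trials` and `LOG_DIR` are hypothetical names, and it assumes twister exits with a non-zero status when any test fails.

```shell
# run_trials N COMMAND [ARGS...]
# Runs COMMAND N times, saves each run's output to a log file, and prints
# how many runs exited non-zero.
# Assumption: a non-zero exit status from twister means a test failed.
# The body runs in a subshell so its variables do not leak to the caller.
run_trials() (
    n=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Remove earlier twister output so every run starts clean.
        rm -rf twister-out*
        if ! "$@" >"${LOG_DIR:-.}/run_${i}.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "${failures}/${n} failed"
)
```

From the zephyr root directory it could be invoked as `run_trials 100 ./scripts/twister --ninja --inline-logs --overflow-as-errors -T tests/net/conn_mgr_monitor`, and again with `-p qemu_cortex_a9` added, so the two failure counts can be compared directly.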
There is a:
Expected behavior
I would expect this test to succeed 100% of the time, instead of 97% of the time. I would also expect whether or not this test fails on qemu_cortex_a9 to not depend on whether other platforms are enabled too.
Impact
Largely, this is an annoyance. But I find the inconsistency in how qemu_cortex_a9 behaves somewhat concerning. It suggests something might be wrong with QEMU.
Environment:
I am using Zephyr SDK 0.16.5 (zephyr-sdk-0.16.5-1_linux-x86_64.tar.xz)