nashif commented 8 years ago

Reported by Peter Mitsis:

While running Jenkins on an unrelated change, the test_nano_work kernel test failed on a qemu_cortex_m3 board. See the failure output below.

Attempts to duplicate this failure in my local repo have thus far proven unsuccessful. Consequently, I am filing this Jira defect so that the (?sporadic?) failure can be tracked, and hopefully found+fixed.

Additional information can be found at https://jenkins.zephyrproject.org/job/zephyr-verify/7730/

Failure Output

qemu_cortex_m3 tests/kernel/test_nano_work/test FAILED: failed
------------------unified/qemu_cortex_m3/tests/kernel/test_nano_work/test/qemu.log------------------
Starting sequence test

Initializing test items
Submitting test items
Submitting work 1 from task
Running test item 1
Submitting work 2 from fiber
Submitting work 3 from task
Submitting work 4 from fiber
Running test item 2
Submitting work 6 from fiber
Submitting work 5 from task
Waiting for work to finish
Running test item 3
Running test item 4
Running test item 6
Running test item 5
Checking results
FAIL - check_results@113. *** got result 6 in position 4 (expected 5)

FAIL - main.

PROJECT EXECUTION FAILED

(Imported from Jira ZEP-1078)

nashif commented 7 years ago

by Benjamin Walsh:

The test uses timing for coordination, which is always a bit fishy IMHO, but there are enough ticks between events that this should ensure correct behaviour:

tick   0 1 2 3 4 5 6 7 8 9 10 11 12 later
work   1                    2        3,4,5,6  dequeuing
fiber      2         4            6           queuing
task   1         3          5                 queuing

There is enough time between each operation (two ticks), except between dequeuing 2 and enqueuing 5, but the relative order of those two should not matter.

It seems the problem could appear if there is enough drift that enqueuing 5 and 6 occur on the same tick: in that case, 6 would be enqueued before 5, since 6 is enqueued by the fiber and the fiber runs ahead of the task.

It should be possible to fix this by using semaphores for coordination; however this might be bringing a real problem to the surface. Double-however: this is QEMU for Cortex-M3, which seems to be prone to weird timing issues, possibly host-related.

nashif commented 7 years ago

by Mark Linkmeyer:

Fixing incorrect priority

zephyriot / zephyr-issues

Failure in test_nano_work #980
