zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

tests: nrf: posix: portability.posix.common.tls.newlib fails on nrf9160dk_nrf9160 #31721

Closed PerMac closed 3 years ago

PerMac commented 3 years ago

Describe the bug

The test portability.posix.common.tls.newlib from tests/posix/common/ fails on nrf9160dk_nrf9160.

To Reproduce

Steps to reproduce the behavior:

  1. Have an nrf9160dk connected.
  2. Run ./scripts/twister --device-testing -T tests/posix/common/ -p nrf9160dk_nrf9160 --device-serial /dev/ttyACM0 -v -v
  3. See the error.

Expected behavior

The test passes.

Impact

Not clear.

Logs and console output

The part containing the FAIL status:

START - test_posix_realtime
POSIX clock set APIs
Assertion failed at WEST_TOPDIR/zephyr/tests/posix/common/src/clock.c:75: test_posix_realtime: (rts.tv_nsec not equal to mts.tv_nsec)
Nanoseconds not equal
FAIL - test_posix_realtime
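Judging only by the assertion message, the check appears to compare the nanosecond fields of two clock readings for exact equality. The sketch below is a host-side illustration of why that kind of comparison is timing-sensitive; it is not the Zephyr test code. It uses Python's monotonic clock, and the `read_pair` and `within_tolerance` helpers and the 1 ms tolerance are assumptions made for the example.

```python
import time


def read_pair():
    """Read the monotonic clock twice, back to back, in nanoseconds."""
    first = time.monotonic_ns()
    second = time.monotonic_ns()
    return first, second


def within_tolerance(a_ns, b_ns, tol_ns):
    """Return True if the two readings differ by at most tol_ns."""
    return abs(a_ns - b_ns) <= tol_ns


# Take many back-to-back pairs and count how often each check would pass.
samples = [read_pair() for _ in range(1000)]
exact = sum(1 for a, b in samples if a == b)
close = sum(1 for a, b in samples if within_tolerance(a, b, 1_000_000))
print(f"exactly equal: {exact}/1000, within 1 ms: {close}/1000")
```

On a clock with coarse granularity, two reads often land in the same tick and compare equal, so an exact-equality check can pass most of the time and fail only when a tick boundary falls between the reads.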

ioannisg commented 3 years ago

will take a look

PerMac commented 3 years ago

I think this test is unstable in fact. It eventually passes after some retries

ioannisg commented 3 years ago

I can't reproduce this, @PerMac. Are you sure you keep seeing the error? I am on Zephyr v2.5.0-rc2.

ioannisg commented 3 years ago

My suggestion is to close this ticket.

ioannisg commented 3 years ago

> I think this test is unstable in fact. It eventually passes after some retries

It does not fail for me (but I only tested with GNU Arm Embedded).

PerMac commented 3 years ago

I still see this with v2.5.0-rc2. Sometimes some scenarios from this test pass, but most of the time at least one fails, and it is always the realtime test case. I use Zephyr SDK 0.12.2.

pabigot commented 3 years ago

I've run the reproducing command on v2.5.0-rc2-8-gcf946d3365 five times and all tests pass on my rev 0.8.5 nrf9160dk_nrf9160. Perhaps it's device-specific (bad crystal)? Also using SDK 0.12.2.

PerMac commented 3 years ago

I will close the ticket then. It might be, as you suggest, Peter, a board-specific issue (if it matters, I have 0.9.0 on my desk).

PerMac commented 3 years ago

@pabigot I reopened the issue as this test started failing again recently. It works with v2.5.0. Last time, I bisected with inverted logic (searching for the fix rather than the breakage) and found that https://github.com/zephyrproject-rtos/zephyr/commit/544475d8a709f42926bd4059c3a1a4f78d023676 was the commit that fixed the issue previously, without anyone noticing.

ioannisg commented 3 years ago

so is this a regression then?

pabigot commented 3 years ago

> so is this a regression then?

Possibly, or it was just chance that it passed in https://github.com/zephyrproject-rtos/zephyr/issues/31721#issuecomment-772591585. It does fail for me today; I'll see if I can bisect.

PerMac commented 3 years ago

Yes, I believe it is a regression. I tried to bisect where the issue was introduced but had no success. The test was already unstable, which makes bisecting by hand a nightmare. However, I think we can narrow down the range to check: the test passed in our internal CI for zephyr-v2.5.0-268-g3fe33 (though only after 1 retry). From zephyr-v2.5.0-441-g5de3f onward it keeps failing, and retries do not help. Our CI is set to do 2 retries.

PerMac commented 3 years ago

I don't think so, @pabigot. Please check my comments above. To recap:

  1. The last time, the test was actually fixed by https://github.com/zephyrproject-rtos/zephyr/commit/544475d8a709f42926bd4059c3a1a4f78d023676
  2. It worked (with retries) up to and including zephyr-v2.5.0-268-g3fe33.
  3. It cannot pass, even with retries, since zephyr-v2.5.0-441-g5de3f.

pabigot commented 3 years ago

I hit a failure at zephyr-v2.5.0-108-g84e4e62c2db5 so I think the problem may have been reintroduced earlier. I'll report back when I know more.

pabigot commented 3 years ago

My bisect between v2.5.0-rc2-8-gcf946d3365 and zephyr-v2.5.0-693-gd19741f1ec located the failure at zephyr-v2.5.0-7-g91946ef21c.

zephyr-v2.5.0-6-gdd4322154067 passed ten (10) reps, and then a further thirty (30) reps (total 40).

zephyr-v2.5.0-7-g91946ef21c fails quickly.

zephyr-v2.5.0-268-g3fe33 failed at rep 8 so the problem was visible there, even if it wasn't revealed in local testing.

All this with SDK 0.12.3.

zephyr-v2.5.0-7-g91946ef21c (91946ef21c4a1653be862dbaeae90c55aeadd932) is not supposed to change any behavior, so this may be due to something as subtle as code placement and caching.

I've got a nice script that does the necessary testing for a generic twister failure at a particular commit. It could probably be used with git bisect run to automate the narrowing down process. I'll clean it up and get it posted somewhere.
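As a rough illustration of the idea (not the script mentioned above), a repeat-until-failure wrapper for git bisect run could look like the sketch below. The file name flaky_check.py, the function names, and the command-line layout are hypothetical. It assumes the wrapped command exits non-zero when a test fails, and it relies on the git bisect run convention that exit code 0 marks a commit good and 1 marks it bad.

```python
import subprocess
import sys


def run_reps(cmd, reps):
    """Run cmd up to reps times.

    Returns the 1-based index of the first failing repetition,
    or 0 if every repetition exited with status 0.
    """
    for rep in range(1, reps + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return rep
    return 0


def bisect_exit_code(failed_rep):
    """Map the outcome to git bisect run semantics: 0 = good, 1 = bad."""
    return 0 if failed_rep == 0 else 1


# Hypothetical invocation:
#   git bisect run python3 flaky_check.py 40 ./scripts/twister --device-testing ...
if __name__ == "__main__" and len(sys.argv) > 2:
    reps = int(sys.argv[1])
    failed = run_reps(sys.argv[2:], reps)
    if failed:
        print(f"failed at rep {failed}", file=sys.stderr)
    sys.exit(bisect_exit_code(failed))
```

Because the test is intermittent, a commit can only be marked good after enough clean repetitions; the repetition count is a trade-off between bisect time and the risk of mislabeling a bad commit as good.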

ioannisg commented 3 years ago

This does not really seem to fail; I've tested it with 2.6.0-RC1 and with current master. Closing for now. @PerMac, please reopen if you test this and it fails on nrf9160.

PerMac commented 3 years ago

After patching nrf53 with #35455, I see these tests failing most of the time on nrf5340dk_nrf5340_cpuappns. I haven't seen them fail on the non-ns version yet. I created a separate issue for timer-related instability in tests: #35509.