zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.68k stars 6.53k forks source link

npcx9m6f_evb: tests/kernel/sleep/kernel.common.timing and tests/kernel/sleep/kernel.common.timing.minimallibc/ failing #66185

Open pkunieck opened 10 months ago

pkunieck commented 10 months ago

Describe the bug Test from tests/kernel/sleep/kernel.common.timing failing with timeout

To Reproduce Steps to reproduce the behavior:

  1. cd <path/to>zephyrproject/zephyr
  2. source zephyr-env.sh
  3. twister --device-testing --hardware-map <path/to>/map.yaml -T tests/kernel/sleep/ -vv

Expected behavior Test pass

Impact annoyance

Logs and console output START - test_sleep Kernel objects initialized Test thread started: id = 0x200c0918 Helper thread started: id = 0x200c0860 Testing normal expiration of k_sleep()

Environment (please complete the following information):

henrikbrixandersen commented 10 months ago
  • Toolchain Zephyr SDK 16.1

Have you tried a more recent SDK?

  • tested on the newest, 100% reproduction on the newest commits

Please state which SHA, you have reproduced this on as "newest commits" can have many different meanings.

pkunieck commented 10 months ago

I checked and issue also reproduced on 16.4 some time ago, I edited it. We currently can't reproduce this, I'll give an update after we run more tests.

pkunieck commented 10 months ago

We have reproduction at zephyr version: zephyr-v3.5.0-2951-gbd227e19cff9

ChiHuaL commented 9 months ago

Hi,
I test the version bd227e19cff9 and I cannot reproduce the issue. The following is my environment setting:

Attach the test result log below:

*** Booting Zephyr OS build zephyr-v3.5.0-2951-gbd227e19cff9 ***
Running TESTSUITE sleep
===================================================================
START - test_sleep
Kernel objects initialized
Test thread started: id = 0x200c0918
Helper thread started: id = 0x200c0860
Testing normal expiration of k_sleep()
Testing: test thread sleep + helper thread wakeup test
Testing: test thread sleep + isr offload wakeup test
Testing: test thread sleep + main wakeup test thread
Testing kernel k_sleep()
 PASS - test_sleep in 2.032 seconds
===================================================================
START - test_sleep_forever
Kernel objects initialized
 PASS - test_sleep_forever in 0.004 seconds
===================================================================
START - test_usleep
elapsed_ms = 1000
 PASS - test_usleep in 1.002 seconds
===================================================================
TESTSUITE sleep succeeded

------ TESTSUITE SUMMARY START ------

SUITE PASS - 100.00% [sleep]: pass = 3, fail = 0, skip = 0, total = 3 duration = 3.038 seconds
 - PASS - [sleep.test_sleep] duration = 2.032 seconds
 - PASS - [sleep.test_sleep_forever] duration = 0.004 seconds
 - PASS - [sleep.test_usleep] duration = 1.002 seconds

------ TESTSUITE SUMMARY END ------

===================================================================
PROJECT EXECUTION SUCCESSFUL

Is it 100% reproduced ? Could you share the test log for us to check?

github-actions[bot] commented 7 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

gbarkadiusz commented 7 months ago

@ChiHuaL Zephyr sdk: zephyr-sdk-0.16.5 Zephyr Version: v3.6.0-22-g074152a03db9 OS: Linux

tests/kernel/sleep/kernel.common.timing             FAILED Timeout (device: D30CC4MH, 63.707s)
INFO    - <path_to>/twister-out/npcx9m6f_evb/tests/kernel/sleep/kernel.common.timing/handler.log
*** Booting Zephyr OS build v3.6.0-22-g074152a03db9 (delayed boot 500ms) ***
Running TESTSUITE sleep
===================================================================
START - test_sleep
pkunieck commented 6 months ago

I checked in our test environment and we have about 58% of reproduction rate of these tests. I checked results in a one month time span(28.02-28.03)

pkunieck commented 6 months ago

This is log I got when I flashed tests manually using west build and flash:

*** Booting Zephyr OS build v3.6.0-2198-gcd685bdce82d ***
===================================================================
Running TESTSUITE sleep
START - test_sleep
FAIL - test_thread@119.  *** k_sleep() slept for 1003 ticks not 1001.

I run that 30 times, and got only 2 reproduction. Twister does not give us this message, only timeout and repro rate is also poor. Due to the fact that repro rate is so small, trying to use bisect would take hours and hours. I hope that this log will help.

pkunieck commented 5 months ago

Can we have an update? Do we need more information?

ChiHuaL commented 5 months ago

Hi,
Sorry for the late reply. I am unable to reproduce this symptom over 150 iterations with the following shell script:

#!/bin/bash                                                                                          

for i in {1..150}                                                                                    
do                                                                                                   
        echo "Iteration: $i"                                                                                                                                                                                                                  
        west flash                                                                                   
        sleep 6                                                                                      
done                           

The console log is attached zephyr_test_kernel_sleep_npcx9m6f_evb.log. OS: Linux

Toolchain Zephyr SDK 16.1

Version: https://github.com/zephyrproject-rtos/zephyr/commit/3d76e498e9475b67da824247d1fce665f84d0dbc

I have no idea why it happened at your side. Do you have any chance to change another npcx9m6f_evb for the test?

pkunieck commented 4 months ago

Okay so I checked this on commit that was used here and also on commit i was using, and I still got reproductions. I got one different output, but issue is still visible. Can you send exact model of board you are using? Maybe there are some differences, with older version of npcx9m6f? test_3d76e498e947.log test_cd685bdce82d.log

pkunieck commented 4 months ago

So i checked this issue on older board and I don't have reproduction on it. We have 2 boards in validation, one in CI one for debug, they are newer models, and we got repro on them. This is numbers from stickers from one that we got repro on: L5533004C142-129 And numbers from chip: nuvoton NPCX997FA0B E121M0092

And these are numbers from board that I run 300 iterations and 100% pass: L5533004C815-78 And chip: NPCX997FA0B E105M0022 (there was also "R3" sticker on it and numbers were a little scratched after removal)

test_older_npcx2.log test_older_npcx.log

pkunieck commented 3 months ago

Hi, I also have some conclusions, about that timeout. Usually, twister got timeout because log is not finished, but there was one iteration when test failed and we got a full log from that test. Maybe this will be helpful.

*** Booting Zephyr OS build v3.6.0-2198-gcd685bdce82d ***
Running TESTSUITE sleep
===================================================================
START - test_sleep

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/sleep/src/main.c:234: sleep_test_sleep: (sleep_time_valid(start_tick, end_tick, ONE_SECOND_ALIGNED) is false)
k_sleep() slept for 1003 ticks, not 1001

 FAIL - test_sleep in 2.029 seconds
===================================================================
START - test_sleep_forever
 PASS - test_sleep_forever in 0.002 seconds
===================================================================
START - test_usleep
 PASS - test_usleep in 1.002 seconds
===================================================================
TESTSUITE sleep failed.

------ TESTSUITE SUMMARY START ------

SUITE FAIL -  66.67% [sleep]: pass = 2, fail = 1, skip = 0, total = 3 duration = 3.033 seconds
 - FAIL - [sleep.test_sleep] duration = 2.029 seconds
 - PASS - [sleep.test_sleep_forever] duration = 0.002 seconds
 - PASS - [sleep.test_usleep] duration = 1.002 seconds

------ TESTSUITE SUMMARY END ------

===================================================================
PROJECT EXECUTION FAILED
ChiHuaL commented 3 months ago

So i checked this issue on older board and I don't have reproduction on it. We have 2 boards in validation, one in CI one for debug, they are newer models, and we got repro on them. This is numbers from stickers from one that we got repro on: L5533004C142-129 And numbers from chip: nuvoton NPCX997FA0B E121M0092

And these are numbers from board that I run 300 iterations and 100% pass: L5533004C815-78 And chip: NPCX997FA0B E105M0022 (there was also "R3" sticker on it and numbers were a little scratched after removal)

test_older_npcx2.log test_older_npcx.log

May I know where do you get those npcx EVBs, so we might check what is the difference?

SzymonRichert commented 3 months ago

As I understand npcx EVBs is just normal board. EVB stands for Early Validation Board, Intel is ordering for internal use.

github-actions[bot] commented 1 month ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

pkunieck commented 1 week ago

We recently connected and started testing another Nuvoton board: NPCX4M8F, and the same errors are also being reproduced on it. Can we get an update regarding the newer board? handler.log from this new board:

***** delaying boot 500ms (per build configuration) *****
*** Booting Zephyr OS build v3.7.0-3833-g715b97397b79 (delayed boot 500ms) ***
Running TESTSUITE sleep
===================================================================
START - test_sleep
pkunieck commented 23 hours ago

Any updates about that issue on 2 platforms?