zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

kernel: common: stack_protection_armv8m_mpu_stack_guard test Failed #62888

Open hakehuang opened 1 year ago

hakehuang commented 1 year ago

Describe the bug

The stack_protection_armv8m_mpu_stack_guard test fails on zephyr-v3.4.0-3926-g4ba97c22555e on mimxrt595_evk_cm33. The test case path is kernel/common/stack_protection_armv8m_mpu_stack_guard/fatal.

See the logs below for details.

To Reproduce

  1. scripts/twister --device-testing --device-serial /dev/ttyACM0 -p mimxrt595_evk_cm33 --sub-test kernel.common

    or

    # cd tests/kernel/common/stack_protection_armv8m_mpu_stack_guard/fatal
    # west build -b mimxrt595_evk_cm33
    # west flash
  2. See error

Expected behavior

The test passes.

Impact

Logs and console output

*** Booting Zephyr OS build zephyr-v3.4.0-3926-g4ba97c22555e ***
E: no free partition slots available
E: no free partition slots available
Running TESTSUITE fatal_exception
===================================================================
START - test_fatal
test alt thread 1: generic CPU exception
E: ***** USAGE FAULT *****
E:   Illegal use of the EPSR
E: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x30180ec0
E: r3/a4:  0x3018279c r12/ip:  0x7fdb7dfd r14/lr:  0x18001993
E:  xpsr:  0x40000000
E: Faulting instruction address (r15/pc): 0x3018279c
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 0
test alt thread 1: generic CPU exception divide zero
E: r0/a1:  0x00000000  r1/a2:  0x00000000  r2/a3:  0x00000000
E: r3/a4:  0x30180ec0 r12/ip:  0x00000000 r14/lr:  0x180020bb
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x180019b0
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 0
test alt thread 2: initiate kernel oops
E: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000003
E: r3/a4:  0x30180ec0 r12/ip:  0x180019b0 r14/lr:  0x180020bb
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x180019d4
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 3
test alt thread 3: initiate kernel panic
E: r0/a1:  0x00000004  r1/a2:  0x00000000  r2/a3:  0x00000004
E: r3/a4:  0x30180ec0 r12/ip:  0x30180f00 r14/lr:  0x180020bb
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x18001a18
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 4
test alt thread 4: fail assertion
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/tests/kernel/fatal/exception/src/main.c:155
intentionally failed assertion
E: r0/a1:  0x00000004  r1/a2:  0x0000009b  r2/a3:  0x00000016
E: r3/a4:  0x00000000 r12/ip:  0x30180f00 r14/lr:  0x1800e0cd
E:  xpsr:  0x61000000
E: Faulting instruction address (r15/pc): 0x1800e0e6
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 4
test alt thread 5: initiate arbitrary SW exception
E: r0/a1:  0x7fffffff  r1/a2:  0x00000000  r2/a3:  0x7fffffff
E: r3/a4:  0x30180ec0 r12/ip:  0x30181454 r14/lr:  0x180020bb
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x18001a5e
E: >>> ZEPHYR FATAL ERROR 2147483647: Unknown error on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 2147483647
test alt thread 6: initiate arbitrary SW exception negative
E: r0/a1:  0xfffffffe  r1/a2:  0x00000000  r2/a3:  0xfffffffe
E: r3/a4:  0x30180ec0 r12/ip:  0x30180f00 r14/lr:  0x180020bb
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x18001aa2
E: >>> ZEPHYR FATAL ERROR -2: Unknown error on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason -2
test stack HW-based overflow - supervisor 1
E: ***** MPU FAULT *****
E:   Stacking error (context area might be not valid)
E:   Data Access Violation
E:   MMFAR Address: 0x30181ffc
E: r0/a1:  0x6fffffbf  r1/a2:  0xdfffffff  r2/a3:  0x7bf7dfff
E: r3/a4:  0xffef77bb r12/ip:  0x7bffdfff r14/lr:  0xfffdfbef
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x1800dc72
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 2
test stack HW-based overflow - supervisor 2
E: ***** MPU FAULT *****
E:   Stacking error (context area might be not valid)
E:   Data Access Violation
E:   MMFAR Address: 0x30181ffc
E: r0/a1:  0x6fffffbf  r1/a2:  0xdfffffff  r2/a3:  0x7bf7dfff
E: r3/a4:  0xffef77bb r12/ip:  0x7bffdfff r14/lr:  0xfffdfbef
E:  xpsr:  0x41000000
E: Faulting instruction address (r15/pc): 0x1800dc72
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 2
test stack HW-based overflow - user 1
E: ***** MPU FAULT *****
E:   Data Access Violation
E:   MMFAR Address: 0x30180ec0
E: r0/a1:  0x00000025  r1/a2:  0x00000000  r2/a3:  0x00000002
E: r3/a4:  0x30180ec0 r12/ip:  0x180020a3 r14/lr:  0x18001c65
E:  xpsr:  0x01000000
E: Faulting instruction address (r15/pc): 0x18001c52
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x30180f00 (unknown)
Caught system error -- reason 0
Was not expecting a crash
PROJECT EXECUTION FAILED

Environment (please complete the following information):

hakehuang commented 1 year ago

Also fails on mimxrt595_evk_cm33 for zephyr-v3.4.0-3926-g4ba97c22555e

hakehuang commented 1 year ago

Also fails on mimxrt685_evk_cm33 for zephyr-v3.4.0-3926-g4ba97c22555e

hakehuang commented 1 year ago

related: #62885

github-actions[bot] commented 10 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

decsny commented 7 months ago

bisect shows it broke on f0daf904bb0202c9247cdcf4d1186e0762a9db67 @keith-packard

keith-packard commented 7 months ago

I'd try disabling thread local storage (that requires building picolibc as a module); the changes required for that are more extensive than switching C libraries. Oh, and try to reproduce on main as well; that switches to the same malloc implementation as the minimal C library, further reducing differences.

decsny commented 7 months ago

> I'd try disabling thread local storage (that requires building picolibc as a module); the changes required for that are more extensive than switching C libraries. Oh, and try to reproduce on main as well; that switches to the same malloc implementation as the minimal C library, further reducing differences.

This fails on the current main, and switching to minimal libc does fix the issue

keith-packard commented 7 months ago

> This fails on the current main, and switching to minimal libc does fix the issue

Awesome. Can you try with picolibc and disabling thread local storage, or using minimal libc and enabling thread local storage? That's really the big difference between the two sets of defaults -- minimal libc leaves thread local storage disabled by default while picolibc enables it. Thread local storage is used in lots of places in the core kernel now, so turning that on changes a bunch of code paths.

dleach02 commented 3 months ago

@keith-packard, are there instructions for enabling/disabling the thread local storage between the two libs?

> This fails on the current main, and switching to minimal libc does fix the issue

> Awesome. Can you try with picolibc and disabling thread local storage, or using minimal libc and enabling thread local storage? That's really the big difference between the two sets of defaults -- minimal libc leaves thread local storage disabled by default while picolibc enables it. Thread local storage is used in lots of places in the core kernel now, so turning that on changes a bunch of code paths.

keith-packard commented 3 months ago

You can disable CONFIG_THREAD_LOCAL_STORAGE as long as you aren't using C++ -- that will force Zephyr to build picolibc as a module instead of using the toolchain version. Other C libraries don't care as they don't have any TLS usage internally.
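
The two experiments suggested in this thread could be expressed as Kconfig fragments along these lines. This is only a sketch: CONFIG_PICOLIBC, CONFIG_MINIMAL_LIBC and CONFIG_THREAD_LOCAL_STORAGE are Zephyr Kconfig symbols, but the fragment itself is hypothetical and was not posted by anyone here. Whether thread local storage can actually be enabled with minimal libc on this board depends on architecture and toolchain support.

```
# Experiment A (assumed fragment): keep picolibc, turn thread local storage off.
# Per the comment above, this makes Zephyr build picolibc as a module
# and is not an option for C++ applications.
CONFIG_PICOLIBC=y
CONFIG_THREAD_LOCAL_STORAGE=n

# Experiment B (assumed fragment, use instead of A): minimal libc with
# thread local storage turned on.
# CONFIG_MINIMAL_LIBC=y
# CONFIG_THREAD_LOCAL_STORAGE=y
```

Either set could go into the test's prj.conf, or a single option could be overridden at build time, for example `west build -b mimxrt595_evk_cm33 -- -DCONFIG_THREAD_LOCAL_STORAGE=n`.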
