zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.63k stars 6.51k forks source link

Zephyr doesn't boot on STM32F4 (and probably other ARMv7-M cores) #71698

Closed duda-patryk closed 5 months ago

duda-patryk commented 5 months ago

Describe the bug Zephyr can't boot on STM32F4 after #65071.

To Reproduce Build Zephyr and run on STM32F4

Expected behavior It boots

Impact Very high. Zephyr doesn't work.

Logs and console output No logs are printed.

Environment (please complete the following information): N/A

Additional context Reverting https://github.com/zephyrproject-rtos/zephyr/commit/f11027df8061932ddf1209f985a67164d140f303 seems to fix the problem.

I compared assembly code and looks like commit above removed cpsie if instruction from arch_switch_to_main_thread(). For me, the following change fixes problem (ARMv7-M only!):

diff --git a/arch/arm/core/cortex_m/thread.c b/arch/arm/core/cortex_m/thread.c
index f9a030675b5..87900df9091 100644
--- a/arch/arm/core/cortex_m/thread.c
+++ b/arch/arm/core/cortex_m/thread.c
@@ -576,6 +576,7 @@ void arch_switch_to_main_thread(struct k_thread *main_thread, char *stack_ptr,
        "mov   r4,  %0\n"       /* force _main to be stored in a register */
        "msr   PSP, %1\n"       /* __set_PSP(stack_ptr) */

+       "cpsie if\n"            /* __enable_irq(); __enable_fault_irq() */
        "mov   r0,  #0\n"       /* arch_irq_unlock(0) */
        "ldr   r3, =arch_irq_unlock_outlined\n"
        "blx   r3\n
nordicjm commented 5 months ago

Did you try with https://github.com/zephyrproject-rtos/zephyr/pull/71593 ?

duda-patryk commented 5 months ago

Did you try with #71593 ?

Yes

I realized that it fails to boot on warm reboots (cold reboots are fine).

ithinuel commented 5 months ago

It appears that I have missed something important around the (seemingly inconsistent) use of BASEPRI & cpsie/d… 🤔

Interrestingly, this isn’t caught by any of the tests I ran. I would like to fix the issue but also eventually add some tests to capture future regressions. Could you please detail the process you’re following to reproduce the issue? Do you use an in-tree example/test? How do you load/reload/reset?

soburi commented 5 months ago

I also encountered a similar phenomenon with arduino_uno_r4_minima (Renesas RA4M1, Cortex-M4/ARMv7-M). Similar to this issue, I confirmed that reverting https://github.com/zephyrproject-rtos/zephyr/commit/f11027df8061932ddf1209f985a67164d140f303 works correctly. It is also working fine with @duda-patryk patch.

ithinuel commented 5 months ago

Thank you for letting me know. I am currently running tests on a nucleo_f401re but can’t seem to be able to reproduce the issue.

Along with the fix, I’d like to understand & document the why is this extra cpsie required.

duda-patryk commented 5 months ago

Have you tried jumping to reset vector with interrupts disabled?

ithinuel commented 5 months ago

I am running Zephyr’s tests & samples. I get some false positives (ie failing test because of twister missing the start of the log) but they match before & after the changes to arch/arm.

Would you be able to provide a sample project that exhibits this issue?

duda-patryk commented 5 months ago

In my case, I'm jumping to Zephyr reset vector from another code (chain-loading) with interrupts disabled (cpsid i).

To reproduce issue without chain-loading you can use following code:

void z_arm_reset(void);

void test(void)
{
        __disable_irq();
        z_arm_reset();
}
ithinuel commented 5 months ago

So I edited samples/hello_world/src/main.c such that it becomes:

#include "zephyr/kernel.h"
#include <stdio.h>

extern void z_arm_reset(void);

int main(void) {
    printf("Hello World! %s\n", CONFIG_BOARD_TARGET);

    k_sleep(Z_TIMEOUT_MS(2000));

    __disable_irq();
    z_arm_reset();

    /* should not be reached */
    printf("Reboot failed\n");
    k_sleep(Z_FOREVER);

    return 0;
}

And this indeeds behaves incorrectly as the sleep is not stopping the thread for the requested amount of time.

However, z_arm_reset is not a proper way to reset the system, sys_reboot(_) from zephyr/sys/reboot.h is. Calling sys_reboot(SYS_REBOOT_WARM) or sys_reboot(SYS_REBOOT_COLD) does work as expected.

With reguards to chain loading, I also tried MCUBoot with the helloworld (using sys_reboot(SYS_REBOOT_WARM) and sys_reboot(SYS_REBOOT_COLD)) and it all seems to work as expected too.

For the record, here’s how I built & flashed them:

west build --sysbuild -p -b nucleo_f401re samples/hello_world -DSB_BOOTLOADER=MCUBOOT=y
pyocd load build/mcuboot/zephyr/zephyr.elf
pyocd load build/hello_world/zephyr/zephyr.signed.hex

It might be worth noting that the documentation for BOOTLOADER_MCUBOOT states that:

 By default, this option instructs Zephyr to initialize the core architecture HW registers during boot, when this is supported by the application. This removes the need by MCUboot to reset the core registers' state itself.

Could this be something useful for your usecase?

duda-patryk commented 5 months ago

MCUBoot works because they use irq_lock() to disable interrupts (faults are enabled) https://github.com/mcu-tools/mcuboot/blob/d2e69bf720ec4e5825f1bd7be441a9e3dedb5533/boot/zephyr/main.c#L205

@yperess do you remember why we disable interrupts using cpsid i?

ithinuel commented 5 months ago

CONFIG_MCUBOOT_CLEANUP_ARM_CORE defaults to y.

When the cleanup is enabled, do_boot() calls cleanup_arm_nvic() which disables interrupts too 🤔.

duda-patryk commented 5 months ago

When the cleanup is enabled, do_boot() calls cleanup_arm_nvic() which disables interrupts too

__disable_irq() is called indeed. @ithinuel I assume that it works for you :sweat_smile:

ithinuel commented 5 months ago

Well, I cannot reproduce the issue but from the bug report what's missing is enabling the IRQ so if anything it only confuses me more.

ithinuel commented 5 months ago

Are you happy closing this issue with the fix that was merged?

aescolar commented 5 months ago

Closing