zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.77k stars 6.57k forks source link

s2ram: r5 ends up with lr address / wrong stacking in PR 68860 #77202

Open HesselM opened 2 months ago

HesselM commented 2 months ago

Describe the bug For the qn9090 (cortex M4, armv7) we are building S2RAM support.

We are testing with a hello-world main, and some shell commands which use pm_state_force to change the state to S2RAM in a handler on the system workq. We use this zephyr repo: https://github.com/Sendrato/zephyr/tree/main-branch-develop, which is based on zephyr 3.6

The assembly of the main-loop looks like: Screenshot 2024-08-16 at 15 36 34

So r4-r6 are loaded with the proper pointers before the loop starts looping infinitely. The register values on a cold-boot are correct: all of r4-r6 point to the correct addresses: Screenshot 2024-08-16 at 15 37 55 Screenshot 2024-08-16 at 15 39 08

After wakeup with S2RAM, the r5 value is changed. So somehow the stacking gets messed up. Screenshot 2024-08-16 at 15 41 17

With addr2line it turns out that r5 is now pointing to the instruction just after arch_pm_s2ram_resume. I've tried different setups for the code, but bo matter what changes and at which offset/address code is stored, r5 always points to the address after arch_pm_s2ram_resume upon warm boot.

$  /Users/hesselm/Development/toolchains/zephyr-sdk-0.16.3/arm-zephyr-eabi/bin/arm-zephyr-eabi-addr2line -e ./../build-dk6_qn9090/zephyr/zephyr.elf 0x5373
/Users/hesselm/Development/git/sendrato/awareable-fw-repo/zephyr/arch/arm/core/cortex_m/reset.S:120

Screenshot 2024-08-17 at 14 46 10

We are not using CONFIG_PLATFORM_SPECIFIC_INIT, so the first call is to the assembly to disable the MPU

Reverting PR #68860 seems to fix this issue: r5 is correct after resuming s2ram. In this PR the value of lr gets pushed on the stack. I'm not certain which stack, but my guess is that the SP or MSP are not properly setup yet and the original push of all registers in arch_pm_s2ram_suspend gets overwritten. When I try this simple (but not ARM-acceptable) patch on PR #68860, the values in r4-r6 in the main loop are correct:

    /*
     * Check if reset occurred after suspending to RAM.
     */
-   push    {lr}
+   mov r5, lr
    bl      pm_s2ram_mark_check_and_clear
    cmp r0, #0x1
-   pop     {lr}
+   mov     lr, r5
    beq resume
    bx  lr

So, my guess is that the pushing/popping of {r4-r12, lr} in the suspend and resume call assume a certain stack, which might be overridden by a warm boot. I think the fix might be in either:

I think the second approach is the cleanest, but I'm not entirely sure if the above deduction about the (incorrect) stacking is fully correct or makes sense. It also does not make sense to me that only r5 is affected and r4 and r6 are properly pushed/popped.

PR #76412 has the same issue (and has some other errors as well).

To Reproduce Not sure if this is easily reproduced, if needed i can publish the full test-app and setup we use.

  1. zephry repo: https://github.com/Sendrato/zephyr/tree/main-branch-develop
  2. rework hello-world print statement such that the assembly uses r4-r6 as address
  3. use shell to put SoC into S2RAM.
  4. wakeup SoC and observe the value of r5 when the main loop is scheduled.

Expected behavior r5 holds the value of lr, pointing to the next address after arch_pm_s2ram_resume

Impact For S2RAM: system might respond unpredictable when there is code using r5 without ldr.

Logs and console output See description

Environment (please complete the following information):

Additional context We have ported support for the qn9090 ourselves and tried to merge this into Zephyr, but NXP rejected it.

HesselM commented 2 months ago

I tried the patch below, but the issue still persists. Even more: the registers stored in the RAM-buffer are completely different then the register-values used in the main-thread. So pushing / popping lr does not interfere with the pushed/popped register values in the suspend and resume call.

Patch:

diff --git a/arch/arm/core/cortex_m/pm_s2ram.S b/arch/arm/core/cortex_m/pm_s2ram.S
index 1e5bca04fe2..5a5c8efdd97 100644
--- a/arch/arm/core/cortex_m/pm_s2ram.S
+++ b/arch/arm/core/cortex_m/pm_s2ram.S
@@ -26,9 +26,19 @@ SECTION_FUNC(TEXT, arch_pm_s2ram_suspend)
     *
     * r0: address of the system_off function
     */
-   push    {r4-r12, lr}
    ldr r1, =_cpu_context

+   str r4, [r1, #___cpu_context_t_r4_OFFSET]
+   str r5, [r1, #___cpu_context_t_r5_OFFSET]
+   str r6, [r1, #___cpu_context_t_r6_OFFSET]
+   str r7, [r1, #___cpu_context_t_r7_OFFSET]
+   str r8, [r1, #___cpu_context_t_r8_OFFSET]
+   str r9, [r1, #___cpu_context_t_r9_OFFSET]
+   str r10, [r1, #___cpu_context_t_r10_OFFSET]
+   str r11, [r1, #___cpu_context_t_r11_OFFSET]
+   str r12, [r1, #___cpu_context_t_r12_OFFSET]
+   str lr, [r1, #___cpu_context_t_lr_OFFSET]
+
    mrs r2, msp
    str r2, [r1, #___cpu_context_t_msp_OFFSET]

@@ -65,7 +75,9 @@ SECTION_FUNC(TEXT, arch_pm_s2ram_suspend)
    /*
     * Mark entering suspend to RAM.
     */
+   push    {r0}
    bl pm_s2ram_mark_set
+   pop {r0}

    /*
     * Call the system_off function passed as parameter. This should never
@@ -85,7 +97,17 @@ SECTION_FUNC(TEXT, arch_pm_s2ram_suspend)
    bl pm_s2ram_mark_check_and_clear
    pop {r0}

-   pop {r4-r12, lr}
+   ldr r1, =_cpu_context
+   ldr r4, [r1, #___cpu_context_t_r4_OFFSET]
+   ldr r5, [r1, #___cpu_context_t_r5_OFFSET]
+   ldr r6, [r1, #___cpu_context_t_r6_OFFSET]
+   ldr r7, [r1, #___cpu_context_t_r7_OFFSET]
+   ldr r8, [r1, #___cpu_context_t_r8_OFFSET]
+   ldr r9, [r1, #___cpu_context_t_r9_OFFSET]
+   ldr r10, [r1, #___cpu_context_t_r10_OFFSET]
+   ldr r11, [r1, #___cpu_context_t_r11_OFFSET]
+   ldr r12, [r1, #___cpu_context_t_r12_OFFSET]
+   ldr lr, [r1, #___cpu_context_t_lr_OFFSET]
    bx  lr

@@ -95,9 +117,11 @@ SECTION_FUNC(TEXT, arch_pm_s2ram_resume)
     * Check if reset occurred after suspending to RAM.
     */
    push    {lr}
+   //mov   r5, lr
    bl      pm_s2ram_mark_check_and_clear
    cmp r0, #0x1
    pop     {lr}
+   //mov     lr, r5
    beq resume
    bx  lr

@@ -139,9 +163,20 @@ resume:

    ldr r1, [r0, #___cpu_context_t_control_OFFSET]
    msr control, r1
+
+   ldr r4, [r0, #___cpu_context_t_r4_OFFSET]
+   ldr r5, [r0, #___cpu_context_t_r5_OFFSET]
+   ldr r6, [r0, #___cpu_context_t_r6_OFFSET]
+   ldr r7, [r0, #___cpu_context_t_r7_OFFSET]
+   ldr r8, [r0, #___cpu_context_t_r8_OFFSET]
+   ldr r9, [r0, #___cpu_context_t_r9_OFFSET]
+   ldr r10, [r0, #___cpu_context_t_r10_OFFSET]
+   ldr r11, [r0, #___cpu_context_t_r11_OFFSET]
+   ldr r12, [r0, #___cpu_context_t_r12_OFFSET]
+   ldr lr, [r0, #___cpu_context_t_lr_OFFSET]
+
    isb

-   pop {r4-r12, lr}

    /*
     * Set the return value and return
diff --git a/arch/arm/core/offsets/offsets_aarch32.c b/arch/arm/core/offsets/offsets_aarch32.c
index 4399377134d..31e1aecdd33 100644
--- a/arch/arm/core/offsets/offsets_aarch32.c
+++ b/arch/arm/core/offsets/offsets_aarch32.c
@@ -94,6 +94,18 @@ GEN_OFFSET_SYM(_cpu_context_t, primask);
 GEN_OFFSET_SYM(_cpu_context_t, faultmask);
 GEN_OFFSET_SYM(_cpu_context_t, basepri);
 GEN_OFFSET_SYM(_cpu_context_t, control);
+
+GEN_OFFSET_SYM(_cpu_context_t, r4);
+GEN_OFFSET_SYM(_cpu_context_t, r5);
+GEN_OFFSET_SYM(_cpu_context_t, r6);
+GEN_OFFSET_SYM(_cpu_context_t, r7);
+GEN_OFFSET_SYM(_cpu_context_t, r8);
+GEN_OFFSET_SYM(_cpu_context_t, r9);
+GEN_OFFSET_SYM(_cpu_context_t, r10);
+GEN_OFFSET_SYM(_cpu_context_t, r11);
+GEN_OFFSET_SYM(_cpu_context_t, r12);
+GEN_OFFSET_SYM(_cpu_context_t, lr);
+
 #endif /* CONFIG_PM_S2RAM */

 #endif /* _ARM_OFFSETS_INC_ */
diff --git a/include/zephyr/arch/arm/cortex_m/cpu.h b/include/zephyr/arch/arm/cortex_m/cpu.h
index 99255d20438..5b2dfbbe877 100644
--- a/include/zephyr/arch/arm/cortex_m/cpu.h
+++ b/include/zephyr/arch/arm/cortex_m/cpu.h
@@ -65,9 +65,16 @@ struct __cpu_context {
    uint32_t basepri;
    uint32_t control;

+   uint32_t r4;
+   uint32_t r5;
+   uint32_t r6;
+   uint32_t r7;
+   uint32_t r8;
+   uint32_t r9;
+   uint32_t r10;
+   uint32_t r11;
+   uint32_t r12;
+   uint32_t lr;
 };

 typedef struct __cpu_context _cpu_context_t;

Values of _cpu_context after wakeup from S2RAM: Screenshot 2024-08-18 at 11 48 11

Values of the registers in the main-loop BEFORE s2ram: Screenshot 2024-08-18 at 11 47 20

Values of registers in the main-loop AFTER awakeup from S2RAM: r5 is still wrong and holding the lr-value Screenshot 2024-08-18 at 11 48 00

HesselM commented 2 months ago

Upon closer inspection of the SP and MSP registers in arch_pm_s2ram_resume, i see that the SP (after wakeup from S2RAM) is pointing to and address in the main stack, so all stack-operations at wakeup overwrite / corrupt the main stack.

Used patch:

    /*
     * Check if reset occurred after suspending to RAM.
     */
+   ldr r0, =cpu_context_sp
+   str     sp, [r0]
+   ldr r0, =cpu_context_msp
+   mrs r1, psp
+       str     r1, [r0]

    push    {lr}
    bl      pm_s2ram_mark_check_and_clear
    cmp r0, #0x1
    pop     {lr}
    beq resume
    bx  lr

Content upon cold boot:

Content after boot from S2RAM wakeup:

Stack of main-thread in this test: 0x4003440 .. 0x400387F

HesselM commented 2 months ago

It turns out that the SP-issue a a bit more related to the SoC then to Zephyr:

To put the qn9090 in power-down (wake from ram), a ROM-api is used, which requires a dedicated stack:

static uint32_t rom_bootstack[0x100];
...
int POWER_EnterPowerDown(void)
{
    int rv = 0;

    LPC_LOWPOWER_T lp_config;

    /* get lp_config */
    POWER_GetPowerDownConfig(&_power_config, (void *) &lp_config);

    /* Stack for ROM API on wake-up / warm boot */
    uint32_t bootstack_addr = (uint32_t) &rom_bootstack[0x100 - 1u];
    BOOT_SetResumeStackPointer(bootstack_addr);
    SYSCON->CPSTACK = bootstack_addr;

    /* MCUX-power down */
    POWER_GoToPowerDown(&lp_config);

    return rv;
}

The stack for the ROM-api is set by BOOT_SetResumeStackPointer and the CPSTACK register. In the call POWER_GoToPowerDown the ROM-API changes the VTOR to some ROM-table which handles the wakeup.

After wakeup from RAM I can see the (presumed) SP-address 0x4003840 on the rom-stack. Screenshot 2024-08-19 at 13 47 07

Documentation is very limited on the ROM-API and boot-process, but I presume the ROM-api eventually resets the VTOR to the original value and triggers the reset-handler, after the SP has been assigned to this value.

In the original MCUX-SDK there is a small note on the bootup of warm-boot:

...
 if (warm_start)
        {
            if (SYSCON->CPSTACK)
            {
                /* if CPSTACK is not NULL, switch to CPSTACK value so we avoid to corrupt the stack used before power down
                     * Note: it looks like enough to switch to new SP now and not earlier */
                __asm volatile(
                        ".set   coproc_stack,   0x40000808\t\n"
                        "LDR    R1,=coproc_stack\t\n" // load co-processor stack pointer (from CPSTACK)
                        "LDR    R1,[R1]\t\n"
                        "MOV    SP,R1\t\n"
                        );
            }
...

Upon inspection of the CPSTACK register on wakeup-from-s2ram i can indeed see it is set. More specifically, the register is set to the address I assigned to it upon power-down: the address of &rom_bootstack[0x100 - 1u].

Adding the following assembly seems to do the trick:

    /*
     * Check if reset occurred after suspending to RAM.
     */

+   /*
+    * if CPSTACK is not NULL, switch to CPSTACK value so we avoid to corrupt
+    * the stack used before power down
+         */
+   ldr r1, =0x40000808
+   ldr     r0, [r1]
+   cmp     r0, #0
+   beq     keep_sp
+   mov     sp, r0
+
+ keep_sp:
    push    {lr}
    bl      pm_s2ram_mark_check_and_clear
    cmp r0, #0x1

While it works for the qn9090, I do wonder if this approach is 'generic' or wacky way of the MCUX-SDK and ROM-api implementation on the qn9090?

HesselM commented 2 months ago

After a bit more though: the issue with switching to the proper stack after s2ram wakeup is a bit more complicated and should not be solved in the arch_pm_s2ram_resume call.

Loading and using the CPSTACK register (0x40000808) from the qn9090 is specific to the qn9090. Also, branching to arch_pm_s2ram_resume from the reset-vector actually stores the LR on the stack too.

So t if a custom stack-pointer (for s2ram) needs to be loaded upon boot:

I can imagine this could be done with a Kconfig and specific assembly in the source-file of the soc. Something like:

zephyr/arch/arm/core/cortex_m/reset.S

/*
 * The entry point is located at the z_arm_reset symbol, which
 * is fetched by a XIP image playing the role of a bootloader, which jumps to
 * it, not through the reset vector mechanism. Such bootloaders might want to
 * search for a __start symbol instead, so create that alias here.
 */
SECTION_SUBSEC_FUNC(TEXT,_reset_section,__start)

+ #if defined(CONFIG_PM_S2RAM_BOOTSP)
+    /* Jump without stack-operation */
+    bx soc_pm_s2ram_bootsp
+ #endif /* CONFIG_S2RAM_BOOTSP */

#if defined(CONFIG_DEBUG_THREAD_INFO)
    /* Clear z_sys_post_kernel flag for RTOS aware debuggers */
    movs.n r0, #0
    ldr r1, =z_sys_post_kernel
    strb r0, [r1]
#endif /* CONFIG_DEBUG_THREAD_INFO */

zephyr/soc/ ..... / power.c

+ #ifdef CONFIG_PM_S2RAM_BOOTSP
+ /*
+  * if CPSTACK is not NULL, switch to CPSTACK value so we avoid to corrupt
+  * the stack used before power down
+  */
+ SECTION_FUNC(TEXT, soc_pm_s2ram_bootsp)
+   ldr r1, =0x40000808
+   ldr     r0, [r1]
+   cmp     r0, #0
+   beq     keep_sp
+   mov     sp, r0
+ keep_sp:
+   bx     lr
+ #endif /* CONFIG_PM_S2RAM_BOOTSP */
github-actions[bot] commented 1 day ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.