zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.49k stars 6.42k forks source link

SMP not working for ARMv8-R #76182

Open jgrivera67 opened 1 month ago

jgrivera67 commented 1 month ago

I'm trying to boot zephyr SMP on a non-cache coherent multi-core Cortex-R52 platform. Even with disabling the data cache for all cores, still zephyr crashes on the secondary cores. It turns out that this happens because zephyr only sets the VBAR (vector table base address) register for the primary core, but not for the secondary cores, which leaves the VBAR for the secondary cores with its reset value: 0. The problem is that if the zephyr image is loaded at an address other than 0x0, this will not work. I did an experiment setting the VBAR for the secondary cores to where the vector table is in my zephyr image and Zephyr booted fine on the secondary cores. So, is this an oversight in the zephyr port for ARMv8-R, or am I missing something?

Also, there are some other minor issues that I think should be fixed as part of correctly supporting SMP on ARMv8-R. Please see the patch file attached. ARMv8-R-SMP-bug-fixes-patch.txt

github-actions[bot] commented 1 month ago

Hi @jgrivera67! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

wearyzen commented 1 month ago

Hi @jgrivera67, could you please confirm if you needed the attached "ARMv8-R-SMP-bug-fixes-patch.txt" or was setting the VBAR for secondary cores enough to fix the issue?

jgrivera67 commented 1 month ago

Hi @jgrivera67, could you please confirm if you needed the attached "ARMv8-R-SMP-bug-fixes-patch.txt" or was setting the VBAR for secondary cores enough to fix the issue?

Hi @wearyzen, the changes from the patch are also necessary. The changes in smp.c are required for correctness as follows:

arm_cpu_boot_params.fn = fn;
arm_cpu_boot_params.arg = arg;
arm_cpu_boot_params.cpu_num = cpu_num;

The change in cpu.h is necessary for multi-core platforms that have more than one cluster of cores, to ensure that each core in the system is unambiguously identified.

manuargue commented 1 month ago

@jgrivera67 did you test the patch with core caches enabled? Or there are still other issues when caches are enabled? Just asking because I remember something you mentioned about this on discord.

jgrivera67 commented 1 month ago

@jgrivera67 did you test the patch with core caches enabled? Or there are still other issues when caches are enabled? Just asking because I remember something you mentioned about this on discord.

Hi @manuargue , booting zephyr on a multi-core non-cache coherent Cortex-R52 platform does not work with caches enabled. The "cache invalidation" change I proposed is more for consistency than to solve the non-cache-coherent problem. Since there was already a "cache invalidate" before the loop, we should have a "cache invalidate" inside the loop as well, as having the "cache invalidate". Even better, this loop that read arm_cpu_boot_params.fn should use an "atomic load" instead (as shown below), and the code that updates it should use an "atomic store" . This way cache invalidation would not be necessary in the loop. Also, for non-cache coherent SMP support, we should have a special implementation of all atomic primitives that do cache invalidation in the ldaex/stlex loops.

arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
    barrier_dsync_fence_full(); // BUG FIX

    /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

    barrier_dsync_fence_full();

    /*! TODO: Support PSCI
     *  \todo Support PSCI
     */

    /* Wait secondary cores up, see arch_secondary_cpu_init */
    while (atomic_load(&arm_cpu_boot_params.fn)) { <-------------------- DO ATOMIC LOAD here 
        wfe();
    }
wearyzen commented 1 month ago

@jgrivera67, thank you for your quick feedback. I understood the cache invalidate and affinity changes but I see there is already a call to barrier_dsync_fence_full() after the mpid is set in the below code so, the call before it is a bit confusing. As I understand since this patch is not actually tested could I assume that the extra call to barrier_dsync_fence_full() is not needed?

        arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
    barrier_dsync_fence_full(); // BUG FIX  ----> Is this needed since there is an Existing call just below

    /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

    barrier_dsync_fence_full();     ----> // Existing call

Please correct me if I misunderstood or missed something and thank you again for the explanation and suggestions.

jgrivera67 commented 1 month ago

Hi @wearyzen, actually, I think the second (existing) barrier_dsync_fence_full() could be removed, because the one necessary for cross-core correctness is the first one (new one). Secondary cores execute the following polling loop when they come out of reset:

    dmb ld
    ldr r2, [r0, #BOOT_PARAM_MPID_OFFSET] <-------- (**) read arm_cpu_boot_params.mpid
    cmp r1, r2
    bne 1b

A secondary core break out of this loop when the primary core executes the following for the MPID of the secondary core:

/* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

So, we need to make sure that by the time that secondary core does the load at (**), the following assignments executed by the primary core are already committed to memory:

        arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;

To ensure that we need to add the memory barrier proposed in the patch:

        arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
        barrier_dsync_fence_full(); // BUG FIX

       /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;    
wearyzen commented 1 month ago

Hi @jgrivera67,

I noticed that at smp.c#144, the cache is invalidated but not flushed after the primary core updates the arm_cpu_boot_params.mpid so, the non-cache coherent secondary cores might not get the updated value. Could you please try the below patch along with your previous changes to set VBAR to see if booting works with caches enabled?

diff --git a/arch/arm/core/cortex_a_r/smp.c b/arch/arm/core/cortex_a_r/smp.c
index 0214b4715f6..d01087122a2 100644
--- a/arch/arm/core/cortex_a_r/smp.c
+++ b/arch/arm/core/cortex_a_r/smp.c
@@ -137,11 +137,12 @@ void arch_cpu_start(int cpu_num, k_thread_stack_t *stack, int sz, arch_cpustart_
        arm_cpu_boot_params.arg = arg;
        arm_cpu_boot_params.cpu_num = cpu_num;

+       barrier_dsync_fence_full();
+
        /* store mpid last as this is our synchronization point */
        arm_cpu_boot_params.mpid = cpu_mpid;

-       barrier_dsync_fence_full();
-       sys_cache_data_invd_range(
+       sys_cache_data_flush_and_invd_range(
                        (void *)&arm_cpu_boot_params,
                        sizeof(arm_cpu_boot_params));
jgrivera67 commented 1 month ago

Hi @wearyzen, this change does not help because the data cache for Cortex-R52 is "write-through", so doing a "flush & invalidate" instead of just "invalidate" does not make a difference.

wearyzen commented 1 month ago

Hi @jgrivera67,

I am trying to find to find a setup to try this but it could take a while. Meanwhile could you please help me with below questions?

jgrivera67 commented 1 month ago

Hi @wearyzen,

  1. I'm disabling the data cache in function z_arm_platform_init which is invoked from EL1_Reset_Handler in arch/arm/core/cortex_a_r/reset.S:
    
    void z_arm_platform_init(void)
    {
    ...

if defined(CONFIG_SMP)

__set_SCTLR(__get_SCTLR() & ~SCTLR_C_Msk);

else

 ...

endif

}


2. I don't need to disable the data cache to get past the loop at [here](https://github.com/zephyrproject-rtos/zephyr/blob/c44abdf5b7932b080f31e90cebbec07a8e092068/arch/arm/core/cortex_a_r/reset.S#L217). However the other fixes from the patch I included are necessary. However, if I don't disable the data caches, the secondary core gets stuck in the idle thread, and the primary core executes an application thread once, and then it gets stuck also in the idle thread.

3. Below is the UART output when booting my dual-core cortex-R52 platform when data caches are enabled on both cores:

Booting Zephyr OS build v3.6.0-7683-g1f3e2e55af9c Secondary CPU core 1 (MPID:0x10001) is up thread_a: Hello World from cpu 0


The expected output should be:

Booting Zephyr OS build v3.6.0-7683-g1f3e2e55af9c Secondary CPU core 1 (MPID:0x10001) is up thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 ...


The only way to get this test app to work as expected is by disabling data caches on both cores.
wearyzen commented 2 weeks ago

Hi @jgrivera67, just wanted to let you know that I was able to reproduce the scenario and am working on it. I'll let you know if I have any updates on it.

ithinuel commented 3 days ago

@jgrivera67 after further investigation, it appears that making the Zephyr Kernel work on incoherent-cache systems would require more work than we have bandwidth for.

Currently Zephyr requires all kernel object to be in cache-coherent (or non-cached) area. You may be able to achieve this with dts configuration (marking some kernel region as non-cacheable) but this would need to be specific to your platform.

manuargue commented 2 days ago

You may be able to achieve this with dts configuration (marking some kernel region as non-cacheable) but this would need to be specific to your platform.

Do you have any reference of which kernel regions must be marked as non-cacheable?