jgrivera67 commented 4 months ago

I'm trying to boot zephyr SMP on a non-cache coherent multi-core Cortex-R52 platform. Even with disabling the data cache for all cores, still zephyr crashes on the secondary cores. It turns out that this happens because zephyr only sets the VBAR (vector table base address) register for the primary core, but not for the secondary cores, which leaves the VBAR for the secondary cores with its reset value: 0. The problem is that if the zephyr image is loaded at an address other than 0x0, this will not work. I did an experiment setting the VBAR for the secondary cores to where the vector table is in my zephyr image and Zephyr booted fine on the secondary cores. So, is this an oversight in the zephyr port for ARMv8-R, or am I missing something?

Also, there are some other minor issues that I think should be fixed as part of correctly supporting SMP on ARMv8-R. Please see the patch file attached. ARMv8-R-SMP-bug-fixes-patch.txt

github-actions[bot] commented 4 months ago

Hi @jgrivera67! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

wearyzen commented 3 months ago

Hi @jgrivera67, could you please confirm if you needed the attached "ARMv8-R-SMP-bug-fixes-patch.txt" or was setting the VBAR for secondary cores enough to fix the issue?

jgrivera67 commented 3 months ago

Hi @jgrivera67, could you please confirm if you needed the attached "ARMv8-R-SMP-bug-fixes-patch.txt" or was setting the VBAR for secondary cores enough to fix the issue?

Hi @wearyzen, the changes from the patch are also necessary. The changes in smp.c are required for correctness as follows:

arm_cpu_boot_params.fn = fn;
arm_cpu_boot_params.arg = arg;
arm_cpu_boot_params.cpu_num = cpu_num;

barrier_dsync_fence_full(); // BUG FIX <--------- TO ENSURE THAT ASSIGNMENT BELOW IS SEEN IN THE RIGHT ORDER BY OTHER CORES

/ store mpid last as this is our synchronization point / arm_cpu_boot_params.mpid = cpu_mpid; @@ -152,6 +153,10 @@ void arch_cpu_start(int cpu_num, k_thread_stack_t stack, int sz, archcpustart / Wait secondary cores up, see arch_secondary_cpu_init */ while (arm_cpu_boot_params.fn) { wfe();
// BUG FIX: invalidate cache before reading arm_cpu_boot_params.fn <----- NEEDED FOR NON-CACHE-COHERENT PLATFORMS
sys_cache_data_invd_range(
(void *)&arm_cpu_boot_params,
sizeof(arm_cpu_boot_params)); }

The change in cpu.h is necessary for multi-core platforms that have more than one cluster of cores, to ensure that each core in the system is unambiguously identified.

manuargue commented 3 months ago

@jgrivera67 did you test the patch with core caches enabled? Or there are still other issues when caches are enabled? Just asking because I remember something you mentioned about this on discord.

jgrivera67 commented 3 months ago

@jgrivera67 did you test the patch with core caches enabled? Or there are still other issues when caches are enabled? Just asking because I remember something you mentioned about this on discord.

Hi @manuargue , booting zephyr on a multi-core non-cache coherent Cortex-R52 platform does not work with caches enabled. The "cache invalidation" change I proposed is more for consistency than to solve the non-cache-coherent problem. Since there was already a "cache invalidate" before the loop, we should have a "cache invalidate" inside the loop as well, as having the "cache invalidate". Even better, this loop that read arm_cpu_boot_params.fn should use an "atomic load" instead (as shown below), and the code that updates it should use an "atomic store" . This way cache invalidation would not be necessary in the loop. Also, for non-cache coherent SMP support, we should have a special implementation of all atomic primitives that do cache invalidation in the ldaex/stlex loops.

arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
    barrier_dsync_fence_full(); // BUG FIX

    /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

    barrier_dsync_fence_full();

    /*! TODO: Support PSCI
     *  \todo Support PSCI
     */

    /* Wait secondary cores up, see arch_secondary_cpu_init */
    while (atomic_load(&arm_cpu_boot_params.fn)) { <-------------------- DO ATOMIC LOAD here 
        wfe();
    }

wearyzen commented 3 months ago

@jgrivera67, thank you for your quick feedback. I understood the cache invalidate and affinity changes but I see there is already a call to barrier_dsync_fence_full() after the mpid is set in the below code so, the call before it is a bit confusing. As I understand since this patch is not actually tested could I assume that the extra call to barrier_dsync_fence_full() is not needed?

        arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
    barrier_dsync_fence_full(); // BUG FIX  ----> Is this needed since there is an Existing call just below

    /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

    barrier_dsync_fence_full();     ----> // Existing call

Please correct me if I misunderstood or missed something and thank you again for the explanation and suggestions.

jgrivera67 commented 3 months ago

Hi @wearyzen, actually, I think the second (existing) barrier_dsync_fence_full() could be removed, because the one necessary for cross-core correctness is the first one (new one). Secondary cores execute the following polling loop when they come out of reset:

    dmb ld
    ldr r2, [r0, #BOOT_PARAM_MPID_OFFSET] <-------- (**) read arm_cpu_boot_params.mpid
    cmp r1, r2
    bne 1b

A secondary core break out of this loop when the primary core executes the following for the MPID of the secondary core:

/* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

So, we need to make sure that by the time that secondary core does the load at (**), the following assignments executed by the primary core are already committed to memory:

        arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;

To ensure that we need to add the memory barrier proposed in the patch:

        arm_cpu_boot_params.fn = fn;
    arm_cpu_boot_params.arg = arg;
    arm_cpu_boot_params.cpu_num = cpu_num;
        barrier_dsync_fence_full(); // BUG FIX

       /* store mpid last as this is our synchronization point */
    arm_cpu_boot_params.mpid = cpu_mpid;

wearyzen commented 3 months ago

Hi @jgrivera67,

I noticed that at smp.c#144, the cache is invalidated but not flushed after the primary core updates the arm_cpu_boot_params.mpid so, the non-cache coherent secondary cores might not get the updated value. Could you please try the below patch along with your previous changes to set VBAR to see if booting works with caches enabled?

diff --git a/arch/arm/core/cortex_a_r/smp.c b/arch/arm/core/cortex_a_r/smp.c
index 0214b4715f6..d01087122a2 100644
--- a/arch/arm/core/cortex_a_r/smp.c
+++ b/arch/arm/core/cortex_a_r/smp.c
@@ -137,11 +137,12 @@ void arch_cpu_start(int cpu_num, k_thread_stack_t *stack, int sz, arch_cpustart_
        arm_cpu_boot_params.arg = arg;
        arm_cpu_boot_params.cpu_num = cpu_num;

+       barrier_dsync_fence_full();
+
        /* store mpid last as this is our synchronization point */
        arm_cpu_boot_params.mpid = cpu_mpid;

-       barrier_dsync_fence_full();
-       sys_cache_data_invd_range(
+       sys_cache_data_flush_and_invd_range(
                        (void *)&arm_cpu_boot_params,
                        sizeof(arm_cpu_boot_params));

jgrivera67 commented 3 months ago

Hi @wearyzen, this change does not help because the data cache for Cortex-R52 is "write-through", so doing a "flush & invalidate" instead of just "invalidate" does not make a difference.

wearyzen commented 3 months ago

Hi @jgrivera67,

I am trying to find to find a setup to try this but it could take a while. Meanwhile could you please help me with below questions?

How are you disabling the caches? is it in z_arm_reset of reset.S?
From your debugging, do you already know if the secondary cores are stuck in the loop here waiting for primary core to set the mpidr? And if that is the case, have you tried disabling caches before this loop and enabling it after?
Also, would you be able to share the bootup/crash logs?

jgrivera67 commented 3 months ago

Hi @wearyzen,

I'm disabling the data cache in function z_arm_platform_init which is invoked from EL1_Reset_Handler in arch/arm/core/cortex_a_r/reset.S:
```
void z_arm_platform_init(void)
{
...
```

if defined(CONFIG_SMP)

__set_SCTLR(__get_SCTLR() & ~SCTLR_C_Msk);

else

...

endif

}


2. I don't need to disable the data cache to get past the loop at [here](https://github.com/zephyrproject-rtos/zephyr/blob/c44abdf5b7932b080f31e90cebbec07a8e092068/arch/arm/core/cortex_a_r/reset.S#L217). However the other fixes from the patch I included are necessary. However, if I don't disable the data caches, the secondary core gets stuck in the idle thread, and the primary core executes an application thread once, and then it gets stuck also in the idle thread.

3. Below is the UART output when booting my dual-core cortex-R52 platform when data caches are enabled on both cores:

Booting Zephyr OS build v3.6.0-7683-g1f3e2e55af9c Secondary CPU core 1 (MPID:0x10001) is up thread_a: Hello World from cpu 0


The expected output should be:

Booting Zephyr OS build v3.6.0-7683-g1f3e2e55af9c Secondary CPU core 1 (MPID:0x10001) is up thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 thread_a: Hello World from cpu 0 thread_b: Hello World from cpu 1 ...


The only way to get this test app to work as expected is by disabling data caches on both cores.

wearyzen commented 3 months ago

Hi @jgrivera67, just wanted to let you know that I was able to reproduce the scenario and am working on it. I'll let you know if I have any updates on it.

ithinuel commented 2 months ago

@jgrivera67 after further investigation, it appears that making the Zephyr Kernel work on incoherent-cache systems would require more work than we have bandwidth for.

Currently Zephyr requires all kernel object to be in cache-coherent (or non-cached) area. You may be able to achieve this with dts configuration (marking some kernel region as non-cacheable) but this would need to be specific to your platform.

manuargue commented 2 months ago

You may be able to achieve this with dts configuration (marking some kernel region as non-cacheable) but this would need to be specific to your platform.

hi @ithinuel, could you please share which kernel regions must be marked as non-cacheable? Thanks

ithinuel commented 2 months ago

Hi @manuargue, As far as I understand it is not a specific region. It is any kernel objects (so possibly dynamically allocated too?).

I am not very familiar with Cortex-R52 and don't know your system, so what follows is speculative:

You should be able to enable the MPU background region and its 0x6000_0000-0x7FFF_FFFF space as a Normal, Shareable, Non-cacheable space for zephyr's memory. Then define a specific region that you'd like to be cached.

See the Cortex-R52's TRM

github-actions[bot] commented 4 days ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

zephyrproject-rtos / zephyr

SMP not working for ARMv8-R #76182

if defined(CONFIG_SMP)

else

endif