zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

missing SMP fixes for RISC-V #50728

Closed: npitre closed this issue 2 years ago

npitre commented 2 years ago

A few issues prevent RISC-V from working properly in an SMP configuration.

Fixes for those issues are provided in PR #50679.


riscv: pmp: fix stackguard when used on SMP

The IRQ stack in particular is different on each CPU, and so is its stack guard PMP entry value. This creates 2 issues:

Fix both issues by remembering a dummy address for the last global entry instead of its actual address. The dummy is guaranteed not to match any opportunistic single-slot TOR mapping.
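
To illustrate the idea, here is a minimal model in C, not the code from the PR. It relies on the PMP rule that a TOR entry uses the previous entry's address as its lower bound, so a region whose start equals that address can be mapped with one slot instead of two. The names DUMMY_LAST_ADDR, tor_slots_needed() and slot_count_is_cpu_independent() are hypothetical, and so is the choice of UINTPTR_MAX as the dummy value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dummy: assumed never to be a valid region start. */
#define DUMMY_LAST_ADDR UINTPTR_MAX

/*
 * Slots needed to map a TOR region starting at region_start, given the
 * address remembered for the preceding PMP entry. If they match, the
 * preceding entry already provides the lower bound (single-slot case).
 */
static unsigned int tor_slots_needed(uintptr_t remembered_prev_addr,
				     uintptr_t region_start)
{
	return (region_start == remembered_prev_addr) ? 1U : 2U;
}

/*
 * Returns true if every CPU would use the same number of slots for the
 * region that follows the global entries. last_global_addr[cpu] models
 * the per-CPU IRQ stack guard address held by the last global entry.
 */
static bool slot_count_is_cpu_independent(const uintptr_t *last_global_addr,
					  size_t num_cpus,
					  uintptr_t region_start,
					  bool remember_dummy)
{
	unsigned int first = 0U;

	assert(num_cpus > 0);
	for (size_t cpu = 0; cpu < num_cpus; cpu++) {
		uintptr_t remembered = remember_dummy ? DUMMY_LAST_ADDR
						      : last_global_addr[cpu];
		unsigned int slots = tor_slots_needed(remembered, region_start);

		if (cpu == 0) {
			first = slots;
		} else if (slots != first) {
			return false;
		}
	}
	return true;
}
```

In this model, remembering the real per-CPU address lets the single-slot shortcut fire on one CPU and not on another, while the dummy makes the slot count the same everywhere at the cost of never taking the shortcut against the last global entry.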


riscv: PMP-based stack guard is incompatible with stack sentinel

The software-based stack sentinel writes to the very bottom of the stack area, which triggers the PMP stack protection. The two therefore can't be used together.
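
The conflict can be pictured with a small model in C. This is only a sketch under the assumption stated above, namely that the guard covers the lowest bytes of the stack area and the sentinel is written at its very bottom. The struct and function names are hypothetical and are not Zephyr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stack area with a PMP guard at its bottom. */
struct stack_model {
	uintptr_t base;     /* lowest address of the stack area */
	size_t guard_size;  /* bytes protected by the PMP guard */
};

/* True if a write to addr lands inside the guarded range. */
static bool pmp_guard_blocks_write(const struct stack_model *s, uintptr_t addr)
{
	return addr >= s->base && addr < s->base + s->guard_size;
}

/* Address the software sentinel writes to: the very bottom of the area. */
static uintptr_t sentinel_addr(const struct stack_model *s)
{
	return s->base;
}
```

For any non-zero guard size, pmp_guard_blocks_write(&s, sentinel_addr(&s)) is true in this model, which is the fault described above.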


riscv: fix crash resulting from touching the initial stack's guard area

The interrupt stack is used as the system stack during kernel initialization while IRQs are not yet enabled. The sp register is set to z_interrupt_stacks + CONFIG_ISR_STACK_SIZE.

CONFIG_ISR_STACK_SIZE only represents the desired usable stack size and does not take the added guard area into account. The result is a stack pointer that starts much closer to the trigger zone than expected when CONFIG_PMP_STACK_GUARD=y, and the SMP configuration in particular pushes it over the edge in many CI test cases.

Worse: during early init we're not quite ready to handle exceptions yet and complete havoc ensues with no meaningful debugging output.

Make sure the early assembly code locates the actual top of the stack by generating a constant with its true size.
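
The arithmetic behind this can be sketched in C. The sketch assumes the guard occupies the lowest guard_size bytes of the stack object and that the object's true size is guard plus usable size, ignoring any alignment padding. The function names and the sizes in the usage note are hypothetical, not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initial sp as described above: base plus the usable size only. */
static uintptr_t naive_initial_sp(uintptr_t stack_base, size_t usable_size)
{
	return stack_base + usable_size;
}

/* Initial sp at the actual top: base plus guard plus usable size. */
static uintptr_t true_initial_sp(uintptr_t stack_base, size_t usable_size,
				 size_t guard_size)
{
	return stack_base + guard_size + usable_size;
}

/* Bytes sp can descend before reaching the guard [base, base + guard). */
static size_t headroom(uintptr_t sp, uintptr_t stack_base, size_t guard_size)
{
	uintptr_t guard_end = stack_base + guard_size;

	assert(sp >= guard_end);
	return (size_t)(sp - guard_end);
}
```

With a hypothetical 2048-byte usable size and a 1024-byte guard, the naive sp leaves only 1024 bytes of headroom, whereas the true top of the stack leaves the full 2048 bytes.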


tests/semaphore: fix "cpu test took too long" assertion failure

The SMP config for RISC-V on QEMU triggers this:

START - test_sem_queue_mutual_exclusion
Assertion failed at
WEST_TOPDIR/zephyr/subsys/testsuite/ztest/src/ztest_new.c:155:
cpu_hold: (dt < 3000 is false)
1cpu test took too long (4090 ms)
ERROR: cannot fail in test 'after()', bailing

Looping 10000 times is maybe a bit excessive.
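
As a rough estimate from the log above, and not the value chosen in the PR, the following C sketch computes how many iterations would fit under the 3000 ms limit. It assumes run time scales linearly with the loop count, which is an assumption and not something the log shows. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest iteration count whose estimated run time stays strictly below
 * limit_ms, extrapolating linearly from one observed run.
 */
static uint32_t max_iterations_within(uint32_t observed_iters,
				      uint32_t observed_ms,
				      uint32_t limit_ms)
{
	uint64_t budget = (uint64_t)limit_ms * observed_iters;
	uint64_t n;

	assert(observed_ms > 0);
	n = budget / observed_ms;
	/* The assertion is "dt < limit", so an exact hit is too long. */
	while (n > 0 && n * observed_ms >= budget) {
		n--;
	}
	return (uint32_t)n;
}
```

Calling max_iterations_within(10000, 4090, 3000) gives 7334 under this linear assumption, so a count well below that would leave some margin on this QEMU configuration.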


riscv: smp: update the qemu_riscv32/64 configs

Neither usermode nor stackguard CI tests are performed unless CONFIG_RISCV_PMP is set.

Setting it, in turn, requires a larger privileged stack on RV64, just like in the non-SMP case.


edersondisouza commented 2 years ago

Why not at least describe the issues here? Like, copy them from the PR, as this issue is somehow created from the PR. Why was this created at all?

npitre commented 2 years ago

Fair enough, will do.

carlocaione commented 2 years ago

> Why was this created at all?

Because this is the formal procedure when you want to merge a hotfix into the upcoming release quickly.