teburd opened 2 weeks ago
I still argue that eliminating PendSV from the context-switch path entirely would be even better. Other architectures don't work that way; it's specific to Cortex-M.
Maybe some evidence to support this idea: https://gvpress.com/journals/IJSH/vol9_no2/10.pdf
I'm new to this subject, but isn't the benefit of PendSV that it allows minimal context-switching code for all Cortex-M CPUs? Also, because it is asynchronous, it avoids context switching in the middle of an ISR and thus arguably improves ISR handling.
https://developer.arm.com/documentation/107706/0100/System-exceptions/Pended-SVC---PendSV
@JarmouniA it does reduce code size by unifying the "context switch on interrupt exit" and "synchronous/cooperative context switch" cases (by effectively making the latter a trap to an interrupt). But:
[1] OK, it should be mentioned that Cortex-M specifically has very lightweight interrupts, something other architectures are much weaker at. But even this advantage falls apart as you add more: nested interrupts, MPU/MMU/FPU state handling, stack switching, etc. pollute it pretty badly once you start turning features on.
It's also potentially faster, per the paper I linked. That seems like a compounding set of reasons to try a smaller swap written in inline asm with no PendSV, avoiding many quirks. Who's gonna try it?
Is your enhancement proposal related to a problem? Please describe.
Some benchmarks show Zephyr behind ThreadX in context-swap performance.
Describe the solution you'd like
Avoid any branch-and-link (function call) operations in PendSV handling on Arm; other architectures could likely implement the same idea.
Every bl is a potential pipeline flush and certainly loses some context, yet we almost immediately call out to a C function handler for PendSV handling (which Arm uses for the context swap). Several other bl ops are involved as well, depending on which options are enabled.
ThreadX avoids almost all bl ops, except for an opt-in hook on swap in/swap out; otherwise its PendSV handling is ~80 asm instructions. This clearly has some performance implications somewhere, maybe partly due to the branch out of inline asm. Other factors may be involved too; this needs investigating.
https://github.com/eclipse-threadx/threadx/blob/master/ports_arch/ARMv7-M/threadx/gnu/src/tx_thread_schedule.S#L131
https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/arm/core/cortex_m/swap_helper.S#L56
Describe alternatives you've considered
Not doing anything.
Additional context
Benchmark report showing the difference in context-swap performance on a Cortex-M4: https://www.dropbox.com/scl/fi/opimwfbvkd9coeprc7d5h/Beningo_RtosPerformance_2024_Report.pdf?rlkey=s3n007s6hgubnj37ovto88bs2&e=3&dl=0
In large part, the difference is due to our default use of the MPU for hardware stack protection, but that isn't the only factor.