zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

optimize MPU context switch on ARM/ARC #15135

Open andrewboie opened 5 years ago

andrewboie commented 5 years ago

There is some overhead on context switch due to reprogramming the memory protection hardware. We will always need to update the privileged-mode stack guards, but reprogramming user stack regions and memory domains can be skipped in some scenarios. Implement lazy MPU/MMU reprogramming such that:

This should span multiple context switches. Consider user threads A and B in the same memory domain X, and thread C in memory domain Y. Let S represent some context switch activity between any number of arbitrary supervisor threads.

Optimistically, I hope we can do this in common code that calls into arch APIs, rather than having to implement it separately for each architecture.

For implementation, we could perhaps add two (per-cpu?) fields to z_kernel:

struct k_mem_domain *last_domain;
k_thread_stack_t *last_stack;

Upon context switching out, if the outgoing thread is in user mode, save the current memory domain and the thread's stack object pointer.

Upon context switching in, if the incoming thread is in user mode, compare the incoming thread's memory domain and stack object with the values saved in the kernel. If they aren't the same, take the respective action: switch domains or update the user stack region.
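A minimal sketch of the save-and-compare scheme described above, written as a host-side simulation rather than kernel code. Everything here is an assumption made for illustration: `struct mem_domain`, `struct stack_obj`, `struct thread`, the `arch_program_*` hooks and the counters are hypothetical stand-ins, not Zephyr APIs. The hooks only count how often the expensive reprogramming would run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types named in the proposal. */
struct mem_domain { int id; };
struct stack_obj { int id; };

struct thread {
    bool is_user;               /* true for user-mode threads */
    struct mem_domain *domain;  /* memory domain the thread belongs to */
    struct stack_obj *stack;    /* the thread's stack object */
};

/* The two proposed bookkeeping fields (possibly per-CPU in a real kernel). */
static struct mem_domain *last_domain;
static struct stack_obj *last_stack;

/* Counters standing in for the expensive, arch-specific reprogramming. */
static int domain_reprograms;
static int stack_reprograms;

static void arch_program_domain(struct mem_domain *d)
{
    (void)d;
    domain_reprograms++;
}

static void arch_program_user_stack(struct stack_obj *s)
{
    (void)s;
    stack_reprograms++;
}

/* Switching out: a user thread records what the hardware currently holds. */
void lazy_switch_out(const struct thread *outgoing)
{
    assert(outgoing != NULL);
    if (outgoing->is_user) {
        last_domain = outgoing->domain;
        last_stack = outgoing->stack;
    }
}

/* Switching in: reprogram only what differs from the saved state. */
void lazy_switch_in(const struct thread *incoming)
{
    assert(incoming != NULL);
    if (!incoming->is_user) {
        return; /* supervisor threads leave the saved user state untouched */
    }
    if (incoming->domain != last_domain) {
        arch_program_domain(incoming->domain);
    }
    if (incoming->stack != last_stack) {
        arch_program_user_stack(incoming->stack);
    }
}
```

Because supervisor threads never overwrite the saved fields, the comparison survives any number of intervening supervisor switches: going from A to S and back to A reprograms nothing, while going from A to B in the same domain only updates the user stack region.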

Any PRs will need profiling data to show that this actually improves performance.

ioannisg commented 5 years ago

I am not sure how critical this is for ARM, since context-switch executes on lowest priority, anyway. :) There's value here, of course.

andrewboie commented 5 years ago

> I am not sure how critical this is for ARM, since context-switch executes on lowest priority, anyway. :) There's value here, of course.

@ioannisg in my opinion, the context switch needs to be the most tightly optimized code in the system.

This is now implemented on x86, so the scope of this ticket is reduced to ARM/ARC.

andrewboie commented 4 years ago

What worked best for x86 was to move all the logic for translating the memory domain configuration into the actual memory management hardware configuration out of context switch time, and into the points where a thread gets added to a memory domain or the memory domain is modified. Each thread has its own set of page tables, so a context switch is now just a cr3 register update.

For an MPU system, what I'd like to see is the MPU register set mirrored in a data structure within the thread struct.
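One way to picture the mirrored register set is the host-side sketch below. It is only an illustration under stated assumptions: the region layout (`base`, `limit`, `perms`), the region count, the simulated `mpu_hw` register file and the function names `thread_mirror_update` and `mpu_switch_to` are all hypothetical, and a real MPU has its own encoding, alignment rules and region budget. The point is the split between a slow path that translates the domain configuration when it changes and a fast path that only copies precomputed values at context switch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MPU_NUM_REGIONS 8 /* assumed region count; real hardware varies */

/* One region as the (simulated) hardware sees it; hypothetical layout. */
struct mpu_region {
    uint32_t base;
    uint32_t limit;
    uint32_t perms;
};

/* Arch-neutral description of one memory domain partition. */
struct partition {
    uint32_t start;
    uint32_t size;
    uint32_t perms;
};

/* Thread struct carrying a mirror of the MPU register set. */
struct thread {
    struct mpu_region mpu_mirror[MPU_NUM_REGIONS];
};

/* Simulated MPU register file standing in for real hardware registers. */
static struct mpu_region mpu_hw[MPU_NUM_REGIONS];

/*
 * Slow path: translate partitions into register values. Intended to run
 * when the thread joins a domain or the domain is modified, not at
 * context switch time. Returns 0 on success, -1 if there are too many
 * partitions to fit in the available regions.
 */
int thread_mirror_update(struct thread *t, const struct partition *parts,
                         unsigned int count)
{
    assert(t != NULL);
    if (count > MPU_NUM_REGIONS) {
        return -1;
    }
    /* Unused regions stay zeroed, which this sketch treats as disabled. */
    memset(t->mpu_mirror, 0, sizeof(t->mpu_mirror));
    for (unsigned int i = 0; i < count; i++) {
        t->mpu_mirror[i].base = parts[i].start;
        t->mpu_mirror[i].limit = parts[i].start + parts[i].size - 1u;
        t->mpu_mirror[i].perms = parts[i].perms;
    }
    return 0;
}

/* Fast path: the context switch only copies the precomputed mirror. */
void mpu_switch_to(const struct thread *t)
{
    assert(t != NULL);
    memcpy(mpu_hw, t->mpu_mirror, sizeof(mpu_hw));
}
```

With this split, the context switch cost no longer depends on how complicated the memory domain is; it is a fixed-size copy, analogous in spirit to the single cr3 update on x86.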

zephyrbot commented 7 months ago

Hi @dcpleung,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assign it to the right person.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@andrewboie you are also encouraged to help move this issue forward by providing additional information and confirming that this request/issue is still relevant to you.

Thanks!

dcpleung commented 7 months ago

Hm... this is highly implementation-specific, so it will need to be deferred to the maintainers of each arch.