linker: implement MPU alignment constraints

nashif commented 7 years ago

Reported by Andrew Boie:

This task is for devices with an MPU that requires memory regions to be power-of-two sized, and aligned to their size.

We have some discrete memory areas:

ROM (on flash in XIP devices): program text, rodata
Kernel-owned RAM region: data, bss, noinit (including all thread stacks)
Application-owned RAM region (currently a singleton, in future there may be several): data, bss, noinit

For ROM, what we currently do at boot is configure the entire thing read-only and executable. This seems a reasonable policy. Any thread that wants to write to flash would need to be a supervisor.

For RAM, it gets a bit more complicated. The sizes of the kernel and app RAM regions are not known until the build completes; only then do we know we have X bytes of kernel RAM and Y bytes of application RAM. In addition, the kernel RAM noinit section contains all the thread stacks.

We would like to be able to implement the following policy at boot using the MPU:

Kernel RAM is not read/writable by user threads
Upon context switch, the stack of the incoming thread is set to user read/writable. The stack of the outgoing thread goes back to supervisor access only.
Application RAM is read/writable by user threads.

To align the stacks properly, something like this ought to work:

#define _ARCH_THREAD_STACK_DEFINE(sym, size) \
     char _GENERIC_SECTION(.stacks) __aligned(ROUND_POW2(size)) sym[ROUND_POW2(size)]

ROUND_POW2 could be something like https://stackoverflow.com/questions/22925016/rounding-up-to-powers-of-2-with-preprocessor-constants
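
One possible shape for such a macro, as a sketch only: smear the highest set bit of (size - 1) into every lower bit, then add one. This is not necessarily what the linked answer does. The P2_SMEAR* helper names are made up for illustration, and the macro assumes a constant size between 1 and 2^31.

```c
#include <stdint.h>

/* Hypothetical helpers: OR a 32-bit value with shifted copies of itself
 * so that every bit below the highest set bit becomes 1.
 */
#define P2_SMEAR1(x)  ((x) | ((x) >> 1))
#define P2_SMEAR2(x)  (P2_SMEAR1(x) | (P2_SMEAR1(x) >> 2))
#define P2_SMEAR4(x)  (P2_SMEAR2(x) | (P2_SMEAR2(x) >> 4))
#define P2_SMEAR8(x)  (P2_SMEAR4(x) | (P2_SMEAR4(x) >> 8))
#define P2_SMEAR16(x) (P2_SMEAR8(x) | (P2_SMEAR8(x) >> 16))

/* Round x up to the next power of two; exact powers of two are unchanged.
 * Assumes 1 <= x <= 2^31. The result is an integer constant expression
 * when x is one, so it can be used as an array size or alignment.
 */
#define ROUND_POW2(x) (P2_SMEAR16((uint32_t)(x) - 1U) + 1U)

/* Compile-time checks (C11) showing the expected rounding. */
_Static_assert(ROUND_POW2(768) == 1024, "768 rounds up to 1024");
_Static_assert(ROUND_POW2(1024) == 1024, "powers of two are unchanged");
```

The cost is visible in the example: a 768-byte stack occupies 1024 bytes once it is rounded and aligned.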

The problem I am seeing is how to get proper regions for the kernel and the application. Each needs its own MPU region. It's not completely clear to me how to set this up, particularly the alignment of the application RAM, which comes after the kernel RAM.

(Imported from Jira ZEP-2304)

nashif commented 7 years ago

by Andrew Boie:

Hi All,

I think we may need to reconsider default policy for threads. What we had previously agreed on:

1) User threads can access their own stack.
2) User threads can read/write globals defined in application or third-party library space by default.
3) No access to kernel RAM by default.
4) If a user thread wants to define a memory region that only it or a subset of all threads can read/write, this memory would be defined in kernel space (__kernel macro or something) and access granted via APIs before the thread drops to user mode.

Getting the linker to split this between app and kernel wasn't terribly hard, and that's what my patch series does. Getting this to work with an MMU is easy: just make sure each memory region is aligned to 4K.

Getting this to work well with an MPU that requires regions to be power of 2 sized and aligned to their size may be very, very hard though. I realized this when creating GH-2139.

If we have the RAM split between kernel and app regions, I'm not seeing a good way to ensure that each of the kernel and app memory areas will fall onto a properly aligned power-of-two region. We don't know the sizes of these areas until the final link. Even if we manage to figure that out, we could have huge gaps between the kernel and app memory areas. I'm not sure what to do about this.
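
To put a number on the gap problem, here is a rough sketch that places a kernel area and then an app area under the power-of-two size and alignment rule and reports the padding lost. The function and struct names are hypothetical, and the sizes in the usage note are made-up examples, not measurements from any board.

```c
#include <stdint.h>

/* Smallest power of two >= n. Assumes 1 <= n <= 2^31. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1U;

    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Round addr up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1U) & ~(align - 1U);
}

/* Hypothetical result type: where each MPU region lands and what is lost. */
struct mpu_layout {
    uint32_t kernel_base, kernel_span;
    uint32_t app_base, app_span;
    uint32_t wasted; /* bytes of RAM consumed but not used by either area */
};

/* Place the kernel area first, then the app area, each in a region whose
 * size is a power of two and whose base is aligned to that size.
 */
static struct mpu_layout layout_regions(uint32_t ram_base,
                                        uint32_t kernel_size,
                                        uint32_t app_size)
{
    struct mpu_layout l;

    l.kernel_span = next_pow2(kernel_size);
    l.kernel_base = align_up(ram_base, l.kernel_span);
    l.app_span = next_pow2(app_size);
    l.app_base = align_up(l.kernel_base + l.kernel_span, l.app_span);
    l.wasted = (l.app_base + l.app_span - ram_base) - kernel_size - app_size;
    return l;
}
```

With RAM at 0x20000000, 40 KiB of kernel data and 72 KiB of app data, the kernel region becomes 64 KiB at 0x20000000 and the app region 128 KiB at 0x20020000, so 144 KiB of a 256 KiB footprint is padding.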

An alternative is what FreeRTOS-MPU currently does: for any given thread, all memory is kernel memory except the stack and up to 4 regions that have been specifically granted by APIs. This won't require any sort of split; you would just have one big region of RAM that user threads can't read or write, and then on context switch enable the stack and whatever other regions were configured for that thread.

The disadvantage of the FreeRTOS approach is that in any given C file in an application, you'll fault if you access any of your top-level globals unless specifically configured. That is an annoyance, especially when working with a large legacy codebase or third-party libraries that aren't reentrant. I was hoping we could do something such that we could turn thread protection on and almost all of our test cases and sample applications would continue to work, but this may not be feasible.

Any ideas on this greatly appreciated.

Andrew

nashif commented 7 years ago

by Andrew Boie:

The consensus we reached on the call is that MPU-based devices with the power-of-two size/alignment constraints can do one of two things:

1) Manage the offset into application memory by hand. We should introduce a Kconfig variable so that the beginning address of application memory can be specified by the user.
2) Disable CONFIG_APPLICATION_MEMORY. All globals default to being owned by the kernel, and threads default to only being able to read/write their own stacks. This is the FreeRTOS-MPU policy.

nashif commented 7 years ago

by Mark Linkmeyer:

Andy Gross, is this story planned to be implemented in time for 1.9? If so, will you please change its status from New to "To Do"? This will indicate it's not in planning anymore and it's believed (with high confidence) to be feasible to get done in 1.9. Thx.

nashif commented 7 years ago

by Andy Gross:

Transitioned this to in progress. Should have something shortly.

nashif commented 7 years ago

by Andy Gross:

Note: For SoCs using the default ARM MPU, you can selectively enable subregions for any region size over 128 bytes. There are 8 equal-size subregions per region.
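
As an illustration of how subregions could shrink the gaps, the sketch below computes an 8-bit subregion disable mask for one power-of-two region so that only the subregions lying entirely inside a wanted address range stay enabled. The function name srd_mask is hypothetical, and the bit convention (bit n set disables subregion n, with subregion 0 at the lowest address) is my reading of the ARMv7-M MPU and should be checked against the architecture manual before use.

```c
#include <stdint.h>

/* Return the subregion disable mask for the region at region_base of
 * region_size bytes, keeping enabled only the subregions fully inside
 * [start, end). Returns -1 if the region is not a power of two of at
 * least 256 bytes or is not aligned to its size.
 */
static int srd_mask(uint32_t region_base, uint32_t region_size,
                    uint32_t start, uint32_t end)
{
    if (region_size < 256U || (region_size & (region_size - 1U)) != 0U) {
        return -1;
    }
    if ((region_base & (region_size - 1U)) != 0U) {
        return -1;
    }

    uint32_t sub = region_size / 8U; /* 8 equal-size subregions */
    int mask = 0;

    for (unsigned int n = 0U; n < 8U; n++) {
        /* 64-bit arithmetic so the top of the address space cannot wrap. */
        uint64_t lo = (uint64_t)region_base + (uint64_t)n * sub;
        uint64_t hi = lo + sub;

        if (lo < start || hi > end) {
            mask |= 1 << n; /* subregion n is outside the range: disable */
        }
    }
    return mask;
}
```

For example, a 5 KiB area at 0x20000000 inside an 8 KiB region uses subregions 0 to 4 and disables 5 to 7, giving a mask of 0xE0, so the upper 3 KiB stays outside the region's permissions.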

For NXP, this is a non-issue, as their MPU uses start/stop addresses (modulo 32 bytes).

zephyriot / zep-jira14

linker: implement MPU alignment constraints #2139