Critical sections do nothing in unprivileged code

rust-embedded / cortex-m

Low level access to Cortex-M processors

Apache License 2.0

826 stars 151 forks source link

Critical sections do nothing in unprivileged code #233

Open cbiffle opened 4 years ago

cbiffle commented 4 years ago

Currently, the cortex_m::interrupt::free mechanism, for executing a closure without the interference of interrupts, brackets the closure with cpsid and (conditionally, to support nesting) cpsie instructions.

These instructions do nothing in unprivileged mode. (You might expect them to trap; nope.) As a result, cortex_m::interrupt:free becomes a relatively expensive FnOnce::call in unprivileged code.

It's not clear whether cortex_m supports the privileged/unprivileged distinction (there aren't a lot of signs that it does, we're hitting a lot of issues) but this seems like a magnificent footgun.

Options for fixing this off the top of my head:

Use a Cargo feature to change the implementation of things that expect privileged code for use in unprivileged contexts. In this case, operations that are inherently privileged -- like all peripherals on the PPB -- would become contingent on the privileged feature and disappear in unprivileged code. Compile time failures are nice. This is my preferred approach, because my unprivileged code is separately compiled and memory isolated, but I bet it won't cover most peoples' use cases -- I suspect that most users of cortex_m that are doing any multitasking or priv/unpriv distinction are probably using one big blob of code without isolation, all linked against the same cargo features. (Also, because of how Cargo features are defined, it would be super easy to accidentally pick up the privileged feature from your dependency graph. Laaame.)
Detect at runtime the current privileged mode and panic if interrupt::free is used in unprivileged code. This is a less-good option because it introduces a runtime failure, but it sure is easy.
Detect at runtime the current privileged mode and call into hooks for entering and leaving a critical section. (There is no standard way of doing this from unpriv code; it depends on the OS. Our OS doesn't actually support global critical sections in unprivileged code for latency reasons, so we'd leave the hooks unimplemented if you went this way. I'm not even sure you could design a useful hook signature since cortex_m::interrupt::free is ambient authority that requires no context information.)
Declare that cortex_m is not to be used in code that uses unprivileged mode and hope that nobody ever does the wrong thing. (This is actually the conclusion we're heading toward; the APIs really do assume privilege. But it'd be nice to fix this.)

hug-dev commented 4 years ago

I guess you could also say that unprivileged code should not be allowed to create a critical section for safety/security reasons? That if a critical section is necessary to perform an operation, this specific operation could be wrapped in a function, put in privileged code and exposed to the unprivileged one as one of the SVC calls with defined inputs/outputs? Option 2 could be done but could return an error instead of panicking if the caller is unprivileged!

adamgreig commented 4 years ago

I don't think there's been very much thought expended on supporting unprivileged uses of this crate; I suspect most current users never enter unprivileged mode.

For your option 1, if you instead had an 'unprivileged' feature, you no longer have the problem of Cargo accidentally adding all the privileged operations, though it goes slightly against the "additive-only" goal of features. It could be a fairly low-cost way to ensure that users in your position (building separate binaries for the unprivileged code) at least get compile-time failures. It still doesn't help with the presumably common use case of single-executable mixed priv/unpriv code though.

Option 2 adds even more overhead to every interrupt::free call, which I'm not especially excited by. Option 3 also doesn't seem ideal for the reasons you gave.

For option 4, a slight alternative is to produce a 'cortex-m-unprivileged' crate which only exposes unpriv actions; cortex-m itself could then re-export those to save writing them out twice. What unprivileged uses of cortex-m do you have?

cbiffle commented 4 years ago

I guess you could also say that unprivileged code should not be allowed to create a critical section for safety/security reasons?

@hug-dev That's a system policy decision that's not in the purview of this crate. (Some systems I've worked on, particularly based on FreeRTOS with the MPU extensions, create a lot of unprivileged critical sections; others forbid it.)

For your option 1, if you instead had an 'unprivileged' feature, you no longer have the problem of Cargo accidentally adding all the privileged operations, though it goes slightly against the "additive-only" goal of features.

@adamgreig - Yeah, that would at least cause potential surprises to go in a safe direction for systems like mine where priv and unpriv are separately compiled. (Additive-only is a misfeature, I'm increasingly convinced.)

Option 2 / Option 3

It seems we agree here. :-)

For option 4, a slight alternative is to produce a 'cortex-m-unprivileged' crate

I'm not sure that would work, because we depend on cortex-m both directly and indirectly.

On the one hand, we use it...

To get the intrinsic-equivalent operations wfi and dmb.
To use cortex-m::asm::delay in one case (though it's, um, horribly wrong on the M7 and M33, btw, should file a separate bug)
For iprintln!

On the other hand, we also use stm32f4 and stm32h7, among other register-access crates. (Our drivers are unprivileged.) This means we can't just switch to an unpriv version without also introducing unpriv versions of those crates. (Which, tbh, I'm considering.)

I suppose another option would be to split off the parts of cortex-m that are useful in both the "operating system" cases and the "big Arduino" cases into a lower-level crate. So, like, the intrinsics, iprintln!, and the register block definitions, but not the static-reliant Peripheral::take() or the critical sections that shut off all interrupt priorities. Then the register access crates could depend on that, and the higher-level HAL-ish crates (which we don't use because they're even less suited to memory isolation) could depend on cortex-m.

adamgreig commented 4 years ago

(Additive-only is a misfeature, I'm increasingly convinced.)

I agree; I already use it for mutually exclusive options in stm32f4 etc. I don't think it's disastrous to create a negative feature but it's not clear it will totally solve this problem either.

On the other hand, we also use stm32f4 and stm32h7, among other register-access crates. (Our drivers are unprivileged.) This means we can't just switch to an unpriv version without also introducing unpriv versions of those crates. (Which, tbh, I'm considering.)

What would you need to change in stm32f4 to make it suitable for unpriv use, just use of a new cortex-m-unpriv or some other behaviour too? Off the top of my head I can't think of anything they do except the cortex_m::interrupt::free() call in Peripherals::take(). If there's something we can readily change in the stm32-rs crates to allow use in unpriv scenarios I'm happy to consider it, especially if your only alternative is a fork.

I suppose another option would be to split off the parts of cortex-m that are useful in both the "operating system" cases and the "big Arduino" cases into a lower-level crate.

Is this the same idea as cortex-m-unpriv except with a different name?

There has already been talk of splitting cortex-m into some constituent parts: cortex-m-pac would contain just the peripheral/register definitions (and be generated using svd2rust to create the same API as the actual PACs); cortex-m-hal would contain the higher-level methods to manipulate those registers; feasibly cortex-m-intrinsics could contain the intrinsics, and cortex-m would just re-export all of these. Perhaps in that setup we could also find space to separate priv/unpriv operations in a way that works for both separate and combined compilation.

cbiffle commented 4 years ago

What would you need to change in stm32f4 to make it suitable for unpriv use, just use of a new cortex-m-unpriv or some other behaviour too? Off the top of my head I can't think of anything they do except the cortex_m::interrupt::free() call in Peripherals::take().

I think that's it -- just the dep and the take.

(The take() calls in general assume that all users are compiled together. With separately compiled tasks linked together, each one has its own copy of the global being set by take() -- so the safety properties are destroyed. The operation should be in a higher-level crate.)

If there's something we can readily change in the stm32-rs crates to allow use in unpriv scenarios I'm happy to consider it, especially if your only alternative is a fork.

Thanks. I'll try to get a sense for the required scope. (svd2rust is also driving us toward considering forking, but that's a whole 'nother issue.)

Is this the same idea as cortex-m-unpriv except with a different name?

Y'know, it might be, now that you phrase it like that. :-)

Crate split sounds reasonable; I'd be happy to provide input based on our use case.

adamgreig commented 4 years ago

(The take() calls in general assume that all users are compiled together. With separately compiled tasks linked together, each one has its own copy of the global being set by take() -- so the safety properties are destroyed. The operation should be in a higher-level crate.)

This is an interesting point and similar to the problems we have with multi-core applications where a single static flag might or might not be present in each core's memory map and also might or might not be appropriate depending on whether those resources are shared or replicated per core. Our current "safe" take() really relies on singly linked applications, though it does seem like they are the large majority of (current) cortex-m use cases. How do you handle relocations and statically allocated memory and so forth without virtual memory? Do applications just use a linker script with a reserved section of flash and RAM?

On a more general point I tend to agree that the thing that provides for single access should be at a higher level (e.g. in a high-level HAL) and the underlying PACs should provide only unsafe register access. That's the general idea behind the nosync feature in stm32ral (another negative feature!). Changing that would be a fairly significant ecosystem change though; perhaps we could consider it alongside splitting up this crate.

cbiffle commented 4 years ago

This is an interesting point and similar to the problems we have with multi-core applications

Yeah, I just ran across #149. It is a related issue. The suggestions made on that bug are not heading in the right direction for my use case -- for example, requiring peripherals to be accessed in a high-level-API CriticalSection.

Our current "safe" take() really relies on singly linked applications, though it does seem like they are the large majority of (current) cortex-m use cases.

Indeed -- though that may also be a product of the crate being designed for such applications. Tock, for instance, is not using cortex-m. (Though they have a stringent policy about external crates that might prevent using cortex-m regardless of design.)

How do you handle relocations and statically allocated memory and so forth without virtual memory?

In our case, an image builder tool relocates programs into the right isolated areas of memory, accounting for MPU alignment restrictions etc. Tock is similar. We've looked into fully position-independent program images using ROPI/RWPI, where things like statics are accessed through a global data pointer held in a register, but the support in LLVM and rustc is just not there yet.

On a more general point I tend to agree that the thing that provides for single access should be at a higher level (e.g. in a high-level HAL) and the underlying PACs should provide only unsafe register access.

I would like to put together a proposal for which bits of cortex-m should be moved to a lower-level crate, with some particular notes on design issues we've encountered (e.g. which operations can be applied to a RegisterBlock vs. having to have the phantom peripheral token). This issue is probably not the right place; shall I file a new one, or is there already a discussion in progress?

cbiffle commented 4 years ago

FWIW, here are the safe operations that will definitely misbehave in unprivileged mode. There may be others, but these are the ones that jumped out. There are also unsafe operations that will misbehave, but these probably just need to be documented.

interrupt::disable() becomes a no-op.
interrupt::free() similarly becomes FnOnce::call.
register::basepri_mask::write also becomes a no-op. It's marked "safe" because it can only move the basepri in the "fewer interrupts enabled" direction, but it doesn't actually do that in unprivileged mode.

These are a result of ARM's (incorrect, imo) decision to make CPS and MSR silently do nothing when misused by unprivileged code on v7-M, and MRS return zero. (The right thing to do would have been to trap.)

I've squinted at every pub operation in the crate at this point and haven't seen any others (though there are a couple I thought I saw, but misinterpreted, like the briefly-opened bug #237).

adamgreig commented 4 years ago

This issue is probably not the right place; shall I file a new one, or is there already a discussion in progress?

Please go ahead with a new issue, I don't think we have one here to discuss the potential split yet.

leo60228 commented 2 years ago

I think this issue overlaps with the problems homogenous multicore systems run in to.