Should `CriticalSection` really be `Clone` ?

rust-embedded / bare-metal

Abstractions common to microcontrollers

Apache License 2.0

116 stars 17 forks source link

Should `CriticalSection` really be `Clone` ? #42

Open kellda opened 3 years ago

kellda commented 3 years ago

I personally think CriticalSection should not be Clone, because of two use case requiring that:

interrupt::enable_cs: Safely enable interrupts, consuming a critical section. This can't work if there can be another copy lying around.
Mutex::borrow_mut: I'm quite sure this would allow to mutably borrow data in a mutex, by taking a &mut CriticalSection. This would only allow to mutably borrow one mutex as once, but I think it is still better than Mutex<RefCell<_>>.

All actual use cases ought to work by passing a &CriticalSection, although I don't think this needs to be used often. Should the size of &CriticalSection be a problem, one could imagine a SharedCriticalSection ZST borrowing a CriticalSection.

chrysn commented 2 years ago

CriticalSection not being Clone alone would not be enough for either of these, it'd take additional mechanisms that forbid interrupt::free() to nest, and that breaks other cases where nesting of interrupt free operation is OK (which is any code path where intermediate functions are outside of the control of the application doing the free, so no CS can be passed along easily).

Such a only-once-per-system CriticalSection would have different semantics from the current one; there may be a use case, but it's probably better called something different. (UniqueCriticalSection maybe?).

cr1901 commented 2 years ago

All actual use cases ought to work by passing a &CriticalSection.

If we pass a &CriticalSection to enable_cs, it becomes possible to enable interrupts within interrupt::free, which breaks the invariants of interrupt::free.

@chrysn

CriticalSection not being Clone alone would not be enough for either of these, it'd take additional mechanisms that forbid interrupt::free() to nest

I can't visualize how interrupt::enable_cs causes interrupt::free to nest, could you elaborate/provide an example? What additional mechanisms are needed to make enable_cs safe? AFAIK, you can't safely enable interrupts within interrupt::free in the current MSP430 API.

Ironically, enable_cs was introduced so we could create completely safe msp430 applications- there needs to be a controlled manner in which to enable interrupts without a CriticalSection existing. But I didn't realize CriticalSection was Clone. So @kellda is right that enable_cs is unsound.

We don't want a second CriticalSection token type to pass to interrupt::free because it makes the API unergonomic.

One workaround I can think of is to use a UniqueCriticalSection token/ZST that doesn't implement Clone as the argument to main, and enable_cs consumes this specific token. CriticalSection references bound to UniqueCriticalSection's lifetime can be created to access peripherals. That way, enable_cs can only safely be called as part of initialization code.

chrysn commented 2 years ago

If we pass a &CriticalSection to enable_cs, it becomes possible to enable interrupts within interrupt::free, which breaks the invariants of interrupt::free.

I understood kelida to mean that a &CriticalSection would be passed to all other actual use cases, whereas enable_cs would need an unclonable (unique) CS. While that does end the free() section prematurely, it also consumes the unique CS and thus makes sure that no more code in the free section actually depends on the property that was just lost.

(I still think it's just a bad idea, though -- if you're in a position to consume the unclonable CS token, you could return from your free section just as well).

can't visualize how interrupt::enable_cs causes interrupt::free to nest

I didn't mean that enable_cs causes interrupt:free to nest, I meant that even if the existing CriticalSection werer unclonable, one could still do

interrupt::free(|cs1| {
    interrupt::free(|cs2| {
        enable_cs(cs2)
    })
})

By nesting free, one can obtain two tokens -- and that's all good and correct, because while any of them is alive, interrupts are off. My point is that it's not just the Clone implementation on CriticalSection, it's the whole design of CriticalSection that makes it unsuitable for the two use cases @kelida mentioned.

I thus think that if there is a way forward with these use cases, it's through a separate UniqueCriticalSection (that might be Into<CriticalSection>). That would, on first sight, enable the use cases.

But is it really useful? A constructor for UniqueCriticalSection would need to check whether interrupts might already be disabled, and if so not issue the token. That means either panicking, or giving an Option<UniqueCriticalSection>. Neither is really practical.

My impression is that an interrupt-enabling function is generally unsafe (as long as anyone wants to depend on being inside an interrupt-free section). I have yet to see a case where enabling interrupts is actually a practical thing. (Not so incidentally, I recently made a PR for a big fat red warning in the IRQ enable routine of RIOT-OS -- if interrupt enabling were generally allowed, critical sections could never work with any kind of callback code any more).

cr1901 commented 2 years ago

@chrysn

My impression is that an interrupt-enabling function is generally unsafe (as long as anyone wants to depend on being inside an interrupt-free section)

I don't actually have a problem with this, personally. But I want to show that it's possible to write msp430 applications with the [deny(unsafe)] lint enabled, b/c there is still (and probably always will be) a notion that unsafe == bad. In addition, for a while, there was a push in general to get Rust embedded ecosystem (Cortex-M specifically) to so you can write applications without safe. With that in mind...

But is it really useful?

I want to provide safe abstractions to facilitate the use case of completely safe msp430 code. enable_cs was meant to be this abstraction for enabling interrupts from the main thread after you were done initializing peripherals. Msp430 starts with interrupts disabled, and unlike Cortex-M, disables interrupts when one occurs. On msp430, the entry point and interrupts get a "free" CriticalSection. Msp430 has small code size, and skipping the check for whether interrupts were disabled with interrupt::free does make a difference.

In all the code I've written, I used enable_cs only in the main thread to initially enable interrupts. _At that point, it felt natural that enable_cs could be used inside interrupts as well to "safely" enable interrupts. I can probably utilize a non-clonable token type which can be used to get &CriticalSection instances tied to my new token's lifetime to work around the Clone problem.

(that might be Into)

Well, that could work too, instead of passing around &CriticalSections. But once I consume UniqueCriticalSection, I've lost the ability to enable interrupts safely compared to "passing &CriticalSection instances around tied to UniqueCriticalSection's lifetime". In either case, enable_cs will need to be deprecated and either converted to unsafe or flat-out removed.

A constructor for UniqueCriticalSection would need to check whether interrupts might already be disabled, and if so not issue the token.

I wouldn't expose the constructor to user code. I would only provide UniqueCriticalSection through the interrupt and main function signatures. The unsafe code would be limited to the proc macro inside a library that only constructs UniqueCriticalSections at places the library knows interrupts are disabled (calling main, when an interrupt has occurred), bypassing the need for a check. The application would still be all safe code.

chrysn commented 2 years ago

I won't continue right now in looking at UniqueCriticalSection because it appears right now to be a whack-a-mole of edge cases, but to sort out the use cases, let me suggest something that may remove the need:

I want to provide safe abstractions to facilitate the use case of completely safe msp430 code.

I'm unfamiliar with the workings of MSP430; with the Cortex MCUs I've had my hands on it was always an option to just enable interrupts right away (eg. in the safe wrapper for the main function), because all individual interrupts are off already (and whatever enables them will need to make sure that whatever the ISR accesses is actually initialized).

If that's not an option on the MSP430, maybe it's practical to have a safe-only init function (inside which IRQs are all off, so even if code does call interrupt::free it's a practical no-op as it restores to an IRQ-off state), then unsafely enabling the interrupts in the caller of the safe init function (part of the -rt if that distinction exists there), and then calling a main.

Would that be an option for writing safe-only applications? (Otherwise we can still continue sketching UniqueCriticalSection...)

-- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

cr1901 commented 2 years ago

because all individual interrupts are off already

Ahhh, I guess because of NVIC, all peripherals already have to have an interrupt enable bit. There is no guarantee in MSP430-land that individual peripherals will have an interrupt enable bit that's separate from the global interrupt enable (GIE) in the status register. The USB peripheral is one such example.

maybe it's practical to have a safe-only init function ... Would that be an option for writing safe-only applications?

Yes, this is a possible option, and with LTO it should be no different in code size. Although I would like to avoid having to have an init attribute/proc macro in addition to main, interrupt, and preinit attributes for two reasons:
1. I want to keep the API close to what cortex-m (and C, for that matter) does, with size optimizations specific to msp430. Although I guess Arduino has a concept of an init function, so it's not that far out there.
2. Although preinit is soft-deprecated and shouldn't be used, I don't think it's going to be clear to ppl new to embedded what the difference is/why two different init functions need to exist. msp430 doesn't have a large amount of users compared to cortex-m; I could probably safely delete preinit from the API without breaking ppl's applications. Two minor (sic) version bumps before 1.0 won't be a huge burden if I have to add it back in.
On the other hand: UniqueCriticalSection or similar in main makes it clear that "the beginning of main is a good place to do your init without the penalty of interrupt::free". I can still pass CriticalSections (or &CriticalSection, if I so choose) directly to the interrupts and its sound as long as cloned tokens aren't used after enable_cs is called. Having two different token types isn't exactly ideal for ergonomics either.
Of course my solution is "unsafe != bad", and just eat the unsafe interrupt enable inside main and "just" forget the CriticalSection tokens afterward. But this isn't about what I want :P.

This was more of a minddump while it's on my mind, sorry :D!

YuhanLiin commented 2 years ago

If we pass a &CriticalSection to enable_cs, it becomes possible to enable interrupts within interrupt::free, which breaks the invariants of interrupt::free.

I understood kelida to mean that a &CriticalSection would be passed to all other actual use cases, whereas enable_cs would need an unclonable (unique) CS. While that does end the free() section prematurely, it also consumes the unique CS and thus makes sure that no more code in the free section actually depends on the property that was just lost.

(I still think it's just a bad idea, though -- if you're in a position to consume the unclonable CS token, you could return from your free section just as well).

can't visualize how interrupt::enable_cs causes interrupt::free to nest

I didn't mean that enable_cs causes interrupt:free to nest, I meant that even if the existing CriticalSection werer unclonable, one could still do
interrupt::free(|cs1| {
    interrupt::free(|cs2| {
        enable_cs(cs2)
    })
})
By nesting free, one can obtain two tokens -- and that's all good and correct, because while any of them is alive, interrupts are off. My point is that it's not just the Clone implementation on CriticalSection, it's the whole design of CriticalSection that makes it unsuitable for the two use cases @kelida mentioned.

I thus think that if there is a way forward with these use cases, it's through a separate UniqueCriticalSection (that might be Into<CriticalSection>). That would, on first sight, enable the use cases.

But is it really useful? A constructor for UniqueCriticalSection would need to check whether interrupts might already be disabled, and if so not issue the token. That means either panicking, or giving an Option<UniqueCriticalSection>. Neither is really practical.

My impression is that an interrupt-enabling function is generally unsafe (as long as anyone wants to depend on being inside an interrupt-free section). I have yet to see a case where enabling interrupts is actually a practical thing. (Not so incidentally, I recently made a PR for a big fat red warning in the IRQ enable routine of RIOT-OS -- if interrupt enabling were generally allowed, critical sections could never work with any kind of callback code any more).

That example doesn't work because cs1 and cs2 are both &CriticalSection, since interrupt::free() takes FnOnce(&CriticalSection) -> R. However, enable_cs() takes CriticalSection by value, so it can't take cs1 or cs2. The real issue lies in the fact that interrupt::free() can capture the CriticalSection granted by main, allowing it to call enable_cs() inside the closure:

fn main(cs: CriticalSection) {
     interrupt::free(|cs_ref| {
         enable_cs(cs);
         // Do bad stuff with cs_ref even though interrupts have been disabled
     })
}

The only way I can see of fixing this is to make interrupt::free() take FnMut, in addition to making CriticalSection not Clone. Both are breaking changes, so canning enable_cs() may be the way to go. Also note that introducing UniqueCriticalSection as the argument to main won't solve this issue, since the example will still work. I'm also not a fan of any attributes that enable interrupts before entering main, because it may cause interrupts to fire during peripheral initialization. AFAIK MSP430 always initializes with interrupts turned off, so passing CriticalSection into main and ISRs makes sense.

cr1901 commented 2 years ago

The only way I can see of fixing this is to make interrupt::free() take FnMut

My gut feeling is that this would be a really bad idea and cause many things to either stop compiling or subtly break. But it's just a feeling. I also don't want to diverge from how the rest of Rust embedded does things.

Also note that introducing UniqueCriticalSection as the argument to main won't solve this issue, since the example will still work.

I guess the issue that there is no way to express in Rust "this type cannot be moved into a closure".

I'm also not a fan of any attributes that enable interrupts before entering main, because it may cause interrupts to fire during peripheral initialization.

Do you see any problems w/ the following?: The safe wrapper around main and init could enable interrupts after init, but just before the call to main.

AFAIK MSP430 always initializes with interrupts turned off, so passing CriticalSection into main and ISRs makes sense.

Yea, I don't want to get rid of this feature either.

YuhanLiin commented 2 years ago

Does any other embedded Rust ecosystem have some way to safely set interrupts? If they do then we can copy them.

I guess the issue that there is no way to express in Rust "this type cannot be moved into a closure".

Closest thing is Pin, which doesn't even work here. It's doable if we could declare custom auto-traits that propagate to closures, but that's a pipe dream even in nightly.

Do you see any problems w/ the following?: The safe wrapper around main and init could enable interrupts after init, but just before the call to main.

That works if init can pass variables to main, since I don't want init to put everything into global variables. If other Rust embedded ecosystems follow this model then we should copy what they do. Also init should be optional for back compat.

Yea, I don't want to get rid of this feature either.

It's completely sound in MSP430 AFAIK. The unsoundness is with enable_cs.

cr1901 commented 2 years ago

Does any other embedded Rust ecosystem have some way to safely set interrupts? If they do then we can copy them.

ARM enables interrupts before entering main. This works because thanks to NVIC, all peripherals have a local interrupt disable bit. There is no guarantee in msp430 world that peripherals will have a separate interrupt disable from GIE in the status register. As for RISCV, I'm not sure what the privileged spec guarantees here. IIRC picorv32 has a peripheral interrupt disable for all peripherals as well.

If other Rust embedded ecosystems follow this model then we should copy what they do.

That works if init can pass variables to main, since I don't want init to put everything into global variables. They don't follow this model, this was an example of a way I thought we could work around the problem. And yes, it would involve judicious use of Mutex<OnceCell<T>> in my case :D!

Right now, it does not seem possible to represent interrupt enable in purely safe code. A shame, but at least its easy enough not to abuse- after enabling interrupts in main, don't use the passed-in CriticalSection at all.

YuhanLiin commented 2 years ago

For MSP430 init we should move the discussion to another thread/issue.

chrysn commented 2 years ago

I guess the issue that there is no way to express in Rust "this type cannot be moved into a closure".

That way would be to have one more autotrait like Send that says "can safely be moved into a critical section, i.e., does not allow enabling interrupts". (Then, unlike the usual closure consumers that ask it to be Send before making it callable by another thread, free would ask the closure to be (auto-)WontEnableInterrupts).

Being highly specific, I don't see this happen before we get crate-defined autotraits, which I haven't seen any efforts towards (and it'd be pitfall heavy territory).

On the MSP specifics, please leave a final note here to allow those interested to join the discussion.

cr1901 commented 2 years ago

For MSP430 init we should move the discussion to another thread/issue.

Fair enough. In the context of this issue, we are removing enable_cs (or making it unsafe, though I would prefer removal) as our solution. I'm sure there is a good reason CriticalSection is Clone.

cr1901 commented 2 years ago

enable_cs has been removed from the msp430 crate, so at least from the msp430 side of things CriticalSection being Clone isn't a problem.

As for Mutex<Refcell<_>>, what about https://github.com/rust-embedded/bare-metal/issues/43 as an alternative?