Mutex is not safe on multi-core systems

adamgreig commented 5 years ago

On a multi-core system, disabling interrupts does not prevent the other core from operating, and so values protected by a bare_metal::Mutex will be incorrectly marked Sync.

Since the overwhelming majority of embedded use cases are single-core, I propose putting a prominent warning in the Mutex docstring for now, and working to develop a safe multi-core extension to the Mutex which can be enabled with a feature gate. Probably something using an atomic to implement a spinlock on top of requiring a CriticalSection.

japaric commented 5 years ago

Probably something using an atomic to implement a spinlock on top of requiring a CriticalSection.

Spinlocks require a CAS operation so it's not possible to provide this on ARMv6-M.

I don't really see a Cargo feature as an option. Enabling a Cargo feature should not break code so neither of these are valid / proper uses:

Remove Sync impl when the "mult-core" feature is enabled (breaks code)
Add Sync impl when the "multi-core" feature is enabled and have it as a default feature. Theoretically OK but in practice it will be impossible to turn off the feature (also it would break code), so this would split the ecosystem in two: those who enable "multi-core" and those who do not.
Have Mutex.borrow fallback to a spinlock when "multi-core" is enabled. Doesn't work on ARMv6-M so enabling it would break compilation for that target.

My proposal would be to remove the Sync impl from Mutex and since that's required for its only use case; we might as well remove the Mutex abstraction.

adamgreig commented 5 years ago

I agree that removing Mutex entirely is the best option but I think we should wait until https://github.com/rust-embedded/wg/issues/294 is at least somewhat resolved.

eddyp commented 5 years ago

Since the overwhelming majority of embedded use cases are single-core, I propose putting a prominent warning in the Mutex docstring for now, and working to develop a safe multi-core extension to the Mutex which can be enabled with a feature gate.

Since this API is still not stable, how about renaming the current Mutex<> to something which makes it clear is safe only for single-core?

As you said, multi-core systems are a minority, so depending on the app, this could be enough.

Spinlocks require a CAS operation so it's not possible to provide this on ARMv6-M.

If we have some APIs which have some clear safety boundary, we can have different implementations on different systems, or some might even be missing, I would consider that acceptable.

rubberduck203 commented 4 years ago

I’ve been doing a bit of research into this topic this morning. Generally speaking, in order to achieve multi-core synchronization, hardware support is required, but perhaps one of these software implementations is feasible?

I am fairly convinced that an implementation of Mutex does not belong in this hardware agnostic crate, but perhaps having a trait here to be implemented in hw specific crates might make sense. However, in that case, I would argue that a Mutex trait belongs in the embedded-hal crate instead.

jonas-schievink commented 4 years ago

perhaps one of these software implementations is feasible?

Yes, something like a spinlock can be implemented without CAS if you know the number of competing parties beforehand. That's what I did with irq::PriorityLock, for example.

The drawback is that spinlocks can easily lead to deadlock when used across interrupts (PriorityLock addresses that by providing different APIs based on the priority level, but I'm not yet sure if that's the way to go). bare_metal::Mutex cannot deadlock.

However, in that case, I would argue that a Mutex trait belongs in the embedded-hal crate instead.

Agreed. Right now we have one in mutex-trait, but IMO it should go into the embedded HAL (perhaps after we have some experience with it). (maybe even in bare-metal)

rubberduck203 commented 4 years ago

After further research on the software implementations, all of them do require a memory barrier. I checked the Cortex-M0, which does not have hardware muted support, and it does have a memory barrier instruction. I don’t know if that’s something we can expect from every mcu though.

bare_metal::Mutex cannot deadlock

While that is a phenomenal property of a mutex, it’s not one I expect to be there. In general, mutexes do not guarantee deadlocks can’t happen and neither does Rust in general. The guarantee is that race conditions can not occur.

I was unaware of the mutex-trait. I have to say that I agree that it should be released as a 0.1 so implementations of it can be created and tested.

https://github.com/rust-embedded/wg/issues/395

That still leaves the question of what to do with the mutex in this crate. I’m a bit concerned about the impact of removing it before another alternative is widely available. Even once another is available, the books will need to be updated accordingly.

therealprof commented 4 years ago

After further research on the software implementations, all of them do require a memory barrier. I checked the Cortex-M0, which does not have hardware muted support, and it does have a memory barrier instruction. I don’t know if that’s something we can expect from every mcu though.

Every MCU in scope of embedded Rust has a memory barrier implementation. I haven't checked those implementations but I'd be very surprised if you would not need a CAS. Typically CAS free algorithms assume the absence of hardware interrupts.

While that is a phenomenal property of a mutex, it’s not one I expect to be there. In general, mutexes do not guarantee deadlocks can’t happen and neither does Rust in general. The guarantee is that race conditions can not occur.

The problem is that in the presence of interrupt handlers deadlocks through to use of e.g. spinlocks are much more likely and cannot be prevented or compile-time checked. This is in stark contrast to a regular operating system, so it is a somewhat important property.

rubberduck203 commented 4 years ago

Every MCU in scope of embedded Rust has a memory barrier implementation.

I’m not sure what this means. Isn’t “every MCU in scope of embedded Rust” simply every MCU? Just because there isn’t support now doesn’t mean there won’t be support in the future.

I haven't checked those implementations but I'd be very surprised if you would not need a CAS. Typically CAS free algorithms assume the absence of hardware interrupts.

It may be worthwhile to research. The first software implementation listed, Dekker’s Algoritm, indicates that a spin lock can be implemented without even a test-and-set instruction, let alone a compare-and-swap. That algo has some serious limitations (it only works for 2 processes), but it does seem that it’s worthwhile looking into how a software implementation may be provided as a fallback for MCUs that don’t have mutex primitive instructions. Much like the irq::PriorityLock that was mentioned.

But I digress. There seems to be a fair bit of agreement that Mutex does not belong in this crate. So the question, as I see it, is should it be removed, what’s the impact downstream, and how do we proceed?

therealprof commented 4 years ago

I’m not sure what this means. Isn’t “every MCU in scope of embedded Rust” simply every MCU? Just because there isn’t support now doesn’t mean there won’t be support in the future.

It means exactly what I said: Every currently supported MCU can do memory barriers. There may be some which are problematic in that respect but I don't know which ones. I have my doubts those can be supported in Rust same as I have my doubts some will be supported even if technically possible but you're right that this is speculation.

But I digress. There seems to be a fair bit of agreement that Mutex does not belong in this crate. So the question, as I see it, is should it be removed, what’s the impact downstream, and how do we proceed?

We cannot remove it unless we have an established and working and supported replacement. This Mutex is used pretty much everywhere.

rubberduck203 commented 4 years ago

Let's not be too theatrical here. Yes, there are 185 crates dependent on bare-metal, however very few are actually using Mutex.

https://github.com/search?l=Rust&q=bare_metal%3A%3AMutex&type=Code

Out of the 44 repositories returned in the search above, very few are actually using Mutex. Many more are using CriticalSection actually. The breakage here looks pretty minimal in reality. Of course, this doesn't account for any proprietary usages of the API, but anyone using a pre-1.0 ecosystem in production knows what they signed up for.

I'd also like to point out that I don't think anyone here is talking about outright deleting this Mutex implementation. I admit that "remove" was a poor choice of words on my part. "Remove" is in context of "remove it from this crate". I would expect that this "good enough for many single core use cases" implementation would move to it's own crate (critical-section-mutex?) so anyone who is using it could easily continue doing so.

therealprof commented 4 years ago

Let's not be too theatrical here. Yes, there are 185 crates dependent on bare-metal, however very few are actually using Mutex.

If you're going to argue with me about essentials and call me theatrical please get at least your data straight: Mutex is re-exported via the cortex-m crate (and possibly other foundational crates for different architectures, too) and mostly gets used from there. It is used all over the map so whatever will be done needs to ensure that we're not breaking the whole ecosystem at once.

rubberduck203 commented 4 years ago

There's no reason to be upset. Let's just take a breath here. I was only trying to come at this with data rather than vague statements.

You're absolutely correct. There are significantly more usages of cortex_m::Interrupt::Mutex. https://github.com/search?l=Rust&p=1&q=cortex_m%3A%3AInterrupt%3A%3AMutex&type=Code

This could still be easily handled by creating a new crate with the critical section mutex that, with the exception of you @therealprof, people don't seem to believe belongs in this crate. Once released, cortex-m could reference and re-export it transparently to all of those users. The only people we break are the people using bare_metal::Mutex directly.

Of course, none of this solves the fact that this Mutex is not safe on multi-core systems, but there also seems to be some consensus that it's dubious at best to think that a reasonable multi-core safe Mutex can be implemented without hardware support.

therealprof commented 4 years ago

This could still be easily handled by creating a new crate with the critical section mutex that, with the exception of you @therealprof, people don't seem to believe belongs in this crate.

Only a small fraction of people actually chimed in here, so it's a bit early to make such statements.

Indeed I don't have any issues with the Mutex being here but I don't have any problems with moving it either. I'm just pointing out the obvious that any change to a foundational crate like this needs to be planned and executed with extreme care.

We still don't have the ability to do something like a crater run to ensure that we're not accidentally causing major damage to the ecosystem, so I'd rather we treat with extreme caution.

Of course, none of this solves the fact that this Mutex is not safe on multi-core systems, but there also seems to be some consensus that it's dubious at best to think that a reasonable multi-core safe Mutex can be implemented without hardware support.

Indeed.

rubberduck203 commented 4 years ago

@therealprof I was looking into this again this morning. It is not Mutex that is unsound for multiple cores. It's using cortex_m::interrupt::free to provide the CriticalSection that is unsound on multi-core.

It is possible to implement a different mechanism to provide the CritcialSection that would be sound.

https://gist.github.com/rubberduck203/20415cb0bdc0726b2ebf0903e7193665

rubberduck203 commented 4 years ago

Just to be clear, the lock I linked to isn’t sound either, it should use a compare and swap, not an exchange, but is just to prove out that sound methods of providing a lock for the existing mutex can be implemented.

therealprof commented 4 years ago

Yeah, this has been discussed back and forth.

Problem is: spinlocks are not ideal either for other reasons and also this implementation will not work on e.g. all Cortex-M0 and M0+ because they don't have CAS instructions so it's not an universally applicable approach.

rubberduck203 commented 4 years ago

I think that’s the point. There is no universal approach, but the existing Mutex does allow for individualized approaches that will. It’s maybe not an ideal API, but that’s another matter.

Since the Mutex isn’t the soundness problem, should this issue be closed in preference of the other mutex discussions happening?

jonas-schievink commented 4 years ago

There seems to be some confusion about what CriticalSection and Mutex provide here.

CriticalSection is just a token that guarantees, for the duration of its existence, that the current core is in a critical section (ie. has any interrupts that could preempt execution disabled). The contract of this type means that any safe code that constructs a CriticalSection without disabling interrupts is unsound.
Mutex then takes this no-interrupts guarantee to provide mutual exclusion. This is sound if we adopt https://github.com/rust-embedded/wg/pull/419, but not at the current state. It's sound in any case if the data is only accessible from a single core.

I'll improve the docs of CriticalSection to make its contract clearer.

rubberduck203 commented 4 years ago

That’s a good idea @jonas-schievink. It took me quite a minute to completely grok how the two interact, and the guarantee that CriticalSection must provide. Would you mind explaining to me why Mutex itself is unsound on multi-core though? I’m not understanding why a CriticalSection couldn’t be provided to it from a global monitor, as described in the ARM synchronization primitives paper.

eddyp commented 4 years ago

perhaps having a trait here to be implemented in hw specific crates might make sense. However, in that case, I would argue that a Mutex trait belongs in the embedded-hal crate instead.

I am of the opinion that we're still thinking in terms of ARM cores, or even ARM Cortex M cores.

On SoC with hybrid cores there could be a mix of Cortex A and Cortex M cores, or even non-ARM cores such as RISC-V, so I expect there must be a HW peripheral that could properly implement the synchronization across cores, so, just as I suggested in my 2019 Oxidize Conf presentation (https://www.youtube.com/watch?v=IKXrNlXXfL4#t=29m11s), a trait for such HW-enabled mechanisms is desirable.

jonas-schievink commented 4 years ago

That would be by mutex-trait and custom cross-core Send/Sync traits that do not yet exist.

jonas-schievink commented 4 years ago

cross-core Send/Sync traits that do not yet exist.

Actually you might not even need this if the cores have their own peripherals and a shared mutex peripheral.

rubberduck203 commented 4 years ago

I’d like to be clear, I referenced the ARM paper, but the problem & solution are the same for any platform.

adamgreig commented 4 years ago

This issue was effectively closed by https://github.com/rust-embedded/wg/pull/419; the Mutex in bare-metal is only considered sound on single-core systems and some other abstraction will be required for multi-core systems.

rust-embedded / bare-metal

Mutex is not safe on multi-core systems #12