Single-instruction-read/writes atomics

Ten0 commented 4 years ago

We have Mutex. However, opening a critical section (disabling interrupts) to just read or write a Mutex<Cell<u8>> seems a bit overkill: the read or write is a single instruction, which makes it impossible for an interrupt to hit in the middle. How about this crate also provides some form of atomics with respect to what the Send and Sync markers mean in our context?

(I beleive they would be implemented this way, I'd be very willing to open a PR)

use core::cell::UnsafeCell;

pub struct SingleCoreAtomicUsize {
    inner: UnsafeCell<usize>,
}

/// Safety:
/// We only provide `get` and `set`, which all take a single instruction, so no interrupt may hit in the middle
unsafe impl Sync for SingleCoreAtomicUsize {}

impl SingleCoreAtomicUsize {
    pub const fn new(val: usize) -> Self {
        Self {
            inner: UnsafeCell::new(val),
        }
    }
    pub fn get(&self) -> usize {
        unsafe { *self.inner.get() }
    }
    pub fn set(&self, val: usize) {
        unsafe { *self.inner.get() = val }
    }
}

jonas-schievink commented 4 years ago

This is AtomicUsize

Ten0 commented 4 years ago

This is AtomicUsize

But is this available on all single-core platforms, e.g. Cortex M0? I thought it wasn't (https://rust-embedded.github.io/book/concurrency/index.html#atomic-access). Also if it does synchronization across cores, it's not exactly the same thing as I'm proposing: atomics across interrupts on the same core, with no cross-core synchronization overhead.

jonas-schievink commented 4 years ago

The atomic types are available everywhere (if not, that's a bug in Rust or LLVM), but CAS operations might not be. load and store are always available.

You still need to synchronize access between interrupt handlers and idle. As written, SingleCoreAtomicUsize causes data races between every get and set, which is undefined behavior.

Ten0 commented 4 years ago

The atomic types are available everywhere (if not, that's a bug in Rust or LLVM), but CAS operations might not be. load and store are always available.

Thanks, I will double check this.

You still need to synchronize access between interrupt handlers and idle. As written, SingleCoreAtomicUsize causes data races between every get and set, which is undefined behavior.

I don't understand what data race you're talking about. Could you give an example of a scenario where the data race would trigger, and how it's UB?

Also, I'm unsure how I can get AtomicUsize to have the behaviour I was looking for: for platforms that do support multi-core (e.g your typical computer), how would you trigger a no-cross-core-synchronization read or write from an AtomicUsize? Just using Ordering::Relaxed?

jonas-schievink commented 4 years ago

The unsafe impl Sync allows use of SingleCoreAtomicUsize across threads, but there is no inner synchronization, so any usage of get or set from more than one thread would be a data race, as the stores and loads happen unsynchronized with each other.

This is still the case if you replace "thread" with "interrupt handler". For example, the compiler might assume that a value loaded by subsequent gets stays the same if there is no set inbetween. This is wrong in the presence of interrupts.

The Relaxed ordering is all you need as long as you only need to synchronize accesses of the AtomicUsize itself, not any other data (the other orderings are used when writing lock free data structures with atomics, and affect how the compiler synchronizes non-atomic accesses to the actual data stored in the data structure).

Ten0 commented 4 years ago

Thanks a lot for the explanation :+1: I think I understood almost all of it now! :)

The compiler might assume that a value loaded by subsequent gets stays the same if there is no set inbetween. This is wrong in the presence of interrupts.

From the UnsafeCell documentation:

UnsafeCell is a type that wraps some T and indicates unsafe interior operations on the wrapped type. Types with an UnsafeCell field are considered to have an 'unsafe interior'. The compiler makes optimizations based on the knowledge that &T is not mutably aliased or mutated, and that &mut T is unique. UnsafeCell is the only core language feature to work around the restriction that &T may not be mutated.

I thought this explained it prevented that kind of optimizations.and enforced actually re-reading the value everytime it's asked. Does it actually only force re-read if the compiler detects it might have changed due to some mutable aliasing, but in the same execution flow?

jonas-schievink commented 4 years ago

That was just an example of an assumption that a compiler might make. You'd have to ask that question in terms of Rust's language semantics to get a complete answer, and I'm not sure how the precise answer looks there.

pftbest commented 4 years ago

@Ten0

Here is a real example with your code https://godbolt.org/z/GsYx1q

you can see that

    // Wait for the flag
    while FOO.get() == 0 {
        // nop
    }

is compiled to

    // Wait for the flag
    if FOO.get() == 0 {
        loop {
            // nop
        }
    }

Compiler assumes there is no data races in the code, so value of FOO can never change inside the loop. Which means no need to read it every time we can check it only once.

adamgreig commented 4 years ago

This exact problem actually can occur with the bare-metal Mutex, which is why we added documentation in #34 after discussion in https://github.com/rust-embedded/bare-metal/pull/33#issuecomment-634210146. In the cortex-m crate, compiler fences or memory clobbers are used when entering and leaving critical sections to ensure the compiler does not move accesses outside the the critical section, as otherwise @pftbest's example above would miscompile the same way.

Ten0 commented 4 years ago

Great, thank you for all the examples and references! Closing this as it's already in core.

rust-embedded / bare-metal

Single-instruction-read/writes atomics #37