Concurrency / Atomic access section contains factual errors

rbsexton commented 1 year ago

The section on atomic access gets some important things wrong about Cortex-M. Here's the language:

Atomic Access

On some platforms, special atomic instructions are available, which provide guarantees about read-modify-write operations. Specifically for Cortex-M: thumbv6 (Cortex-M0, Cortex-M0+) only provide atomic load and store instructions, while thumbv7 (Cortex-M3 and above) provide full Compare and Swap (CAS) instructions. These CAS instructions give an alternative to the heavy-handed disabling of all interrupts: we can attempt the increment, it will succeed most of the time, but if it was interrupted it will automatically retry the entire increment operation. These atomic operations are safe even across multiple cores.

The Cortex-M0, Cortex-M0+ does not have atomic load and store. These processors must use critical sections.
Cortex-M3 and and above have atomic load and store via the LDREX\STREX instructions.
Atomic load/store will work across processors if the implementer has included an exclusive access monitor. At the time of this writing, NXP and ST Micro do not implement this feature.
LDREX/STREX do not retry automatically.

Proposed language

Some platforms include special atomic instructions to guarantee read-modify-write operations. Within the Cortex-M family, thumbv6 (Cortex-M0, Cortex-M0+) must make use of critical sections. thumbv7 (Cortex-M3 and above) include atomic load and store (LDREX/STREX) instructions. These instructions are an alternative to the heavy-handed disabling of all interrupts: we can attempt the increment, it will succeed most of the time, but if it was interrupted the firmware can automatically retry the entire increment operation. These atomic may be supported across multiple cores. Check your device reference manual for more details.

adamgreig commented 1 year ago

This was #273, but it's not wrong as such: thumbv6 does guarantee that all aligned byte/halfword/word memory accesses will be atomic, and therefore Rust is able to provide AtomicU8, AtomicU16, AtomicU32 etc types, but with just load() and store() methods for the thumbv6 targets. The LDREX/STREX instruction pair is only available on thumbv7 and above, but is only needed for Rust to implement the CAS methods on those Atomic types; Rust's implementation is then what performs an automatic retry. However, the first version of this paragraph said v6 didn't provide any atomics and that's why it starts by talking about read-modify-write cycles, which v6 certainly doesn't support. The amendment in #273 mentions the basic atomic support in v6 without updating the discussion about r-m-w, which I think is where the confusion has come from. So we should probably update the language to be clearer. Perhaps something more like:

On many platforms, certain memory operations are guaranteed to be atomic, which means the entire operation takes place as though at a single moment in time: a partial or intermediate result is never observable. An operation might be a simple write or a more complicated read, modify, write sequence. For example, it is common on 32-bit platforms for an aligned 32-bit write (that is, a write of 4 bytes to an address that's a multiple of 4) to be atomic, so that you can never observe "half" the write - anything reading the address being written either reads the complete old value, or the complete new value. This is referred to as atomic load and store, and on the Arm thumbv6m platform, all aligned 1, 2, and 4-byte writes are guaranteed to be atomic in this sense. Rust provides a wrapper around this behaviour in the AtomicU8, AtomicU16, AtomicU32, and similar types in core::sync::atomic, which provide load() and store() methods on thumbv6m.

These Atomic* types allow you to access statics without requiring unsafe, since the compiler knows they cannot end up in an invalid state. However, for our counter example we need to be very careful - if we used load() to read the atomic value, incremented it by one, and then used store() to write it back, it's possible the value has changed in the interim, and while this isn't a memory safety issue (it's always a valid integer), it is a race condition and therefore a bug in our code. For thumbv6m platforms and others with only this basic level of atomic support, we still need to use the critical sections described above.

However, some platforms - such as thumbv7m - additionally support atomic read-modify-write operations, also called "compare and swap" or CAS for short. These allow you to perform actions such as atomically incrementing a number while ensuring no interruption from another thread or interrupt, without needing a critical section and the associated delay in handling other interrupts. So, on these platforms, we can rewrite our example to use an AtomicUsize and its fetch_add() method to atomically add 1, without having to worry about it being raced by the timer interrupt.

rbsexton commented 1 year ago

This is getting lost in the weeds. I think you'd be hard pressed to find a thirty-two bit machine where its possible to observe half of an aligned write. If that's the criteria for an 'Atomic' write then the term has little meaning and begs the question of how an AtomicU32 is different from a U32.

The use of the term 'CAS' to refer to V6M/V7M machines is an unfortunate choice, as the SWP instruction is specific to V7A and older devices and was replaced by LDREX/STREX on V7M. if the docs are going to refer to Cortex-M, its preferable to refer to the relevant machine instructions.

I think the core problem here is the term 'Instruction'. The v6m devices have atomic behavior with regard to aligned reads and writes, but they do not have atomic load/store 'instructions'. Likewise the V7M architectures offer CAS functionality via LDREX/STREX, but there is no CAS or SWP instruction.

I'm deeply familiar with the CPU behavior here, but not so much with Rust's semantics and precisely how that maps onto the underlying CPU. Thanks for the detailed response.

adamgreig commented 1 year ago

Ideally we'd avoid being too focussed on a particular hardware platform for this part of the book, so I wanted to introduce the concept fairly generally. There are plenty of 8 and 16 bit platforms that run Rust, where a U32 usually wouldn't be atomic, and U64 is widely used and obviously isn't atomic on 32-bit Cortex-M either, so the AtomicU32 types are meaningfully different to plain u32, and of course has a significant practical difference in Rust. Meanwhile, this meaning (of non-torn word memory accesses) is all Arm means by "atomic" - the other operations are called "exclusive access", never atomic.

We could refer to the machine instructions but I don't think it's really worthwhile here - the point is just that some hardware platforms only provide enough capability for Atomic to have load/store, while others provide enough capability that Atomic provide the full complement of methods. Usually the latter is called "CAS" even if it's not fully accurate. How it's implemented (ldrex/strex on thumbv7, other instructions on eg riscv with the atomic extension) seems too much detail for this section.

My proposed text above doesn't use the term "instruction" at all for that reason, so I'm not sure what you're suggesting should change in it. Do you think it needs to be more explicit about hardware details or even less?

rbsexton commented 1 year ago

This is way too far into philosophy. If you're not going to go into platform - specific detail, than I suggest you boil it down to something concise that doesn't invite technical debate and risk misleading the reader. You've already written it:

Some hardware platforms only provide enough capability for Atomic to have load/store, while others provide enough capability that Atomic can provide the full complement of methods.

The platform-specific details of which methods are supported on which platforms is important and worthy of documentation. Perhaps an appendix would be more appropriate for that.

jamesmunns commented 1 year ago

I see what you're saying @rbsexton, maybe something like:

- Specifically for Cortex-M: thumbv6 (Cortex-M0, Cortex-M0+) only
- provide atomic load and store instructions, while thumbv7 
- (Cortex-M3 and above) provide full Compare and Swap
- (CAS) instructions
+ Specifically for Cortex-M: thumbv6 (Cortex-M0, Cortex-M0+)
+ are only capable of atomically loading and storing 8-32 bit
+ integers, while thumbv7 (Cortex-M3 and above) provide full
+ Compare and Swap (CAS) capability, via the `LDREX` and
+ `STREX` atomic instructions for 8-32 bit integers.

I agree we don't want to get too into the weeds with theory, but also not be incorrect. We do sort of want to introduce the "CAS" term, as it is used consistently in Rust's core/std documentation, so folks are likely to see it again.

eldruin commented 1 year ago

Actually, I found the elaborate explanation in https://github.com/rust-embedded/book/issues/338#issuecomment-1364091467 instructive, providing that kind of background and overview perspective that is so often not provided and everyone has to recreate on their mind. I would be happy to see that in the book somewhere.

rust-embedded / book

Concurrency / Atomic access section contains factual errors #338

Atomic Access