Consider adding load/store atomics for targets that don't support compare and swap

parched commented 7 years ago

Some targets, Armv6-M for example, don't support atomic compare and swap so have "max-atomic-width" set to 0 in their target specs. Often you just need to AtomicUsize::load and AtomicUsize::store and none of the other CAS dependent methods but unfortunately it's not supported by rust even if that targets natively support it.

We could add another field to the target spec, "max-atomic-load-store-width", that defaults to the same value as "max-atomic-width", for determining whether to enable Atomic* without the CAS dependent methods.

cc @japaric who probably would like this for https://github.com/japaric/spscrb

japaric commented 6 years ago

For the record I ended up re-implementing AtomicUsize (using intrinsics::atomic_*) in my crate. My version doesn't provide CAS loop functionality and it works on thumbv6m-none-eabi which has "max-atomic-width": 0.

So :+1: to this for me. Though I understand that this is related to the general problem of the "cfg-ability" of the std facade. (Should std compile for a target that doesn't natively support CAS loops? etc.)

japaric commented 6 years ago

Background

ARMv6-M and MSP430 support (i.e. have hardware instructions for) atomic loads and stores but not CAS loops (e.g. the LDREX and STREX instructions are not available on ARMv6-M). Yet the whole Atomic* types are not available on the thumbv6m-none-eabi and msp430-none-elf built-in targets because they have max-atomic-width set to 0 in their specification.

What we want

Expose a limited Atomic* API (everything that doesn't require a CAS (compare-and-swap) loop in its implementation) on ARMv6-M and MSP430.

Why

Atomic loads / stores are enough to implement simple lockless data structures like a single producer single consumer queue -- one such implementation can be found in the heapless crate. Today, the heapless implementation of such lockless queue can't support ARMv6-M and MSP430 on stable because the Atomic* types are missing. heapless currently supports those architectures by directly using the atomic_load_acq and similar intrinsics, but this requires a nightly compiler as those intrinsics are (perma) unstable / compiler (LLVM) implementation details.

One way to do it

I propose we add a new experimental (i.e. requires nightly) cfg attribute: say cfg(target_has_atomic = "cas") (exact syntax is unimportant) along with a matching target specification field, say "atomic-cas": true, (reminder: target specifications are unstable) and then use those two to provide limited Atomic* types on ARMv6-M and MSP430 (we'll have to tweak their target specifications to say "max-atomic-width": 32 (or 16) and "atomic-cas": false).

This would make available and insta-stabilize the Atomic* types on ARMv6-M and MSP430.

"Isn't it a problem that e.g. AtomicUsize.swap(..) will compile for x86_64 / ARMv7-M but not for ARMv6-M?" Yes it is, but that's already a problem today because AtomicUsize is not available on ARMv6-M. This proposal doesn't address that problem.

@rust-lang/libs thoughts on making Atomic* available on ARMv6-M and MSP430?

Also cc @rust-lang/portability (er, there's no team so cc @jethrogb) I believe cfg(target_has_atomic = "cas") falls under your purview but there's not rush in stabilizing that. Still, I think I should make you aware about targets not natively supporting CAS loops so you can factor that in your portability lint work / design.

@dylanmckay @dvc94ch Does AVR / RISCV natively support CAS? I suspect that there's some RISCV variant that does, but is there any RISCV variant that does not?

@dylanmckay @pftbest Do multicore AVR / MSP430 microcontrollers even exist? Because if they don't then we can set both max-atomic-width to 16 (and 8?) and singlethread to true in the target specification and call it day. With those settings LLVM will turn all the atomic ops into normal (or maybe volatile?) integer ops -- the wasm32-unknown-unknown target does this -- so no need for atomic-cas for those targets. The risk of doing that is that if multicore AVR / MSP430 becomes a thing in the future then the built-in target will generate wrong (non atomic) code for those multicore devices.

parched commented 6 years ago

@japaric Sounds great, but isn't cfg(target_has_atomic = "cas") backwards incompatible? If a user is using cfg(target_has_atomic = <width>) then they assume CAS is present for that width, so it would break on targets with atomic-cas = "false", no?

As I understand it, singlethread only enables load/store atomics not CAS atomics because of interrupt/signal handlers?

japaric commented 6 years ago

Sounds great, but isn't cfg(target_has_atomic = "cas") backwards incompatible?

Maybe? I haven't thought much about it. If that doesn't work we could add cfg(target_has_atomic_load_store = "32") and cfg(target_has_atomic_cas = "32") and have cfg(target_has_atomic = "32") mean both (i.e. cfg(all(..))). But there's probably some better way to go about it.

In any case, It doesn't matter much how we implement it right now as it will be unstable and core will be the only user of the new cfg stuff.

As I understand it, singlethread only enables load/store atomics not CAS atomics because of interrupt/signal handlers?

I think you are right. The wasm target allows CAS operations though. The compiler will happily compile this:

static A: AtomicUsize = AtomicUsize::new(0);

#[no_mangle]
pub extern "C" fn foo() -> usize {
    A.swap(1, Ordering::AcqRel)
}

#[no_mangle]
pub extern "C" fn bar() -> usize {
    A.swap(2, Ordering::AcqRel)
}

into this:

000045 <foo>:
 000047: 01 7f                      | local[0] type=i32
 000049: 41 00                      | i32.const 0
 00004b: 28 02 80 88 80 80 00       | i32.load 2 1024
 000052: 21 00                      | set_local 0
 000054: 41 00                      | i32.const 0
 000056: 41 01                      | i32.const 1
 000058: 36 02 80 88 80 80 00       | i32.store 2 1024
 00005f: 20 00                      | get_local 0
 000061: 0b                         | end
000062 <bar>:
 000064: 01 7f                      | local[0] type=i32
 000066: 41 00                      | i32.const 0
 000068: 28 02 80 88 80 80 00       | i32.load 2 1024
 00006f: 21 00                      | set_local 0
 000071: 41 00                      | i32.const 0
 000073: 41 02                      | i32.const 2
 000075: 36 02 80 88 80 80 00       | i32.store 2 1024
 00007c: 20 00                      | get_local 0
 00007e: 0b                         | end

(no loops, AFAICT)

I think that's fine on wasm because I believe interrupts / signals / preemption is not possible within the execution of a single wasm module and each wasm module's memory is isolated from the other.

But, yeah, I think singlethread doesn't give you CAS on targets that support interrupts.

jethrogb commented 6 years ago

In any case, It doesn't matter much how we implement it right now as it will be unstable and core will be the only user of the new cfg stuff.

To clarify: It doesn't matter much how we implement it right now, as Atomic[IU]{8,16,32,64} are unstable (not because whatever we do will be unstable). core won't be the only user, users will need to use this cfg in their code to appropriately use (or not use) CAS.

It's not clear to me if cfg(target_has_atomic="00") where 00 is some integer is already considered stable or not. There's no information in the tracking issue

alexcrichton commented 6 years ago

@japaric your proposal sounds great to me! I think we'd totally be down for providing stripped-down Atomic* types. I didn't actually realize that there was a difference between an atomic load and a volatile load personally!

I think from a stabilization perspective we're all good here as well, it's "no worse" than the situation today and all the appropriate bits will continue to be unstable.

FWIW on wasm we actually emit atomic instructions and all to LLVM, but we also tell LLVM that wasm is single threaded which causes it to legalize atomics to nonatomic operations. (aka that's why atomics work on wasm)

pftbest commented 6 years ago

Do multicore AVR / MSP430 microcontrollers even exist? Because if they don't then we can set both max-atomic-width to 16 (and 8?) and singlethread to true in the target specification and call it day.

No, there are no multicore MSP430. But using singlethread flag would probably be incorrect in the presence of interrupts. I think a better way is to fix LLVM so it emits __sync_fetch_* intrinsic calls for CAS operations which we can then implement using interrupts disable in some third party crate or compiler_builtins.

There is one more thing about MSP430, it can do RMW operations atomically. For example, you may have a counter that you can atomically increment by some value. But unfortunately all arithmetic operations on the current atomics are CAS, so mapping them on MSP430 would result in a function call and an interrupt disable/enable instead of a single instruction. To fix this I've made a crate msp430-atomic which is a copy of core::sync::atomic, except that every CAS operation is replaced with RMW counterpart which doesn't return the old value. I did it using inline assembly to be sure that all operations are mapped to a single instruction even in debug builds.

I'm not advertising changing the core api, I'm just pointing out that there are other ways to get around the problem. The main benefit of having default atomics working is a code reuse between big and small targets, so cutting out some operations may hurt that goal. I think lowering unsupported operations to a libcalls might be a better solution.

japaric commented 6 years ago

PR #51953 implements the proposal in https://github.com/rust-lang/rust/issues/45085#issuecomment-384825434

rust-lang / rust