Closed parched closed 6 years ago
For the record I ended up re-implementing AtomicUsize
(using intrinsics::atomic_*
) in my crate. My version doesn't provide CAS loop functionality and it works on thumbv6m-none-eabi
which has "max-atomic-width": 0.
So :+1: to this for me. Though I understand that this is related to the general problem of the "cfg-ability" of the std facade. (Should std compile for a target that doesn't natively support CAS loops? etc.)
ARMv6-M and MSP430 support (i.e. have hardware instructions for) atomic loads and stores but
not CAS loops (e.g. the LDREX and STREX instructions are not available on ARMv6-M). Yet the whole Atomic*
types are not available on the thumbv6m-none-eabi
and msp430-none-elf
built-in targets because they have max-atomic-width
set to 0
in their specification.
Expose a limited Atomic*
API (everything that doesn't require a CAS (compare-and-swap) loop in its
implementation) on ARMv6-M and MSP430.
Atomic loads / stores are enough to implement simple lockless data structures like a single producer single consumer queue -- one such implementation can be found in the heapless
crate. Today, the heapless
implementation of such lockless queue can't support ARMv6-M and MSP430 on stable because the Atomic*
types are missing. heapless
currently supports those architectures by directly using the atomic_load_acq
and similar intrinsics, but this requires a nightly compiler as those intrinsics are (perma) unstable / compiler (LLVM) implementation details.
I propose we add a new experimental (i.e. requires nightly) cfg attribute: say cfg(target_has_atomic = "cas")
(exact syntax is unimportant) along with a matching target specification field, say "atomic-cas": true
, (reminder: target specifications are unstable) and then use those two to provide limited Atomic*
types on ARMv6-M and MSP430 (we'll have to tweak their target specifications to say "max-atomic-width": 32
(or 16) and "atomic-cas": false
).
This would make available and insta-stabilize the Atomic*
types on ARMv6-M and MSP430.
"Isn't it a problem that e.g. AtomicUsize.swap(..)
will compile for x86_64 / ARMv7-M but not for ARMv6-M?" Yes it is, but that's already a problem today because AtomicUsize
is not available on ARMv6-M. This proposal doesn't address that problem.
@rust-lang/libs thoughts on making Atomic*
available on ARMv6-M and MSP430?
Also cc @rust-lang/portability (er, there's no team so cc @jethrogb) I believe cfg(target_has_atomic = "cas")
falls under your purview but there's not rush in stabilizing that. Still, I think I should make you aware about targets not natively supporting CAS loops so you can factor that in your portability lint work / design.
@dylanmckay @dvc94ch Does AVR / RISCV natively support CAS? I suspect that there's some RISCV variant that does, but is there any RISCV variant that does not?
@dylanmckay @pftbest Do multicore AVR / MSP430 microcontrollers even exist? Because if they don't then we can set both max-atomic-width
to 16 (and 8?) and singlethread
to true
in the target specification and call it day. With those settings LLVM will turn all the atomic ops into normal (or maybe volatile?) integer ops -- the wasm32-unknown-unknown
target does this -- so no need for atomic-cas
for those targets. The risk of doing that is that if multicore AVR / MSP430 becomes a thing in the future then the built-in target will generate wrong (non atomic) code for those multicore devices.
@japaric Sounds great, but isn't cfg(target_has_atomic = "cas")
backwards incompatible? If a user is using cfg(target_has_atomic = <width>)
then they assume CAS is present for that width, so it would break on targets with atomic-cas = "false"
, no?
As I understand it, singlethread
only enables load/store atomics not CAS atomics because of interrupt/signal handlers?
Sounds great, but isn't cfg(target_has_atomic = "cas") backwards incompatible?
Maybe? I haven't thought much about it. If that doesn't work we could add cfg(target_has_atomic_load_store = "32")
and cfg(target_has_atomic_cas = "32")
and have cfg(target_has_atomic = "32")
mean both (i.e. cfg(all(..))
). But there's probably some better way to go about it.
In any case, It doesn't matter much how we implement it right now as it will be unstable and core
will be the only user of the new cfg
stuff.
As I understand it, singlethread only enables load/store atomics not CAS atomics because of interrupt/signal handlers?
I think you are right. The wasm target allows CAS operations though. The compiler will happily compile this:
static A: AtomicUsize = AtomicUsize::new(0);
#[no_mangle]
pub extern "C" fn foo() -> usize {
A.swap(1, Ordering::AcqRel)
}
#[no_mangle]
pub extern "C" fn bar() -> usize {
A.swap(2, Ordering::AcqRel)
}
into this:
000045 <foo>:
000047: 01 7f | local[0] type=i32
000049: 41 00 | i32.const 0
00004b: 28 02 80 88 80 80 00 | i32.load 2 1024
000052: 21 00 | set_local 0
000054: 41 00 | i32.const 0
000056: 41 01 | i32.const 1
000058: 36 02 80 88 80 80 00 | i32.store 2 1024
00005f: 20 00 | get_local 0
000061: 0b | end
000062 <bar>:
000064: 01 7f | local[0] type=i32
000066: 41 00 | i32.const 0
000068: 28 02 80 88 80 80 00 | i32.load 2 1024
00006f: 21 00 | set_local 0
000071: 41 00 | i32.const 0
000073: 41 02 | i32.const 2
000075: 36 02 80 88 80 80 00 | i32.store 2 1024
00007c: 20 00 | get_local 0
00007e: 0b | end
(no loops, AFAICT)
I think that's fine on wasm because I believe interrupts / signals / preemption is not possible within the execution of a single wasm module and each wasm module's memory is isolated from the other.
But, yeah, I think singlethread
doesn't give you CAS on targets that support interrupts.
In any case, It doesn't matter much how we implement it right now as it will be unstable and core will be the only user of the new cfg stuff.
To clarify: It doesn't matter much how we implement it right now, as Atomic[IU]{8,16,32,64}
are unstable (not because whatever we do will be unstable). core
won't be the only user, users will need to use this cfg
in their code to appropriately use (or not use) CAS.
It's not clear to me if cfg(target_has_atomic="00")
where 00
is some integer is already considered stable or not. There's no information in the tracking issue
@japaric your proposal sounds great to me! I think we'd totally be down for providing stripped-down Atomic*
types. I didn't actually realize that there was a difference between an atomic load and a volatile load personally!
I think from a stabilization perspective we're all good here as well, it's "no worse" than the situation today and all the appropriate bits will continue to be unstable.
FWIW on wasm we actually emit atomic instructions and all to LLVM, but we also tell LLVM that wasm is single threaded which causes it to legalize atomics to nonatomic operations. (aka that's why atomics work on wasm)
Do multicore AVR / MSP430 microcontrollers even exist? Because if they don't then we can set both max-atomic-width to 16 (and 8?) and singlethread to true in the target specification and call it day.
No, there are no multicore MSP430. But using singlethread
flag would probably be incorrect in the presence of interrupts. I think a better way is to fix LLVM so it emits __sync_fetch_*
intrinsic calls for CAS operations which we can then implement using interrupts disable in some third party crate or compiler_builtins.
There is one more thing about MSP430, it can do RMW operations atomically. For example, you may have a counter that you can atomically increment by some value. But unfortunately all arithmetic operations on the current atomics are CAS, so mapping them on MSP430 would result in a function call and an interrupt disable/enable instead of a single instruction. To fix this I've made a crate msp430-atomic which is a copy of core::sync::atomic, except that every CAS operation is replaced with RMW counterpart which doesn't return the old value. I did it using inline assembly to be sure that all operations are mapped to a single instruction even in debug builds.
I'm not advertising changing the core api, I'm just pointing out that there are other ways to get around the problem. The main benefit of having default atomics working is a code reuse between big and small targets, so cutting out some operations may hurt that goal. I think lowering unsupported operations to a libcalls might be a better solution.
PR #51953 implements the proposal in https://github.com/rust-lang/rust/issues/45085#issuecomment-384825434
Some targets, Armv6-M for example, don't support atomic compare and swap so have "max-atomic-width" set to 0 in their target specs. Often you just need to
AtomicUsize::load
andAtomicUsize::store
and none of the other CAS dependent methods but unfortunately it's not supported by rust even if that targets natively support it.We could add another field to the target spec, "max-atomic-load-store-width", that defaults to the same value as "max-atomic-width", for determining whether to enable
Atomic*
without the CAS dependent methods.cc @japaric who probably would like this for https://github.com/japaric/spscrb