Support for targets without native atomics

MabezDev commented 3 years ago

The esp32c3 is a riscv based cpu, without the atomic extension. Unlike the arm thumbv6 target, it does not have atomic load/stores for the following types u8, u16, u32, which means we cannot use defmt (and other rtt based crates) as it stands.

I recently landed a PR in atomic-polyfill that adds load/store for those types mentioned above, implemented using critical sections. I have a PR in rtt-target to use that, but it has not yet been accepted.

In theory for better performance, we could ditch the critical section and implement our own atomic operation that use fences to ensure atomicity (this is what is done in llvm for thumbv6) - however this can cause UB when mixed with lock based CAS atomics (see: here, here, and here ) therefore is not suitable to add to atomic-polyfill as it stands.

I would like the esp32c3 to be able to support the c3, and I am sure there are other riscv, non-a targets too.

What do you think the best idea is going forward?

japaric commented 3 years ago

this is a tricky one.

to my understanding, a plain load instruction followed by a compiler fence (this is a compiler hint that does not produce an instruction) is a proper 'atomic load' on single core systems. you only need the atomic fence instruction if you have multiple cores.

in that sense, one can emulate atomic loads and stores on a single-core RISCV32I device. I'm not sure, however, one can do this properly on stable for this particular target because you should use LLVM 'monotonic' ordering in the load operation -- I think this is equivalent to passing Ordering::Relaxed to AtomicU8::load

if you had AtomicU8 + Ordering::Relaxed for this target you could use this code wherever you need the other orderings.

static ATOMIC: AtomicU8 = AtomicU8::new(0);

// single-core version of `store(Ordering::Release)`
atomic.store(42, Ordering::Relaxed);
sync::compiler_fence(Ordering::Release);

the other problem is that there's no conditional compilation option (cfg) one can use to indicate that some code is being compiled for a single-core system thus you cannot combine the above snippet with some other unsafe code (to make a SPSC queue) and call the whole thing safe because it will NOT be safe on multi-core devices and there's no mechanism to make the code not compiler for multi-core devices.

I would personally not try to bend defmt to accommodate a target like this until there's better language support for it. In my mind that would mean

AtomicU8::load(_, Ordering::Relaxed) somehow exists for the target
there's some cfg mechanism to distinguish between single-core and multi-core systems

or maybe better: just the cfg mechanism added to the language and then AtomicU8 can be implemented in core for a single-core only version of the RISCV32I target

japaric commented 3 years ago

another option that's maybe even less work:

add another RISCV32I target to the compiler but enable the singlethread feature in its target spec. the wasm32-unknown-unknown target has that feature enabled:

$ rustc +nightly -Z unstable-options --print target-spec-json --target wasm32-unknown-unknown
(..)
  "singlethread": true,

I think with that you should be able to use LLVM's atomic_load_acquire intrinsics, etc. in the libcore implementation. LLVM should NOT emit an atomic fence instruction when singlethread is enabled (iirc)

japaric commented 3 years ago

https://github.com/knurling-rs/defmt/issues/597#issuecomment-930146166

tried this. no luck: it produces a libcall (call to an intrinsic)

#![feature(intrinsics)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]

#[lang = "copy"]
pub trait Copy {}

impl Copy for u8 {}

#[lang = "sized"]
pub trait Sized {}

#[no_mangle]
pub fn foo(p: *const u8) -> u8 {
    unsafe { atomic_load_acq::<u8>(p) }
}

extern "rust-intrinsic" {
    pub fn atomic_load_acq<T: Copy>(src: *const T) -> T;
}

$ # '// comments' are not actually in the file
$ cat riscv32i-singlecore-none-elf.json
{
  "arch": "riscv32",
  "atomic-cas": false,
  "cpu": "generic-rv32",
  "data-layout": "e-m:e-p:32:32-i64:64-n32-S128",
  "eh-frame-header": false,
  "emit-debug-gdb-scripts": false,
  "executables": true,
  "linker": "rust-lld",
  "linker-flavor": "ld.lld",
  "llvm-target": "riscv32",
  "max-atomic-width": 0, // doesn't matter in no_core
  "panic-strategy": "abort",
  "relocation-model": "static",
  "singlethread": true, // <-
  "target-pointer-width": "32"
}

$ cargo rustc --release --target riscv32i-singlecore-none-elf.json -- --emit=obj

$ rust-objdump -Cd --triple riscv32 target/riscv32i-singlecore-none-elf/release/deps/foo-*.o
00000000 <foo>:
       0: 13 01 01 ff   addi    sp, sp, -16
       4: 23 26 11 00   sw      ra, 12(sp)
       8: 93 05 20 00   addi    a1, zero, 2
       c: 97 00 00 00   auipc   ra, 0
      10: e7 80 00 00   jalr    ra # <- call to other function
      14: 83 20 c1 00   lw      ra, 12(sp)
      18: 13 01 01 01   addi    sp, sp, 16
      1c: 67 80 00 00   ret

$ rust-nm -CSn  target/riscv32i-singlecore-none-elf/release/deps/foo-*.o
                  U __atomic_load_1 # <- the other function
00000000 00000020 T foo

compare that to the wasm target. no atomic fence instruction

$ cargo rustc --release --target wasm32-unknown-unknown -- --emit=asm

$ cat target/wasm32-unknown-unknown/release/deps/foo-*.s
foo:
        .functype       foo (i32) -> (i32)
        local.get       0
        i32.load8_u     0
        end_function

I would expect singlethread: true + llvm_target: riscv32 to produce something like the wasm code: a load and no atomic fence instruction

japaric commented 3 years ago

to my understanding, a plain load instruction followed by a compiler fence (this is a compiler hint that does not produce an instruction) is a proper 'atomic load' on single core systems.

this route would be adding a load_relaxed API to the Atomic* types -- this is RFC material. The implementation would call the atomic_load_relaxed intrinsic (+)

impl AtomicU8 {
    // available on the riscv32i-*-*-* target
    pub fn load_relaxed(&self) -> u8 {
        unsafe { intrinsics::atomic_load_relaxed(self.v.get()) }
    }
}

then third party libraries can implement the load_relaxed + sync::compiler_fence version of load(Acquire) I mentioned above -- again, that implementation would only be sound on single core systems. how one would enforce that: I don't know

(+) this does not codegen what I expect for the riscv32i target

#![crate_type = "lib"]
#![feature(intrinsics)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]

#[lang = "copy"]
pub trait Copy {}

impl Copy for u8 {}

#[lang = "sized"]
pub trait Sized {}

#[no_mangle]
pub fn foo(p: *const u8) -> u8 {
    unsafe { atomic_load_relaxed::<u8>(p) }
}

extern "rust-intrinsic" {
    pub fn atomic_load_relaxed<T: Copy>(src: *const T) -> T;
}

$ cargo rustc --release --target riscv32i-unknown-none-elf -- --emit=obj

$ rust-objdump -Cd --triple riscv32 target/riscv32i-unknown-none-elf/release/deps/foo-*.o
00000000 <foo>:
       0: 13 01 01 ff   addi    sp, sp, -16
       4: 23 26 11 00   sw      ra, 12(sp)
       8: 93 05 00 00   mv      a1, zero
       c: 97 00 00 00   auipc   ra, 0
      10: e7 80 00 00   jalr    ra
      14: 83 20 c1 00   lw      ra, 12(sp)
      18: 13 01 01 01   addi    sp, sp, 16
      1c: 67 80 00 00   ret

$ rust-nm -CSn target/riscv32i-unknown-none-elf/release/deps/foo-*.o
                  U __atomic_load_1
00000000 00000020 T foo

I would expect the machine code to be the exact same as the riscv32imac target in this load_relaxed case (no fence instruction)

$ cargo rustc --release --target riscv32imac-unknown-none-elf -- --emit=obj

$ rust-objdump -Cd --triple riscv32 target/riscv32imac-unknown-none-elf/release/deps/foo-*.o
00000000 <foo>:
       0: 03 05 05 00   lb      a0, 0(a0)
       4: 82 80         ret

MabezDev commented 3 years ago

Thanks for looking into this @japaric.

I think that this is something that should be something implemented in LLVM, as it is for thumbv6 targets. However this is a long term goal, as I suspect it would not be trivial to get upstreamed.

Short term, we are exploring implementing a illegal instruction trap handler and treating the esp32c3 as imac. This way has an advantage of not 'dirtying' upstream crates like this one. Obviously this will have a performance impact, but hopefully is not too bad.

japaric commented 2 years ago

in principle this should be solvable with an implementation of critical-section for the relevant RISC-V target or devices. I'm not familiar with the details but defmt-rtt which requires a critical section now works on the RP2040 which lacks the HW instructions required to implement a CAS loop and it's a multi-core device.

@MabezDev could you check the critical-section crate and close this if it's this problem is solvable upstream (in critical-section)?

jonathanpallant commented 2 years ago

On the RP2040 we have a hardware spin-lock subsystem, which is what critical-section uses. IIRC there's two unsafe C FFI functions you have to implement and you can do so any way you like.

MabezDev commented 2 years ago

Sorry this issue fell off my radar, I forgot to come back and mention I implemented the atomic emulation trap handler: https://github.com/esp-rs/riscv-atomic-emulation-trap. The switch to critical-section should close this for any RISCV users who don't want to use the trap handler approach.

Thanks for investigating this!

TheButlah commented 2 years ago

Hi, "RISCV user who doesn't want to use the trap handler approach" here :P I was using the ESP32-C3 (RISC-V) and need to try implementing my code without the atomic emulation trap handler (due to this unrelated issue).

I noticed that defmt-rtt uses core::sync::atomic still, making it unusable for people without atomic emulation. Wouldn't that mean that this issue should be repoened?

Urhengulas commented 2 years ago

Good question @TheButlah.

As far as I see we still have 4 atomic variables. The static TAKEN: AtomicBool in lib.rs and 3 AtomicUsize as part of struct Channel in channel.rs.

In my understanding we should be able to replace all of them with their non-atomic counterpart, because we only access them while holding the critical section. But I'd like a second opinion on that before going forward with that.

Urhengulas commented 2 years ago

We decided to pause this change until we do a coordinated effort to get risc-v support for defmt and probe-run. All the relevant issues will be tracked with the tag topic: risc-v.

TheButlah commented 2 years ago

We decided to pause this change until we do a coordinated effort to get risc-v support for defmt and probe-run. All the relevant issues will be tracked with the tag topic: risc-v.

Perhaps I'm overlooking something - my understanding of the ecosystem is quite limited. But why is this issue related to (or more importantly, blocked on) RISC-V support at all?

defmt-rtt already works on the riscv32imac-unknown-none-elf target (note the a). This issue is not about RISC-V support but rather supporting targets that don't have native atomics.

In my case, the riscv32imc-unknown-none-elf (note the absence of a) doesn't work on defmt-rtt, but I'm sure there are other targets that also don't support atomics.

My rationale for wanting to use the -imc target instead of the -imac target is because -imc is more performant.

Could #702 please be reopened and considered for merge? It seems to be a minor change, has no drawbacks for the currently supported targets, is unrelated to RISC-V, but expands support to targets without atomics. defmt-rtt is one of the core libraries in my firmware and I would love to be able to use it without being forced into the -imac target

Apologies if I sound gruff, and its entirely possible I've just misunderstood something :)

Urhengulas commented 2 years ago

@TheButlah said:

Perhaps I'm overlooking something - my understanding of the ecosystem is quite limited. But why is this issue related to (or more importantly, blocked on) RISC-V support at all?

defmt-rtt already works on the riscv32imac-unknown-none-elf target (note the a). This issue is not about RISC-V support but rather supporting targets that don't have native atomics.

In my case, the riscv32imc-unknown-none-elf (note the absence of a) doesn't work on defmt-rtt, but I'm sure there are other targets that also don't support atomics.

My rationale for wanting to use the -imc target instead of the -imac target is because -imc is more performant.

Could #702 please be reopened and considered for merge? It seems to be a minor change, has no drawbacks for the currently supported targets, is unrelated to RISC-V, but expands support to targets without atomics. defmt-rtt is one of the core libraries in my firmware and I would love to be able to use it without being forced into the -imac target

Apologies if I sound gruff, and its entirely possible I've just misunderstood something :)

Quoting @japaric s answer to the question:

RISCV+IMC, the architecture variant not the rustc compilation target, does support atomic loads and stores; it does not support CAS operations. In terms of atomics support, that RISCV architecture variant is more or less equivalent to ARMv6-M, which defmt-rtt does support. So, defmt-rtt already supports "targets that don't support atomics" because it does not require CAS operations; it only requires atomic loads and stores.

The difference between the riscv32imc- and the thumbv6m- rustc compilation targets is that the RISCV one has broken LLVM codegen and that's why it doesn't have the Atomic* API available at all.

TheButlah commented 2 years ago

Great, thanks for clarifying! I appreciate it :)

lure23 commented 3 months ago

Would two years of LLVM development have changed the playing ground?

In particular, LLVM 16 release notes mention:

A target feature was introduced to force-enable atomics.

Having three esp32c3 devkits, in 2024, this issue is still alive at least for me.

taiki-e commented 3 months ago

Assuming that neither old compilers nor custom targets (that created based on old info) are used, the underlying problem should already be fixed in both RISC-V (https://github.com/rust-lang/rust/pull/114499) and AVR (https://github.com/rust-lang/rust/pull/114495). As for the MSP430, it has not been fixed on the LLVM side, but the portable-atomic provides what is needed.

lure23 commented 3 months ago

thanks @taiki-e

The problem I experienced was discussed over the weekend on https://matrix.to/#/#esp-rs:matrix.org and seems to be related to probe-rs 0.24.0.

knurling-rs / defmt

Support for targets without native atomics #597