Open MabezDev opened 3 years ago
this is a tricky one.
to my understanding, a plain load instruction followed by a compiler fence (this is a compiler hint that does not produce an instruction) is a proper 'atomic load' on single core systems. you only need the atomic fence instruction if you have multiple cores.
in that sense, one can emulate atomic loads and stores on a single-core RISCV32I device. I'm not sure, however, one can do this properly on stable for this particular target because you should use LLVM 'monotonic' ordering in the load operation -- I think this is equivalent to passing Ordering::Relaxed
to AtomicU8::load
if you had AtomicU8
+ Ordering::Relaxed
for this target you could use this code wherever you need the other orderings.
static ATOMIC: AtomicU8 = AtomicU8::new(0);
// single-core version of `store(Ordering::Release)`
atomic.store(42, Ordering::Relaxed);
sync::compiler_fence(Ordering::Release);
the other problem is that there's no conditional compilation option (cfg
) one can use to indicate that some code is being compiled for a single-core system thus you cannot combine the above snippet with some other unsafe
code (to make a SPSC queue) and call the whole thing safe because it will NOT be safe on multi-core devices and there's no mechanism to make the code not compiler for multi-core devices.
I would personally not try to bend defmt to accommodate a target like this until there's better language support for it. In my mind that would mean
AtomicU8::load(_, Ordering::Relaxed)
somehow exists for the targetcfg
mechanism to distinguish between single-core and multi-core systemsor maybe better: just the cfg
mechanism added to the language and then AtomicU8
can be implemented in core for a single-core only version of the RISCV32I target
another option that's maybe even less work:
add another RISCV32I target to the compiler but enable the singlethread
feature in its target spec. the wasm32-unknown-unknown
target has that feature enabled:
$ rustc +nightly -Z unstable-options --print target-spec-json --target wasm32-unknown-unknown
(..)
"singlethread": true,
I think with that you should be able to use LLVM's atomic_load_acquire
intrinsics, etc. in the libcore implementation. LLVM should NOT emit an atomic fence instruction when singlethread
is enabled (iirc)
https://github.com/knurling-rs/defmt/issues/597#issuecomment-930146166
tried this. no luck: it produces a libcall (call to an intrinsic)
#![feature(intrinsics)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]
#[lang = "copy"]
pub trait Copy {}
impl Copy for u8 {}
#[lang = "sized"]
pub trait Sized {}
#[no_mangle]
pub fn foo(p: *const u8) -> u8 {
unsafe { atomic_load_acq::<u8>(p) }
}
extern "rust-intrinsic" {
pub fn atomic_load_acq<T: Copy>(src: *const T) -> T;
}
$ # '// comments' are not actually in the file
$ cat riscv32i-singlecore-none-elf.json
{
"arch": "riscv32",
"atomic-cas": false,
"cpu": "generic-rv32",
"data-layout": "e-m:e-p:32:32-i64:64-n32-S128",
"eh-frame-header": false,
"emit-debug-gdb-scripts": false,
"executables": true,
"linker": "rust-lld",
"linker-flavor": "ld.lld",
"llvm-target": "riscv32",
"max-atomic-width": 0, // doesn't matter in no_core
"panic-strategy": "abort",
"relocation-model": "static",
"singlethread": true, // <-
"target-pointer-width": "32"
}
$ cargo rustc --release --target riscv32i-singlecore-none-elf.json -- --emit=obj
$ rust-objdump -Cd --triple riscv32 target/riscv32i-singlecore-none-elf/release/deps/foo-*.o
00000000 <foo>:
0: 13 01 01 ff addi sp, sp, -16
4: 23 26 11 00 sw ra, 12(sp)
8: 93 05 20 00 addi a1, zero, 2
c: 97 00 00 00 auipc ra, 0
10: e7 80 00 00 jalr ra # <- call to other function
14: 83 20 c1 00 lw ra, 12(sp)
18: 13 01 01 01 addi sp, sp, 16
1c: 67 80 00 00 ret
$ rust-nm -CSn target/riscv32i-singlecore-none-elf/release/deps/foo-*.o
U __atomic_load_1 # <- the other function
00000000 00000020 T foo
compare that to the wasm
target. no atomic fence instruction
$ cargo rustc --release --target wasm32-unknown-unknown -- --emit=asm
$ cat target/wasm32-unknown-unknown/release/deps/foo-*.s
foo:
.functype foo (i32) -> (i32)
local.get 0
i32.load8_u 0
end_function
I would expect singlethread: true
+ llvm_target: riscv32
to produce something like the wasm code: a load and no atomic fence instruction
to my understanding, a plain load instruction followed by a compiler fence (this is a compiler hint that does not produce an instruction) is a proper 'atomic load' on single core systems.
this route would be adding a load_relaxed
API to the Atomic*
types -- this is RFC material. The implementation would call the atomic_load_relaxed
intrinsic (+)
impl AtomicU8 {
// available on the riscv32i-*-*-* target
pub fn load_relaxed(&self) -> u8 {
unsafe { intrinsics::atomic_load_relaxed(self.v.get()) }
}
}
then third party libraries can implement the load_relaxed
+ sync::compiler_fence
version of load(Acquire)
I mentioned above -- again, that implementation would only be sound on single core systems. how one would enforce that: I don't know
(+) this does not codegen what I expect for the riscv32i
target
#![crate_type = "lib"]
#![feature(intrinsics)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]
#[lang = "copy"]
pub trait Copy {}
impl Copy for u8 {}
#[lang = "sized"]
pub trait Sized {}
#[no_mangle]
pub fn foo(p: *const u8) -> u8 {
unsafe { atomic_load_relaxed::<u8>(p) }
}
extern "rust-intrinsic" {
pub fn atomic_load_relaxed<T: Copy>(src: *const T) -> T;
}
$ cargo rustc --release --target riscv32i-unknown-none-elf -- --emit=obj
$ rust-objdump -Cd --triple riscv32 target/riscv32i-unknown-none-elf/release/deps/foo-*.o
00000000 <foo>:
0: 13 01 01 ff addi sp, sp, -16
4: 23 26 11 00 sw ra, 12(sp)
8: 93 05 00 00 mv a1, zero
c: 97 00 00 00 auipc ra, 0
10: e7 80 00 00 jalr ra
14: 83 20 c1 00 lw ra, 12(sp)
18: 13 01 01 01 addi sp, sp, 16
1c: 67 80 00 00 ret
$ rust-nm -CSn target/riscv32i-unknown-none-elf/release/deps/foo-*.o
U __atomic_load_1
00000000 00000020 T foo
I would expect the machine code to be the exact same as the riscv32imac
target in this load_relaxed
case (no fence instruction)
$ cargo rustc --release --target riscv32imac-unknown-none-elf -- --emit=obj
$ rust-objdump -Cd --triple riscv32 target/riscv32imac-unknown-none-elf/release/deps/foo-*.o
00000000 <foo>:
0: 03 05 05 00 lb a0, 0(a0)
4: 82 80 ret
Thanks for looking into this @japaric.
I think that this is something that should be something implemented in LLVM, as it is for thumbv6 targets. However this is a long term goal, as I suspect it would not be trivial to get upstreamed.
Short term, we are exploring implementing a illegal instruction trap handler and treating the esp32c3 as imac
. This way has an advantage of not 'dirtying' upstream crates like this one. Obviously this will have a performance impact, but hopefully is not too bad.
in principle this should be solvable with an implementation of critical-section
for the relevant RISC-V target or devices. I'm not familiar with the details but defmt-rtt which requires a critical section now works on the RP2040 which lacks the HW instructions required to implement a CAS loop and it's a multi-core device.
@MabezDev could you check the critical-section crate and close this if it's this problem is solvable upstream (in critical-section)?
On the RP2040 we have a hardware spin-lock subsystem, which is what critical-section
uses. IIRC there's two unsafe C FFI functions you have to implement and you can do so any way you like.
Sorry this issue fell off my radar, I forgot to come back and mention I implemented the atomic emulation trap handler: https://github.com/esp-rs/riscv-atomic-emulation-trap. The switch to critical-section
should close this for any RISCV users who don't want to use the trap handler approach.
Thanks for investigating this!
Hi, "RISCV user who doesn't want to use the trap handler approach" here :P I was using the ESP32-C3 (RISC-V) and need to try implementing my code without the atomic emulation trap handler (due to this unrelated issue).
I noticed that defmt-rtt
uses core::sync::atomic
still, making it unusable for people without atomic emulation. Wouldn't that mean that this issue should be repoened?
Good question @TheButlah.
As far as I see we still have 4 atomic variables. The static TAKEN: AtomicBool
in lib.rs
and 3 AtomicUsize
as part of struct Channel
in channel.rs
.
In my understanding we should be able to replace all of them with their non-atomic counterpart, because we only access them while holding the critical section. But I'd like a second opinion on that before going forward with that.
We decided to pause this change until we do a coordinated effort to get risc-v support for defmt
and probe-run
. All the relevant issues will be tracked with the tag topic: risc-v
.
We decided to pause this change until we do a coordinated effort to get risc-v support for
defmt
andprobe-run
. All the relevant issues will be tracked with the tagtopic: risc-v
.
Perhaps I'm overlooking something - my understanding of the ecosystem is quite limited. But why is this issue related to (or more importantly, blocked on) RISC-V support at all?
defmt-rtt
already works on the riscv32imac-unknown-none-elf
target (note the a
). This issue is not about RISC-V support but rather supporting targets that don't have native atomics.
In my case, the riscv32imc-unknown-none-elf
(note the absence of a
) doesn't work on defmt-rtt
, but I'm sure there are other targets that also don't support atomics.
My rationale for wanting to use the -imc
target instead of the -imac
target is because -imc
is more performant.
Could #702 please be reopened and considered for merge? It seems to be a minor change, has no drawbacks for the currently supported targets, is unrelated to RISC-V, but expands support to targets without atomics. defmt-rtt
is one of the core libraries in my firmware and I would love to be able to use it without being forced into the -imac
target
Apologies if I sound gruff, and its entirely possible I've just misunderstood something :)
@TheButlah said:
Perhaps I'm overlooking something - my understanding of the ecosystem is quite limited. But why is this issue related to (or more importantly, blocked on) RISC-V support at all?
defmt-rtt
already works on theriscv32imac-unknown-none-elf
target (note thea
). This issue is not about RISC-V support but rather supporting targets that don't have native atomics.In my case, the
riscv32imc-unknown-none-elf
(note the absence ofa
) doesn't work ondefmt-rtt
, but I'm sure there are other targets that also don't support atomics.My rationale for wanting to use the
-imc
target instead of the-imac
target is because-imc
is more performant.Could #702 please be reopened and considered for merge? It seems to be a minor change, has no drawbacks for the currently supported targets, is unrelated to RISC-V, but expands support to targets without atomics.
defmt-rtt
is one of the core libraries in my firmware and I would love to be able to use it without being forced into the-imac
targetApologies if I sound gruff, and its entirely possible I've just misunderstood something :)
Quoting @japaric s answer to the question:
RISCV+IMC, the architecture variant not the rustc compilation target, does support atomic loads and stores; it does not support CAS operations. In terms of atomics support, that RISCV architecture variant is more or less equivalent to ARMv6-M, which defmt-rtt does support. So, defmt-rtt already supports "targets that don't support atomics" because it does not require CAS operations; it only requires atomic loads and stores.
The difference between the riscv32imc- and the thumbv6m- rustc compilation targets is that the RISCV one has broken LLVM codegen and that's why it doesn't have the Atomic* API available at all.
Great, thanks for clarifying! I appreciate it :)
Would two years of LLVM development have changed the playing ground?
In particular, LLVM 16 release notes mention:
A target feature was introduced to force-enable atomics.
Having three esp32c3
devkits, in 2024, this issue is still alive at least for me.
Assuming that neither old compilers nor custom targets (that created based on old info) are used, the underlying problem should already be fixed in both RISC-V (https://github.com/rust-lang/rust/pull/114499) and AVR (https://github.com/rust-lang/rust/pull/114495). As for the MSP430, it has not been fixed on the LLVM side, but the portable-atomic provides what is needed.
thanks @taiki-e
The problem I experienced was discussed over the weekend on https://matrix.to/#/#esp-rs:matrix.org and seems to be related to probe-rs
0.24.0.
The
esp32c3
is a riscv based cpu, without the atomic extension. Unlike the armthumbv6
target, it does not have atomic load/stores for the following typesu8
,u16
,u32
, which means we cannot usedefmt
(and other rtt based crates) as it stands.I recently landed a PR in
atomic-polyfill
that adds load/store for those types mentioned above, implemented using critical sections. I have a PR inrtt-target
to use that, but it has not yet been accepted.In theory for better performance, we could ditch the critical section and implement our own atomic operation that use fences to ensure atomicity (this is what is done in llvm for
thumbv6
) - however this can cause UB when mixed with lock based CAS atomics (see: here, here, and here ) therefore is not suitable to add toatomic-polyfill
as it stands.I would like the esp32c3 to be able to support the c3, and I am sure there are other riscv, non-a targets too.
What do you think the best idea is going forward?