maribu opened 5 years ago
In case my arguments for just assuming that __sync_fetch_and_add_2() and all its friends are not interruptible on bare-metal targets are not convincing: I'd very happily take a compiler flag to tell clang that the library functions do indeed disable interrupts.

Btw (a bit off topic): a flag that lets clang itself emit code to disable interrupts before an atomic operation and restore the previous interrupt state afterwards could also be very interesting for embedded developers. It would allow the compiler to group a number of atomic operations that can only be implemented by disabling interrupts, which would reduce ROM size a bit, as interrupts would be disabled and restored less often. A flag to limit the number of atomic ops that may be grouped would be needed to bound the worst-case interrupt latency, though, for those wanting hard real-time capabilities. A hand-written sketch of what such grouped code could look like follows.
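For illustration, a sketch of that idea on ARMv6-M, assuming hypothetical irq_disable()/irq_restore() helpers built on PRIMASK (the flag itself does not exist today):

```c
#include <stdint.h>

static inline uint32_t irq_disable(void)
{
    uint32_t state;
    /* save PRIMASK, then mask all maskable interrupts */
    __asm__ volatile("mrs %0, PRIMASK\n\tcpsid i" : "=r"(state) :: "memory");
    return state;
}

static inline void irq_restore(uint32_t state)
{
    __asm__ volatile("msr PRIMASK, %0" :: "r"(state) : "memory");
}

extern volatile uint16_t a, b, c;

void grouped_updates(void)
{
    /* one disable/restore pair covers three read-modify-writes,
       instead of one pair per atomic operation */
    uint32_t state = irq_disable();
    a += 1;
    b += 2;
    c += 3;
    irq_restore(state);
}
```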
> The problem with arm-none-eabi is that we don't really know much about where the code is actually going to run.
Mind the -mcpu=cortex-m0plus flag that was given during compilation: it is already known that the code currently running can simply disable interrupts, as there is no "user mode" vs. "kernel mode" on that platform. This means that, to the best of my knowledge, the most efficient implementation of __sync_fetch_and_add_2() and friends is to simply disable interrupts for the duration of the function. GCC also already effectively requires the implementations to disable interrupts, as otherwise concurrent lock-free operations might interfere with the operations implemented by __sync_fetch_and_add_2() and friends.

So: why not just also require that implementations of __sync_fetch_and_add_2() and friends disable interrupts on single-threaded, single-core embedded CPUs (no MMU, no kernel and user mode)? A minimal sketch of such an implementation is below.
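A minimal sketch for ARMv6-M, assuming PRIMASK-based masking and the usual two-argument libcall signature (this mirrors the common bare-metal approach, not any particular library's code):

```c
#include <stdint.h>

uint16_t __sync_fetch_and_add_2(volatile void *ptr, uint16_t val)
{
    volatile uint16_t *p = ptr;
    uint32_t state;
    /* mask interrupts: on a single-core MCU they are the only
       source of concurrency, so this makes the RMW atomic */
    __asm__ volatile("mrs %0, PRIMASK\n\tcpsid i" : "=r"(state) :: "memory");
    uint16_t old = *p;
    *p = (uint16_t)(old + val);
    __asm__ volatile("msr PRIMASK, %0" :: "r"(state) : "memory");
    return old;
}
```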
If a lock-free sequence does not exist for all possible atomic operations of a given width, the compiler must generate a call to __atomic_* for all atomic operations of that width. Otherwise, the locking mechanism in libatomic breaks.
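A sketch of the failure mode, with lock()/unlock() as hypothetical stand-ins for libatomic's internal locking (not its real API):

```c
#include <stdint.h>

extern void lock(void), unlock(void);   /* stand-ins for libatomic's lock */

/* Context A: the compiler inlined a "lock-free" atomic store. */
void store_inlined(volatile uint16_t *p, uint16_t v)
{
    *p = v;                              /* plain store, takes no lock */
}

/* Context B: a same-width fetch-add fell back to a locked libcall. */
uint16_t fetch_add_locked(volatile uint16_t *p, uint16_t v)
{
    lock();
    uint16_t old = *p;
    /* If store_inlined() runs here, its store is overwritten below:
       the lock protects nothing against the inlined access. */
    *p = (uint16_t)(old + v);
    unlock();
    return old;
}
```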
The problem with arm-none-eabi is that we don't really know much about where the code is actually going to run. For arm-pc-linux-eabi, specifically, __sync_val_compare_and_swap_N where N=1,2,4,8 is implemented using a special kernel-assisted sequence. But we don't really want to make the same assumption for every baremetal target.
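For reference, the kernel-assisted sequence is the __kuser_cmpxchg helper that ARM Linux exposes at a fixed address in the vector page (documented in the kernel's kernel_user_helpers.txt); a sketch of building a fetch-add on top of it:

```c
/* __kuser_cmpxchg returns 0 if *ptr was updated from oldval to newval,
   and nonzero otherwise (address per kernel_user_helpers.txt). */
typedef int (*kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define __kuser_cmpxchg ((kuser_cmpxchg_t)0xffff0fc0)

int fetch_add_via_kernel(volatile int *ptr, int val)  /* hypothetical name */
{
    int old;
    do {
        old = *ptr;
    } while (__kuser_cmpxchg(old, old + val, ptr) != 0);
    return old;
}
```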
Maybe we could control it with a flag.
Thanks for the reply.
The Cortex-M0+ is an embedded processor (a single-threaded, single-core CPU) in which the only source of concurrency is interrupts. To my understanding, in the given context atomic_store_explicit(&bar, 0x1337, memory_order_relaxed); is already correctly implemented if a single machine instruction implements the store, as no ISR can interrupt in the middle of that instruction. Resorting to __sync_fetch_and_add_2 as GCC does for atomic_fetch_add_explicit(&bar, 0x1337, memory_order_relaxed); is also correct to my understanding, as more than one machine instruction is needed to implement that read-modify-write.
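To illustrate why the distinction matters, here is a sketch of the lost-update hazard (isr and task are hypothetical names; the comments describe what an unprotected inline expansion would do):

```c
#include <stdatomic.h>

static atomic_ushort bar;

void isr(void)   /* hypothetical interrupt handler */
{
    atomic_fetch_add_explicit(&bar, 1, memory_order_relaxed);
}

void task(void)
{
    /* If this were expanded inline as ldrh / adds / strh with no
       protection, an isr() firing between the load and the store
       would have its increment overwritten. A single strh store,
       by contrast, cannot be torn by an interrupt. */
    atomic_fetch_add_explicit(&bar, 1, memory_order_relaxed);
}
```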
> (Well, you can technically make an arbitrary sequence "atomic" by disabling interrupts, but I don't think that really counts.)
This is exactly how __sync_fetch_and_add_2() is implemented in the context of RIOT [1]. But the use of __atomic_store_2() should not be required for properly aligned stores - so why is it still used?
I think that's a gcc bug.
If you write something like the following with gcc:
```c
#include <stdatomic.h>
static atomic_ushort bar;   /* assumed 2-byte atomic, matching the _2 libcall */

void foo2(void)
{
    atomic_fetch_add_explicit(&bar, 0x1337, memory_order_relaxed);
}
```
It produces the following:
```
foo2:
        push    {r4, lr}
        ldr     r1, .L5
        ldr     r0, .L5+4
        bl      __sync_fetch_and_add_2
        pop     {r4, pc}
```
Meaning, it's assuming that there's some lock-free atomic sequence that can implement __sync_fetch_and_add_2. But there is no such sequence in a baremetal environment; cortex-m0 doesn't have the ll/sc instructions. (Well, you can technically make an arbitrary sequence "atomic" by disabling interrupts, but I don't think that really counts.)
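For contrast, a lock-free sequence does exist on ARMv7-M (e.g. the cortex-m3 from the original report); a sketch using ldrexh/strexh (fetch_add_u16 is a hypothetical helper):

```c
#include <stdint.h>

static inline uint16_t fetch_add_u16(volatile uint16_t *ptr, uint16_t val)
{
    uint32_t old, tmp, fail;
    __asm__ volatile(
        "1: ldrexh %0, [%3]     \n"  /* exclusive load                */
        "   add    %1, %0, %4   \n"  /* compute the new value         */
        "   strexh %2, %1, [%3] \n"  /* try to store it exclusively   */
        "   cmp    %2, #0       \n"
        "   bne    1b           \n"  /* retry if exclusivity was lost */
        : "=&r"(old), "=&r"(tmp), "=&r"(fail)
        : "r"(ptr), "r"(val)
        : "cc", "memory");
    return (uint16_t)old;
}
```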
Extended Description
Using this minimal example code:
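(The attachment itself is not reproduced above; a plausible reconstruction from the identifiers used in this thread - bar, 0x1337, and the 2-byte __atomic_store_2 call - would be:)

```c
#include <stdatomic.h>

static atomic_ushort bar;           /* 2 bytes, hence the _2 libcall */

void foo(void)                      /* hypothetical function name */
{
    atomic_store_explicit(&bar, 0x1337, memory_order_relaxed);
}
```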
Compiling with

```
clang -mcpu=cortex-m3 -mlittle-endian -mthumb -mfloat-abi=soft -target arm-none-eabi -Weverything -Werror -c -o test.o test.c
```

works fine. Compiling with

```
clang -mcpu=cortex-m0plus -mlittle-endian -mthumb -mfloat-abi=soft -target arm-none-eabi -Weverything -Werror -c -o test.o test.c
```

yields:

However, compiling with

```
arm-none-eabi-gcc -mcpu=cortex-m0plus -mlittle-endian -mthumb -mfloat-abi=soft -Wall -Wextra -pedantic -Werror -c -o test.o test.c
```

works fine and generates:

Therefore, I assume that the atomic store can be implemented without resorting to a library call to __atomic_store_2().