Open lukeg101 opened 1 year ago
@llvm/issue-subscribers-backend-aarch64
@llvm/issue-subscribers-backend-arm
@efriedma-quic This is an example of one of those tricky bugs that occur when atomics implementations are mixed
This looks like it's basically the same as https://reviews.llvm.org/D141748, except for 32-bit.
Atomics are part of the ABI. The v7 and v8 model have to be compatible because we don't disallow such mixing. Note you can also get mixing between different compilers, compiler versions, ARM/Thumb-1/Thumb-2.
This looks like it's basically the same as https://reviews.llvm.org/D141748, except for 32-bit.
It also happens in AArch64: https://godbolt.org/z/ovjWG3dsK (as reported in https://github.com/llvm/llvm-project/issues/62652)
My colleagues have pointed out that a simpler test can induce this bug, consider the store buffering test below without RMW ops:
C test
{ x = 0; y = 0 }
P0 (atomic_int *x, atomic_int *y) {
atomic_store_explicit(x,1,memory_order_seq_cst);
int32_t r0 = atomic_load_explicit(y,memory_order_seq_cst);
}
void P1 (atomic_int *x, atomic_int *y) {
atomic_store_explicit(y,1,memory_order_seq_cst);
int32_t r0 = atomic_load_explicit(x,memory_order_seq_cst);
}
exists 0:r0 = 0 /\ 1:r0 = 0
which has the following outcomes under the C/C++ model:
{ P0:r0=0; P1:r0=1; }
{ P0:r0=1; P1:r0=0; }
{ P0:r0=1; P1:r0=1; }
If the store on P0
is compiled for Armv8 and the load for Armv7 (https://godbolt.org/z/Td4TThTan) then we get the following test:
ARM fig7
{ x = 0; y = 0;
uint64_t %P0_x = x; uint64_t %P0_y = y;
uint64_t %P1_x = x; uint64_t %P1_y = y; }
P0 | P1;
MOV R1, #1 | MOV R1, #1;
| DMB ISH;
STL R1, [%P0_x] | STR R1, [%P1_y];
| DMB ISH;
LDR R0, [%P0_y] | LDR R0, [%P1_x];
DMB ISH | DMB ISH;
exists (P0:R0 = 0 /\ P1:R0 = 0)
which has the following outcomes under either the Armv7 model or the Armv8 (AArch32) model:
{ P0:R0=0; P1:R0=0; } <-- additional outcome, bug!
{ P0:R0=0; P1:R0=1; }
{ P0:R0=1; P1:R0=0; }
{ P0:R0=1; P1:R0=1; }
Once again this is because the effects of executing the LDR
can reorder before the STL
instruction, since the LDR
does not have ordering semantics with respect to the STL
.
To fix this, add a fence before the LDR
instruction.
Consider the following litmus test that has buggy behaviour:
where 'P0:r0 = 0' means thread P0, local variable r0 has value 0
When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are:
When compiling P1, to target armv8-a cortex-a15 (to
dmb;str;dmb;ldr
) using clang trunk (https://godbolt.org/z/v6cso7zq5), compiling the load of y on P0 to target armv7-a cortex-a15 (ldr;dmb
) using clang trunk, and compiling the fetch_add of x on P0 for armv8-a cortex-m55 (alsoldrex;strex;cbnz
loop). The compiled program has the following outcomes when simulated under the aarch32 model:which occurs since the sequentially consistent atomic load of y on P0 is compiled to
ldr;dmb
which does not have a leading dmb fence, and can thus be reordered before the STLEX on P0. Allowing the buggy outcome forbidden by the source.To fix this, add a dmb before the ldr when compiling that load:
dmb;ldr;dmb
to forbid the buggy behaviour:which no longer allows the buggy outcome under the aarch32 model:
This bug would not have been caught in normal execution, but only when multiple implementations are mixed together.