llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.42k stars 11.74k forks source link

[PPC64] Sequentially Consistent Load allows Reordering of Stores #65121

Closed lukeg101 closed 1 year ago

lukeg101 commented 1 year ago

Consider the following litmus test that captures bad behaviour:

C test

{ *x = 0; *y = 0; } // fixed initial state where x and y are 0

// Concurrent threads
P0 (atomic_int* y,atomic_int* x) {
  atomic_store_explicit(x,2,memory_order_relaxed);
  atomic_thread_fence(memory_order_consume);
  atomic_store_explicit(y,1,memory_order_seq_cst);
}

P1 (atomic_int* y,atomic_int* x) {
  int r0 = atomic_load_explicit(y,memory_order_seq_cst);
  atomic_thread_fence(memory_order_relaxed);
  atomic_store_explicit(x,1,memory_order_relaxed);
}

exists ([x]=2 /\ 1:r0=1)

where 'P0:r0 = 0' means thread P0, local variable r0 has value 0. [x] means x is a global.

When simulating this test under the C/C++ model from its initial state, the outcome of execution in the exists clause is forbidden by the source model. The allowed outcomes are:

{ P1:r0=0; [x]=1; }
{ P1:r0=0; [x]=2; }
{ P1:r0=1; [x]=1; }

When compiling to target PPC64le with clang trunk -c -g -O1 -pthread --std=c11 -ffreestanding -mabi=elfv1 (https://godbolt.org/z/xj9W7nr9G), the compiled program has the following outcomes when simulated under the PPC model:

{ P1:r0=0; [x]=1; }                           
{ P1:r0=0; [x]=2; }                                                                       
{ P1:r0=1; [x]=1; }    
{ P1:r0=1; [x]=2; } <--- forbidden by source model, bug!

which occurs since the b L0x50; isync pattern allows the memory effect of the load of y on thread P1 to be reordered after the store to x on P1. We observe the execution:

{ P1:r0=0;[x]=0} -> P1:W[x]=1 -> P0:W[x]=2 -> P0:W[y]=1 -> P1:R[y]=0 -> { P1:r0=1; [x]=2; }

where the local variable r0 is stored in register r9 in the compiled code.

The problem is isync forces the instructions to finish, but not their memory effects. Hence the forbidden outcome { P1:r0=1; [x]=2; } is allowed under the ppc model.

PPC test

{ [P1_r0]=0;[x]=0;[y]=0;
  uint64_t %P0_x=x; uint64_t %P0_y=y;
  uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
  uint64_t %P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(*                                                               *)
(*****************************************************************)
  P0                 |  P1                    ;
   li r10,2          |   sync                 ;
   stw r10,0(%P0_x)  |   lwz r9,0(%P1_y)      ;
   lwsync            |   cmpw r9,r9           ;
   sync              |   b   L0x50            ;
   li r10,1          |  L0x50: isync          ;
   stw r10,0(%P0_y)  |   li r8,1              ;
                     |   stw r8,0(%P1_x)      ;
                     |   stw r9,0(%P1_P1_r0)  ;

exists ([x]=2 /\ P1_r0=1)  // YES under PPC model

We have observed this behaviour with several hundred Message Passing and 'S' pattern tests (the above is an S pattern test).

To fix this, we advise replacing the isync with lwsync to forbid the buggy behaviour:

PPC test-fixed

{ [P1_r0]=0;[x]=0;[y]=0;
  uint64_t %P0_x=x; uint64_t %P0_y=y;
  uint64_t %P1_P1_r0=P1_r0;uint64_t %P1_x=x;
  uint64_t %P1_y=y }
(*****************************************************************)
(*                 the Telechat toolsuite                        *)
(*                                                               *)
(* Luke Geeson, University College London, UK.                   *)
(*                                                               *)
(*****************************************************************)
  P0                 |  P1                    ;
   li r10,2          |   sync                 ;
   stw r10,0(%P0_x)  |   lwz r9,0(%P1_y)      ;
   lwsync            |   cmpw r9,r9           ;
   sync              |   b   L0x50            ;
   li r10,1          |  L0x50: lwsync        ;
   stw r10,0(%P0_y)  |   li r8,1              ;
                     |   stw r8,0(%P1_x)      ;
                     |   stw r9,0(%P1_P1_r0)  ;

exists ([x]=2 /\ P1_r0=1)  // NO under PPC model

which no longer allows the buggy outcome under the ppc model:

{ P1:r0=0; [x]=1; }                           
{ P1:r0=0; [x]=2; }                                                                       
{ P1:r0=1; [x]=1; } 

I'm happy to discuss this as needed.

I have reported the same bug in GCC.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-powerpc

lukeg101 commented 1 year ago

I would argue that perhaps the unconditional branch to the next statement is not needed either, so replace b .next_label; .next_label: isync with lwsync

lukeg101 commented 1 year ago

My Arm colleagues mentioned this may be of interest to you @nemanjai 🙂

lukeg101 commented 1 year ago

Linking GCC discussion as there are two proposed solutions in it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111246

lukeg101 commented 1 year ago

I'm sorry - there was a bug in the translation tool. LLVM is correct. I will think twice and check everything before I submit another bug.