llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.83k stars 11k forks source link

[DSE] misoptimized deleted store before non-inlined sync #97252

Open jacobly0 opened 1 week ago

jacobly0 commented 1 week ago
@non_atomic = internal global i8 0
@atomic = internal global i8 0

define void @thread_a() {
  store i8 1, ptr @non_atomic
  call void @sync()
  store i8 2, ptr @non_atomic
  ret void
}

define internal void @sync() noinline {
  store atomic i8 1, ptr @atomic seq_cst, align 1
  br label %1

1:
  %2 = load atomic i8, ptr @atomic seq_cst, align 1
  %3 = icmp ne i8 %2, 0
  br i1 %3, label %1, label %4

4:
  ret void
}

define i8 @thread_b() {
  br label %1

1:
  %2 = load atomic i8, ptr @atomic seq_cst, align 1
  %3 = icmp eq i8 %2, 0
  br i1 %3, label %1, label %4

4:
  %5 = load i8, ptr @non_atomic
  store atomic i8 0, ptr @atomic seq_cst, align 1
  ret i8 %5
}
$ opt --version
LLVM (http://llvm.org/):
  LLVM version 18.1.5
  DEBUG build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver4
$ opt repro.ll -passes='require<globals-aa>,function-attrs,dse' -debug-only dse
Trying to eliminate MemoryDefs killed by 1 = MemoryDef(liveOnEntry) (  store i8 1, ptr @non_atomic, align 1)
  trying to get dominating access
   visiting 0 = MemoryDef(liveOnEntry)
   ...  found LiveOnEntryDef
  finished walk
Trying to eliminate MemoryDefs killed by 3 = MemoryDef(2) (  store i8 2, ptr @non_atomic, align 1)
  trying to get dominating access
   visiting 2 = MemoryDef(1) (  call void @sync())
   visiting 1 = MemoryDef(liveOnEntry) (  store i8 1, ptr @non_atomic, align 1)
  Checking for reads of 1 = MemoryDef(liveOnEntry) (  store i8 1, ptr @non_atomic, align 1)
   2 = MemoryDef(1) (  call void @sync())
   3 = MemoryDef(2) (  store i8 2, ptr @non_atomic, align 1)
    ... skipping killing def/dom access
 Checking if we can kill 1 = MemoryDef(liveOnEntry) (  store i8 1, ptr @non_atomic, align 1)
DSE: Remove Dead Store:
  DEAD:   store i8 1, ptr @non_atomic, align 1
  KILLER:   store i8 2, ptr @non_atomic, align 1
  trying to get dominating access
   visiting 0 = MemoryDef(liveOnEntry)
   ...  found LiveOnEntryDef
  finished walk
<snip>
; ModuleID = 'repro.ll'
source_filename = "repro.ll"

@non_atomic = internal global i8 0
@atomic = internal global i8 0

; Function Attrs: nofree norecurse nounwind memory(readwrite, inaccessiblemem: none)
define void @thread_a() #0 {
  call void @sync()
  store i8 2, ptr @non_atomic, align 1
  ret void
}

; Function Attrs: nofree noinline norecurse nounwind memory(readwrite, argmem: none, inaccessiblemem: none)
define internal void @sync() #1 {
  store atomic i8 1, ptr @atomic seq_cst, align 1
  br label %1

1:                                                ; preds = %1, %0
  %2 = load atomic i8, ptr @atomic seq_cst, align 1
  %3 = icmp ne i8 %2, 0
  br i1 %3, label %1, label %4

4:                                                ; preds = %1
  ret void
}

; Function Attrs: nofree norecurse nounwind memory(readwrite, argmem: none, inaccessiblemem: none)
define i8 @thread_b() #2 {
  br label %1

1:                                                ; preds = %1, %0
  %2 = load atomic i8, ptr @atomic seq_cst, align 1
  %3 = icmp eq i8 %2, 0
  br i1 %3, label %1, label %4

4:                                                ; preds = %1
  %5 = load i8, ptr @non_atomic, align 1
  store atomic i8 0, ptr @atomic seq_cst, align 1
  ret i8 %5
}

attributes #0 = { nofree norecurse nounwind memory(readwrite, inaccessiblemem: none) }
attributes #1 = { nofree noinline norecurse nounwind memory(readwrite, argmem: none, inaccessiblemem: none) }
attributes #2 = { nofree norecurse nounwind memory(readwrite, argmem: none, inaccessiblemem: none) }

A store is deleted even though it happens-before a read of that value in another thread which happens-before the killer store, through the following happens-before chain:

fhahn commented 1 week ago

Looks like this is related to some incorrectly cached analyses, only reproduces with -passes='default<O3>' but not withpasses=dse` and the module just before DSE

fhahn commented 1 week ago

Ok just requires globals-aa: https://llvm.godbolt.org/z/eehYaTKcc

As @sync doesn't access @non_atomic, I think the transformation is correct, but maybe @efriedma-quic or @nikic have additional thoughts?

jacobly0 commented 1 week ago

I'll just point out this comment that suggests to me that the atomic store in @sync should act as a barrier to the optimization. https://github.com/llvm/llvm-project/blob/c46a95c147d8ba86980908353d377f9e2f9f5641/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L22-L23

fhahn commented 1 week ago

Hm right, putting the atomic store in place of sync, it gets blocked. I guess globals-aa may need to be more conservative or we need to check no-sync for calls.

ParkHanbum commented 6 days ago

is this can be fixed if we add @sync to DSEBarrier?