llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[MemcpyOpt] Missed optimization: unrelated store clobber blocks elimination/merge of `memset` #88632

Open XChy opened 6 months ago

XChy commented 6 months ago

Alive2 proof: https://alive2.llvm.org/ce/z/_x2Qc2

Motivating example

define void @src(i64 %a) {
  %stack = alloca <256 x i8>, align 8
  %stack1 = getelementptr inbounds i8, ptr %stack, i64 8
  call void @llvm.memset.p0.i64(ptr %stack1, i8 0, i64 136, i1 false)
  store i64 %a, ptr %stack, align 8
  %stack2 = getelementptr inbounds i8, ptr %stack, i64 24
  call void @llvm.memset.p0.i64(ptr %stack2, i8 0, i64 24, i1 false) ; redundant: bytes [24, 48) are already zeroed by the first memset
  call void @use(ptr %stack)
  ret void
}

can be folded to

define void @tgt(i64 %a) {
  %stack = alloca <256 x i8>, align 8
  %stack1 = getelementptr inbounds i8, ptr %stack, i64 8
  call void @llvm.memset.p0.i64(ptr %stack1, i8 0, i64 136, i1 false)
  store i64 %a, ptr %stack, align 8
  call void @use(ptr noundef nonnull %stack)
  ret void
}
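
The reasoning behind the fold can be checked with plain byte-range arithmetic. The sketch below is only an illustration of why the second memset looks redundant, not how MemCpyOpt actually implements the check; the helper names `covers` and `overlaps` are made up for this example, and ranges are half-open byte offsets from `%stack`.

```python
# Half-open byte ranges [start, end), measured as offsets from %stack.
# These helpers are hypothetical and exist only for illustration.

def covers(outer, inner):
    """True if `outer` fully contains `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if ranges `a` and `b` share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]

first_memset = (8, 8 + 136)     # memset(%stack + 8, 0, 136)
store = (0, 8)                  # store i64 %a, ptr %stack
second_memset = (24, 24 + 24)   # memset(%stack + 24, 0, 24)

# Both memsets write the same value (0). The second one is redundant if the
# first already covers its bytes and the intervening store does not touch them.
second_is_redundant = (
    covers(first_memset, second_memset)
    and not overlaps(store, second_memset)
)
print(second_is_redundant)
```

Here the store only writes bytes [0, 8), so it cannot change anything the second memset would zero again.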

Real-world motivation

This IR snippet is derived from postgres/src/backend/utils/adt/ruleutils.c@select_rtable_names_for_explain (after the O3 pipeline). The example above is a reduced version. For the original suboptimal IR and the optimal IR, see https://godbolt.org/z/8aMxr37Ev

Let me know if you can confirm that it's an optimization opportunity, thanks.

XChy commented 5 months ago

Looks like a simple store also clobbers: https://alive2.llvm.org/ce/z/H7DowQ (a reduced real-world case from git).