[MemcpyOpt] Missed optimization: unrelated store clobber blocks elimination/merge of `memset`

Alive2 proof: https://alive2.llvm.org/ce/z/_x2Qc2

Motivating example

define void @src(i64 %a) {
  %stack = alloca <256 x i8>, align 8
  %stack1 = getelementptr inbounds i8, ptr %stack, i64 8
  call void @llvm.memset.p0.i64(ptr %stack1, i8 0, i64 136, i1 false)
  store i64 %a, ptr %stack, align 8
  %stack2 = getelementptr inbounds i8, ptr %stack, i64 24
  call void @llvm.memset.p0.i64(ptr %stack2, i8 0, i64 24, i1 false) ; redundant: these bytes are already zeroed by the first memset
  call void @use(ptr %stack)
  ret void
}

This can be folded to:

define void @tgt(i64 %a) {
  %stack = alloca <256 x i8>, align 8
  %stack1 = getelementptr inbounds i8, ptr %stack, i64 8
  call void @llvm.memset.p0.i64(ptr %stack1, i8 0, i64 136, i1 false)
  store i64 %a, ptr %stack, align 8
  call void @use(ptr noundef nonnull %stack)
  ret void
}
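
The byte-range reasoning behind this fold can be checked with a small sketch. This is not how MemCpyOpt is implemented; the helper names `covers` and `overlaps` are hypothetical and only illustrate why the store at offset 0 should not block the elimination. Offsets are taken relative to `%stack`, as half-open ranges.

```python
# Byte ranges are half-open [start, end) offsets from %stack.
# The helpers below are illustrative only, not LLVM APIs.

def covers(outer, inner):
    """True if `outer` fully contains `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if the two ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]

first_memset = (8, 8 + 136)    # memset at %stack1 (offset 8), 136 bytes
store = (0, 0 + 8)             # i64 store at %stack (offset 0), 8 bytes
second_memset = (24, 24 + 24)  # memset at %stack2 (offset 24), 24 bytes

# Both memsets write the same value (0), so the second one is redundant if
# the first already wrote its bytes and the intervening store did not
# overwrite any of them.
second_is_redundant = (
    covers(first_memset, second_memset)
    and not overlaps(store, second_memset)
)
print(second_is_redundant)  # True
```

The store touches only bytes [0, 8), which overlap neither memset, so it is unrelated to the bytes the second `memset` rewrites.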

Real-world motivation

This IR snippet is derived from postgres/src/backend/utils/adt/ruleutils.c@select_rtable_names_for_explain (after the O3 pipeline). The example above is a reduced version. For the original suboptimal IR and the optimal IR, see: https://godbolt.org/z/8aMxr37Ev

Let me know if you can confirm that it's an optimization opportunity, thanks.

llvm / llvm-project

[MemcpyOpt] Missed optimization: unrelated store clobber blocks elimination/merge of `memset` #88632
