Open XChy opened 6 months ago
Induction variables should be canonicalized: https://godbolt.org/z/ndKjsxqoE
The canonicalization is incorrect unless we can guarantee that the loop bound is less than 4294967296.
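A minimal C sketch of why the 4294967296 (2^32) bound matters. It assumes the induction variable in question is 32 bits wide and that canonicalization would widen it to 64 bits; the function names are hypothetical and are not taken from the linked example.

```c
#include <stdint.h>

/* Value of a 32-bit unsigned induction variable after `steps` increments
   from 0, computed in closed form and zero-extended to 64 bits.
   Unsigned arithmetic wraps modulo 2^32. */
uint64_t narrow_iv_after(uint64_t steps) {
    uint32_t i = (uint32_t)steps;
    return (uint64_t)i;
}

/* Value of the widened 64-bit induction variable after the same number
   of increments. It does not wrap for any count that fits in 64 bits. */
uint64_t wide_iv_after(uint64_t steps) {
    return steps;
}

/* Returns 1 when the narrow and widened variables hold the same value. */
int ivs_agree(uint64_t steps) {
    return narrow_iv_after(steps) == wide_iv_after(steps);
}
```

Under these assumptions the two variables agree for every count below 2^32 and diverge from 2^32 onward, which is why the rewrite needs a proven bound.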
Thanks to @dtcxzyw, the suggestion on real code has been forwarded downstream: https://github.com/jemalloc/jemalloc/pull/2611
Alive2 proof: https://alive2.llvm.org/ce/z/yK_USj (no unrolling, because verification is slow)
Motivating example
For the source IR (similar to what memset does):
With `opt -O3`, we vectorize it to https://godbolt.org/z/6d5cxeh8K:
The `vector.body` loop is obviously equivalent to `@llvm.memset.p0.i64(ptr align 1 %vla, i8 0, i64 %n.vec, i1 false)`. It seems to be a phase-ordering problem, since LoopIdiomRecognize runs before LoopVectorize.
Real-world motivation
This snippet of IR is derived from jemalloc/src/background_thread.c@background_threads_enable (after the O3 pipeline). The example above is reduced from a large real-world IR file. If you're interested in the original suboptimal IR and the optimal IR, please email me.
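As a rough C-level illustration of the equivalence claimed in the motivating example (not the original IR): the sketch below assumes a vectorization factor of 16 bytes, and `zero_vector_body` is a hypothetical stand-in for the `vector.body` loop. It zeroes `n_vec` bytes chunk by chunk, which has the same effect as a single `memset` of `n_vec` bytes.

```c
#include <stddef.h>
#include <string.h>

#define VF 16 /* assumed vectorization factor, in bytes */

/* Hypothetical stand-in for vector.body: zeroes n_vec bytes in chunks of
   VF bytes. n_vec is expected to be a multiple of VF, as it would be
   after the vectorizer rounds the trip count down. */
void zero_vector_body(unsigned char *buf, size_t n_vec) {
    for (size_t i = 0; i < n_vec; i += VF) {
        /* One "vector store" of VF zero bytes. */
        for (size_t k = 0; k < VF; ++k)
            buf[i + k] = 0;
    }
}

/* The single call the loop above could be replaced with. */
void zero_with_memset(unsigned char *buf, size_t n_vec) {
    memset(buf, 0, n_vec);
}
```

Both functions leave the first `n_vec` bytes zeroed and touch nothing past them, so a later LoopIdiomRecognize-style rewrite of the vectorized loop would preserve behavior in this simplified setting.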
Let me know if you can confirm that it's an optimization opportunity, thanks.