Open sjoerdmeijer opened 1 year ago
@llvm/issue-subscribers-backend-aarch64
Author: Sjoerd Meijer (sjoerdmeijer)
The original loop can be vectorized by changing it to a double loop and adding the -enable-loopinterchange option, as shown below.
float s231_tmp()
{
    for (int i = 0; i < 256; ++i) {
        for (int j = 1; j < 256; j++) {
            aa[j][i] = aa[j - 1][i] + bb[j][i];
        }
    }
    dummy(a, b, c, d, e, aa, bb, cc, 0.);
}
.LBB0_2: // %vector.body
// Parent Loop BB0_1 Depth=1
// => This Inner Loop Header: Depth=2
ld1w { z0.s }, p0/z, [x8, x14, lsl #2]
ld1w { z1.s }, p0/z, [x9, x14, lsl #2]
ld1w { z2.s }, p0/z, [x10, x14, lsl #2]
add x15, x8, x14, lsl #2
add x16, x9, x14, lsl #2
fadd z0.s, z0.s, z2.s
ld1w { z3.s }, p0/z, [x11, x14, lsl #2]
fadd z1.s, z1.s, z3.s
inch x14
cmp x14, #256
st1w { z0.s }, p0, [x15, x13, lsl #2]
st1w { z1.s }, p0, [x16, x13, lsl #2]
b.ne .LBB0_2
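To make the effect of the interchange concrete, here is a minimal C sketch (not taken from the thread) that runs the double loop in both orders and checks that the results agree. The arrays aa_ij, aa_ji and bb, the initial values, and all function names are assumptions for illustration; only the loop body and the 256 bounds come from the snippet above.

```c
#include <assert.h>
#include <string.h>

#define N 256  /* same bounds as the double loop above */

/* Hypothetical stand-ins for the TSVC globals aa and bb. */
static float aa_ij[N][N], aa_ji[N][N], bb[N][N];

/* Fill aa and bb with small deterministic values. */
static void init(float aa[N][N]) {
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) {
            aa[r][c] = (float)((r + c) % 7) * 0.5f;
            bb[r][c] = (float)((r * 3 + c) % 5) * 0.25f;
        }
}

/* Loop order as written: i outer, j inner (walks a column, stride N). */
static void s231_ij(float aa[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 1; j < N; ++j)
            aa[j][i] = aa[j - 1][i] + bb[j][i];
}

/* Interchanged order: j outer, i inner (walks a row, stride 1). */
static void s231_ji(float aa[N][N]) {
    for (int j = 1; j < N; ++j)
        for (int i = 0; i < N; ++i)
            aa[j][i] = aa[j - 1][i] + bb[j][i];
}

/* Returns 1 if both loop orders produce bit-identical results. */
int interchange_matches(void) {
    init(aa_ij);
    init(aa_ji);
    s231_ij(aa_ij);
    s231_ji(aa_ji);
    return memcmp(aa_ij, aa_ji, sizeof aa_ij) == 0;
}

/* Convenience wrapper: aborts if the two loop orders disagree. */
void check_interchange(void) { assert(interchange_matches()); }
```

In the interchanged order the inner i loop walks a row with stride 1 and its iterations do not depend on each other, which is presumably what lets the vectorizer emit the ld1w/fadd/st1w body shown above.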
The original loop cannot be vectorized even with the -enable-loopinterchange option, because loop interchange is not applied to it.
Loop interchange does not work because the dependence analysis of the load/store instructions reports dependences that prevent the loops from being interchanged. However, I don't think the original loop has any such dependences, so I think this is a bug in the dependence analysis.
Thanks for the analysis, interesting result/conclusion!
> Loop interchange does not work because the dependence analysis of the load/store instructions reports dependences that prevent the loops from being interchanged. However, I don't think the original loop has any such dependences, so I think this is a bug in the dependence analysis.
Dependence analysis is correct.
As discussed in PR https://github.com/llvm/llvm-project/pull/78951#issuecomment-1908707269, there are dependences carried by the outermost loop.
Loop interchange needs to focus on the inner two loops.
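The point about the outermost loop can be checked by brute force. The sketch below assumes the original kernel is the usual TSVC shape, i.e. the i/j nest wrapped in an outer repetition loop (called nl here; it is not shown in this thread). It enumerates every pair of iterations that touch the same aa element and records the direction vector (nl, i, j). The tiny bounds NL and N and all helper names are hypothetical, and this exhaustive search is only an illustration, not how LLVM's DependenceAnalysis works.

```c
#include <assert.h>
#include <string.h>

enum { NL = 2, N = 4 };              /* tiny assumed bounds, small enough to enumerate */
enum { FLOW, ANTI, OUTPUT, NKINDS }; /* write->read, read->write, write->write */

static int seen[NKINDS][27];         /* one flag per kind and 3-level direction vector */

/* Direction at one loop level: 0 is '<', 1 is '=', 2 is '>'. */
static int dir(int src, int dst) { return dst > src ? 0 : (dst == src ? 1 : 2); }

/* Nonzero if iteration (n0,i0,j0) executes strictly before (n1,i1,j1). */
static int before(int n0, int i0, int j0, int n1, int i1, int j1) {
    if (n0 != n1) return n0 < n1;
    if (i0 != i1) return i0 < i1;
    return j0 < j1;
}

/* Record the dependences between two iterations of
   for nl / for i / for j: aa[j][i] = aa[j-1][i] + bb[j][i].
   Accesses in different columns i never alias, so those pairs are skipped. */
static void visit(int n0, int i0, int j0, int n1, int i1, int j1) {
    if (!before(n0, i0, j0, n1, i1, j1) || i0 != i1) return;
    int code = (dir(n0, n1) * 3 + dir(i0, i1)) * 3 + dir(j0, j1);
    if (j0 == j1 - 1) seen[FLOW][code] = 1;   /* src writes aa[j0][i], dst reads it */
    if (j0 - 1 == j1) seen[ANTI][code] = 1;   /* src reads aa[j0-1][i], dst writes it */
    if (j0 == j1)     seen[OUTPUT][code] = 1; /* both write aa[j0][i] */
}

/* Enumerate all ordered pairs of iterations of the three-level nest. */
void analyze(void) {
    memset(seen, 0, sizeof seen);
    for (int n0 = 0; n0 < NL; ++n0)
        for (int i0 = 0; i0 < N; ++i0)
            for (int j0 = 1; j0 < N; ++j0)
                for (int n1 = 0; n1 < NL; ++n1)
                    for (int i1 = 0; i1 < N; ++i1)
                        for (int j1 = 1; j1 < N; ++j1)
                            visit(n0, i0, j0, n1, i1, j1);
}

/* Look up a vector written as three characters over '<', '=', '>'. */
int seen_dep(int kind, const char *v) {
    assert(strlen(v) == 3);
    int code = 0;
    for (int k = 0; k < 3; ++k)
        code = code * 3 + (v[k] == '<' ? 0 : (v[k] == '=' ? 1 : 2));
    return seen[kind][code];
}

/* Number of distinct (kind, direction vector) pairs found. */
int count_deps(void) {
    int total = 0;
    for (int k = 0; k < NKINDS; ++k)
        for (int c = 0; c < 27; ++c)
            total += seen[k][c];
    return total;
}
```

With these bounds the search finds four distinct vectors: flow (=,=,<) and (<,=,<), anti (<,=,>), and output (<,=,=). Three of the four have '<' in the first position, meaning they are carried by the nl loop. Swapping the last two entries of each vector (interchanging i and j) leaves every vector lexicographically positive, which is the usual legality condition for that interchange.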
Looks like we are 1400% (?!) behind for kernel s231 in TSVC compared to GCC. Compile this code with -O3 -mcpu=neoverse-v2 -ffast-math:
Clang's codegen:
vs. GCC's codegen:
See also: https://godbolt.org/z/jr9WKW95v
TODO: root cause analysis.