vfdff opened 2 years ago
@llvm/issue-subscribers-backend-aarch64
Looks like some limitation in the analysis of PHI nodes in the loop vectorizer. `-mllvm -enable-pre=false` enables vectorization.
Yes, some PHI nodes with a `.pre` suffix are generated in `GVNPass::eliminatePartiallyRedundantLoad`, and they are not induction-variable PHIs, which prevents the vectorization.
```cpp
void GVNPass::eliminatePartiallyRedundantLoad(
    LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
    MapVector<BasicBlock *, Value *> &AvailableLoads) {
  for (const auto &AvailableLoad : AvailableLoads) {
    BasicBlock *UnavailableBlock = AvailableLoad.first;
    Value *LoadPtr = AvailableLoad.second;

    auto *NewLoad =
        new LoadInst(Load->getType(), LoadPtr, Load->getName() + ".pre",
                     /* ... snippet truncated ... */
```
The option `-mllvm -enable-load-in-loop-pre=false` is more precise for this case, but the pass itself does not seem to conflict with vectorization, so should we loosen the PHI-node checking condition in the vectorizer?
A simpler test case: https://godbolt.org/z/os31qYh7e
```c
#define LEN_1D 8000ll
#define LEN_2D 128ll

__attribute__((aligned(64))) float a[LEN_1D], b[LEN_1D], c[LEN_1D], d[LEN_1D];

void s212(struct args_t *func_args)
{
    // statement reordering dependency needing temporary
#pragma clang loop vectorize(assume_safety)
    // #pragma GCC ivdep
    for (int i = 0; i < LEN_1D - 1; i++) {
        a[i] *= c[i];
        b[i] += a[i + 1] * d[i];
    }
}
```
The issue in both of the cases above still exists even after applying the patch from https://reviews.llvm.org/D118642.
For the 2nd case, it is unsafe to swap or sink the accesses across loop iterations:

```
st a[i]
ld a[i+1] --\
            |-- unsafe to swap or sink, which may be why both GCC and LLVM don't try to vectorize
st a[i+1] --/
ld a[i+2]
```
This seems related to #53900. With the fixes linked there, LLVM should be able to vectorize kernel_7.
As for s212, we should be able to sink the memory accesses to support the recurrence introduced by GVN PRE (I've got some patches, I'll share them soon). But for that loop, `#pragma clang loop vectorize(assume_safety)` may be problematic because of the dependency between `a[i]` and `a[i+1]`.
> This seems related to #53900. With the fixes linked there, LLVM should be able to vectorize kernel_7.
Thanks very much. I'm waiting for your patch to land upstream, and will then try a test.
The first loop is vectorized with current main now, after b8709a9d03f8: https://godbolt.org/z/KEGGjKhGY
Thanks @fhahn, yes, it works now.
As the 2nd loop is not expected to vectorize, I think this issue can be closed now.
Hi @fhahn and @vfdff. I have submitted a patch which helps with Allen's second loop, it introduces a memory dependence check between the load and store and tries to sink the store. I see this loop vectorize with that patch applied: https://reviews.llvm.org/D137159
For the original 2nd case, the issue still exists: https://godbolt.org/z/GGEvjbq4x
For the simplified 2nd case, manual PRE benefits GCC's vectorization but not LLVM's; see details at https://godbolt.org/z/hzc3EMMPe
For Livermore Kernel 7, clang doesn't vectorize because the arithmetic computation of the inner loop is too complex: https://godbolt.org/z/hr7YGoEPq
When I try to delete part of the computation, either `u[k+3] + r*( u[k+2] + r*u[k+1] )` or `t*( u[k+6] + r*( u[k+5] + r*u[k+4] ) )`, then clang can vectorize it.