Open RKSimon opened 3 years ago
If there's some case where we aren't handling constant indices, sure, we should handle them. Not sure what testcase you're thinking of, though.
aliasSameBasePointerGEPs was removed in https://reviews.llvm.org/D92723.
Eli,
Yes, I understand that, thanks.
The relevant indices aren't constant in your testcase.
See also https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
Eli,
Would it be advantageous to handle the cases of constant indices?
Roman,
There is this code in aliasSameBasePointerGEPs:
if (GEP1->getNumIndices() != GEP2->getNumIndices() ||
GEP1->getNumIndices() < 2)
{
errs()<<"1\n";
return MayAlias;
}
So if you look at your suggestion, the GEPs:
%arrayidx = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 1, i64 %idxprom
%arrayidx8 = getelementptr inbounds %struct.foo, %struct.foo* %p, i64 0, i32 0, i64 %idxprom, i64 %idxprom7
This condition seems overly conservative to me.
I'm sure this is a duplicate of something. The problem is that BasicAA doesn't know how to prove i < N, and there isn't any relevant TBAA info in the IR.
So the problem i guess is: MayAlias: float %arrayidx, float %arrayidx8 MayAlias: %0 = load float, float %arrayidx, align 4, !tbaa !6 <-> store float %add, float %arrayidx8, align 4, !tbaa !6
Isn't LICM dependent on the AA here? So I think this seems like an AA issue.
Right for the modified test case:
#define N 10
struct foo {
float a[N][N];
float b[N];
};
void compute(struct foo *p )
{
for (int i = 0; i < N; i++) {
float temp = .1f * p->b[i];
for (int j = 0; j < N; ++j) {
p->a[i][j] += temp;//0.1f * p->b[j];
}
}
}
LLVM is able to vectorize this if p->b[j] is manually hoisted.
I posted on the mailing list but I should post here, we have a partial fix for this in BasicAliasAnalysis.cpp but it doesn't seem to be complete and may cause errors, was hoping to get thoughts on it:
unsigned MinNumIndex = std::min(GEP1->getNumIndices(),
GEP2->getNumIndices());
if (MinNumIndex < 2)
return MayAlias;
SmallVector<Value *, 8> IntermediateIndices;
IntermediateIndices.push_back(GEP1->getOperand(1));
// Note,
// 1. The index to struct field can only be literal constant.
// 2. So far all the struct field indexes are the same respectively.
// 3. The array indexes could be different, but they don't matter here.
// 4. The types should be the same, say, different array indexes don't
// matter, and the struct indexes be the same respectively.
// If current type is StructType, and the pair of corresponding indexes
// are not equal, NoAlias
for (unsigned i = 1; i != MinNumIndex - 1; ++i) {
if (isa<StructType>(GetElementPtrInst::getIndexedType(
GEP1->getSourceElementType(), IntermediateIndices))) {
ConstantInt *C1 = dyn_cast<ConstantInt>(GEP1->getOperand(i + 1));
ConstantInt *C2 = dyn_cast<ConstantInt>(GEP2->getOperand(i + 1));
if (C1 && C2 && C1->getSExtValue() != C2->getSExtValue())
return NoAlias;
}
IntermediateIndices.push_back(GEP1->getOperand(i + 1));
}
}
// Note that we fall back the original logic here, there are still some thing
// not covered like p->a[2].x v.s. p->a[1].x.b[2]
I think we need to view this as LICM failure - p->b[i]
should be hoisted
from the innermost loop, and then we get:
https://llvm.godbolt.org/z/P8fPr5W9s
For that, yes, AA is somehow overly pessimistic here.
Alias seems to be overly conservative here, or tbaa isn't working correctly, as the reported alias set:
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 (read-only)
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)
are not overlapping accesses based on type, but this seems to only be an issue with structs (ie, using the same base pointer?), as it can also vectorize:
#define N 10
float a[N][N];
float b[N];
void foo() {
for (int i = 0; i < N; ++i) {
for (int j = 0; j < N; ++j) {
a[i][j] += 0.1f * b[i];
}
}
}
*** IR Dump Before LoopVectorizePass on _Z7computeP3foo ***
; Function Attrs: mustprogress nofree norecurse nosync nounwind uwtable
define dso_local void @_Z7computeP3foo(%struct.foo* nocapture %0) local_unnamed_addr #0 {
br label %.preheader
.preheader: ; preds = %1, %4
%indvars.iv14 = phi i64 [ 0, %1 ], [ %indvars.iv.next15, %4 ]
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
br label %5
3: ; preds = %4
ret void
4: ; preds = %5
%indvars.iv.next15 = add nuw nsw i64 %indvars.iv14, 1
%exitcond16.not = icmp eq i64 %indvars.iv.next15, 10
br i1 %exitcond16.not, label %3, label %.preheader, !llvm.loop !3
5: ; preds = %.preheader, %5
%indvars.iv = phi i64 [ 0, %.preheader ], [ %indvars.iv.next, %5 ]
%6 = load float, float* %2, align 4, !tbaa !6
%7 = fmul float %6, 0x3FB99999A0000000
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%9 = load float, float* %8, align 4, !tbaa !6
%10 = fadd float %9, %7
store float %10, float* %8, align 4, !tbaa !6
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 10
br i1 %exitcond.not, label %4, label %5, !llvm.loop !10
}
LV: Checking a loop in "_Z7computeP3foo" from /tmp/test.ll
LV: Loop hints: force=? width=0 interleave=1
LV: Found a loop:
LV: Found an induction variable.
LV: Found FP op with unsafe algebra.
LV: Found FP op with unsafe algebra.
LAA: Found a loop in _Z7computeP3foo:
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x1538cc0, 2] may alias, No access Pointers: (float* %8, unknown before-or-after), (float* %2, unknown before-or-after)
LAA: Accesses(3):
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (write)
%2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 (read-only)
%8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv (read)
Underlying objects for pointer %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%struct.foo* %0
Underlying objects for pointer %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
%struct.foo* %0
Underlying objects for pointer %2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
%struct.foo* %0
LAA: Found a runtime check ptr: %8 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 0, i64 %indvars.iv14, i64 %indvars.iv
LAA: Found a runtime check ptr: %2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14
LAA: We need to do 0 pointer comparisons.
LAA: May be able to perform a memory runtime check if needed.
LAA: Checking memory dependencies
LAA: Bad stride - Not striding over innermost loop %2 = getelementptr inbounds %struct.foo, %struct.foo* %0, i64 0, i32 1, i64 %indvars.iv14 SCEV: {(400 + %0)<nuw>,+,4}<nuw><%.preheader>
LAA: Src Scev: {(400 + %0)<nuw>,+,4}<nuw><%.preheader>Sink Scev: {{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 0)
LAA: Distance for %6 = load float, float* %2, align 4, !tbaa !6 to store float %10, float* %8, align 4, !tbaa !6: {{-400,+,36}<%.preheader>,+,4}<nw><%5>
Pointer access with non-constant stride
LAA: Src Scev: {{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>Sink Scev: {{%0,+,40}<nuw><%.preheader>,+,4}<nuw><%5>(Induction step: 1)
LAA: Distance for %9 = load float, float* %8, align 4, !tbaa !6 to store float %10, float* %8, align 4, !tbaa !6: 0
Total Dependences: 2
LAA: unsafe dependent memory operations in loop
LV: Can't vectorize due to memory conflicts
LV: Not vectorizing: Cannot prove legality.
Extended Description
Reported here: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151796.html
LLVM isn't able to vectorize this code. I'm not sure what the issue is here as it seems to vectorize this code:
Current Codegen: https://llvm.godbolt.org/z/nah3ov9fT