LebedevRI opened this issue 5 years ago
mentioned in issue llvm/llvm-bugzilla-archive#43836
Unfortunately, I agree with Eli that this is going to be delicate and very complicated to get tuned just right. The first step will be tuning LSR to ensure that the (proposed) canonical form gets turned into the optimal sequence on each target reliably. LSR is fragile and annoying, so that's a significant amount of work and risk to take on.
Unless you are strongly motivated by this case, I'd honestly advise against it.
Feel free to try, but if you seriously pursue this, I suspect you're going to end up spending a lot of time looking at LSR...
Please make sure you take some time to look at the history of disable-iv-rewrite.
The reason I'm revisiting this is https://reviews.llvm.org/D107935. I'm basically thinking about having a pass to rewrite all pointer math so that:
There already exists SeparateConstOffsetFromGEP pass, and it seems to be run by a few backends already. Not sure if we can extend & use it.
From an analysis standpoint, it doesn't really matter much; the SCEV expressions are equivalent either way. And there aren't really any interesting optimizations if we canonicalize early, I think. IndVarSimplify used to canonicalize aggressively, but we turned that off a while back because it was messing up optimized loops without any obvious benefit.
If LSR can't treat the two forms as equivalent, that's obviously a problem. Not sure why that's happening here, though, at first glance.
Surely sticking with the current status quo leaves performance on the floor for code written in the opposite style from the one optimal for a given target.
I would really guess that the integer induction should be canonical? For which targets would pointer induction be better?
Interesting. I suppose that IndVarSimplify doesn't catch this. Which form is actually better, in some general sense, is target dependent (depending on whether a target has various pre/post increment operations), but LSR won't create new PHIs, and that's another complication in this space.
Or more globally, this was the original pattern i was looking at: https://godbolt.org/z/Bzn5Lz
```c++
void sink(int* row, int* predRow);

void bad(int* data, int len, int width) {
  for (int i = 1; i < len; i++) {
    int* row = data + width * i;
    int* predRow = data + width * (i - 1);
    sink(row, predRow);
  }
}

void good(int* data, int len, int width) {
  int* predRow = data;
  for (int i = 1; i < len; i++, data += width, predRow += width) {
    sink(data, predRow);
  }
}
```
In the first case there are two GEPs; in the second there is an extra PHI and only one GEP.
Extended Description
https://godbolt.org/z/TR-s8A
```c++
#include <...>
#include <...>

void sink(int* row);

void v0(int* data, int len, int width) {
  for (int i = 0; i < len; i++) {
    int* row = data + width * i;
    sink(row);
  }
}

void v1(int* row, int len, int width) {
  for (int i = 0; i < len; i++, row += width) {
    sink(row);
  }
}
```
In the first case we get:

```llvm
9:                                                ; preds = %9, %5
  %10 = phi i64 [ 0, %5 ], [ %13, %9 ]
  %11 = mul nsw i64 %10, %6
  %12 = getelementptr inbounds i32, i32* %0, i64 %11
  tail call void @_Z4sinkPi(i32* %12)
  %13 = add nuw nsw i64 %10, 1
```
but in the second:

```llvm
  %9 = phi i32 [ 0, %5 ], [ %11, %8 ]
  %10 = phi i32* [ %0, %5 ], [ %12, %8 ]
  tail call void @_Z4sinkPi(i32* %10)
  %11 = add nuw nsw i32 %9, 1
  %12 = getelementptr inbounds i32, i32* %10, i64 %6
```
I'd guess the second variant is better? The difference is interesting.