llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.38k stars 12.15k forks source link

Differences between `base+width*i` and `base+=width` #43180

Open LebedevRI opened 5 years ago

LebedevRI commented 5 years ago
Bugzilla Link 43835
Version trunk
OS Linux
CC @efriedma-quic,@fhahn,@hfinkel,@jdoerfert,@preames

Extended Description

https://godbolt.org/z/TR-s8A

include

include

void sink(int* row);

void v0(int data, int len, int width) { for(int i = 0; i < len; i++) { int row = data + widthi; sink(row); } } void v1(int row, int len, int width) { for(int i = 0; i < len; i++, row+=width) { sink(row); } }

In first case we get

9: ; preds = %9, %5 %10 = phi i64 [ 0, %5 ], [ %13, %9 ] %11 = mul nsw i64 %10, %6 %12 = getelementptr inbounds i32, i32 %0, i64 %11 tail call void @​_Z4sinkPi(i32 %12) %13 = add nuw nsw i64 %10, 1

but in second:

%9 = phi i32 [ 0, %5 ], [ %11, %8 ] %10 = phi i32 [ %0, %5 ], [ %12, %8 ] tail call void @​_Z4sinkPi(i32 %10) %11 = add nuw nsw i32 %9, 1 %12 = getelementptr inbounds i32, i32* %10, i64 %6

I'd guess the second variant is better? The difference is interesting.

LebedevRI commented 3 years ago

mentioned in issue llvm/llvm-bugzilla-archive#43836

preames commented 3 years ago

Unfortunately, I agree with Eli that this is going to be delicate and very complicated to get tuned just right. The first step will be tuning LSR to ensure that the (proposed) canonical form gets turned into the optimal sequence on each target reliably. LSR is fragile and annoying, so that's a significant amount of work and risk to take on.

Unless you are strongly motivated by this case, I'd honestly advise against it.

efriedma-quic commented 3 years ago

Feel free to try, but if you seriously pursue this, I suspect you're going to end up spending a lot of time looking at LSR...

Please make sure you take some time to look at the history of disable-iv-rewrite.

LebedevRI commented 3 years ago

The reason i'm revisiting this is https://reviews.llvm.org/D107935 I'm basically thinking about having a pass to rewrite all pointer math so that:

  1. loop-invariant part of address computation should be done in preheader
  2. canonicalize to index induction

There already exists SeparateConstOffsetFromGEP pass, and it seems to be run by a few backends already. Not sure if we can extend & use it.

efriedma-quic commented 3 years ago

From an analysis standpoint, it doesn't really matter much; the SCEV expressions are equivalent either way. And there aren't really any interesting optimizations if we canonicalize early, I think. IndVarSimplify used to canonicalize aggressively, but we turned that off a while back because it was messing up optimized loops without any obvious benefit.

If LSR can't treat the two forms as equivalent, that's obviously a problem. Not sure why that's happening here, though, at first glance.

LebedevRI commented 3 years ago

Surely sticking with the current status-quo leaves the performance on the floor for the code written in the opposite style from the one optimal for given $target.

I would really guess that the integer induction should be canonical? For which targets would pointer induction be better?

hfinkel commented 5 years ago

Interesting. I suppose that IndVarSimplify doesn't catch this. Which form is actually better, in some general sense, is target dependent (depending on whethr a target has various pre/post increment operations), but LSR won't create new PHIs, and that's another complication in this space.

LebedevRI commented 5 years ago

Or more globally, this was the original pattern i was looking at: https://godbolt.org/z/Bzn5Lz

include

include

void sink(int row, int predRow);

void bad(int data, int len, int width) { for(int i = 1; i < len; i++) { int row = data + widthi; int predRow = data + width(i-1); sink(row, predRow); } } void good(int data, int len, int width) { int* predRow = data; for(int i = 1; i < len; i++, data+=width, predRow+=width) { sink(data, predRow); } }

In first case there are two gep's, in second there is an extra phi and only one gep.