llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Constant folding for trivial loop does not occur for float and double #31278

Open dc81c6b5-3a5b-438e-b826-9e7edb3cf487 opened 7 years ago

dc81c6b5-3a5b-438e-b826-9e7edb3cf487 commented 7 years ago
Bugzilla Link: 31930
Version: trunk
OS: Linux
CC: @lesshaste, @hfinkel

Extended Description

Consider:

    float f(float x[]) {
      float p = 1.0;
      for (int i = 0; i < 960; i++)
        p += 1;
      return p;
    }

When compiled with -march=core-avx2 -O3 -ffast-math, the generated assembly still contains a loop that keeps adding until the value reaches 961.

However:

    int f(int x[]) {
      int p = 1;
      for (int i = 0; i < 960; i++)
        p += 1;
      return p;
    }

gives:

    f:                                      # @f
            mov     eax, 961
            ret

I don't know how hard it would be to add the same optimization for float and double.
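One reason the floating-point case is less trivial than the integer one is that every float addition rounds, so n additions of 1 are not always equivalent to a single addition of n. The following is a minimal sketch of that point, not a description of how LLVM reasons about the loop. The helper name `add_one_repeatedly` is made up for illustration, and the code assumes an IEEE 754 single-precision `float` with the default round-to-nearest mode.

```c
#include <assert.h>
#include <float.h>

/* Assumption: float is IEEE 754 binary32 with a 24-bit significand. */
static_assert(FLT_MANT_DIG == 24, "expected a 24-bit float significand");

/* Hypothetical helper mirroring the loop in the report:
   start at `start` and add 1.0f `n` times, one rounded addition per step. */
float add_one_repeatedly(float start, int n) {
    /* volatile keeps every intermediate value stored as a float, so no
       extra precision is carried from one iteration to the next. */
    volatile float p = start;
    for (int i = 0; i < n; i++)
        p = p + 1.0f;
    return p;
}
```

For the loop in this report, `add_one_repeatedly(1.0f, 960)` is exactly 961, because every integer up to 2^24 is representable in a float, so the closed form 1 + 960 matches the iterated result. Starting from 16777216.0f (2^24), however, adding 1.0f no longer changes the value, so the iterated sum and the closed form differ. Flags such as -ffast-math relax these constraints, which is presumably why folding is expected here.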

As a side note, the first (float) loop has a number of interesting details. First, if we reduce the limit from i < 960 to i < 959, the loop is optimized out. Second, if we change the type to 'double', this upper limit goes down to i < 479. My guess is that this corresponds to an unrolling (or peeling) cost model that is built into the compiler.

dc81c6b5-3a5b-438e-b826-9e7edb3cf487 commented 7 years ago

If we take:

    float f(int x[]) {
      float p = 1;
    #pragma unroll
      for (int i = 0; i < 960; i++)
        p += 1;
      return p;
    }

and compile simply with -O and no other flags, we get:

    .LCPI0_0:
            .long   1148207104              # float 961
    f:                                      # @f
            movss   xmm0, dword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero,zero,zero
            ret
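The constant 1148207104 in the output above is the IEEE 754 single-precision bit pattern of 961.0 (0x44704000 in hexadecimal), which confirms that the whole loop was folded into a single load. A small sketch to check this by hand follows. The helper name `float_from_bits` is hypothetical, and the code assumes a 32-bit IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit copy below assumes float occupies exactly 32 bits. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   memcpy is used instead of a pointer cast to avoid aliasing problems. */
float float_from_bits(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling `float_from_bits(1148207104u)` yields 961.0f: the exponent field is 136 (127 + 9) and the fraction field is 0x704000, giving 1.876953125 × 2^9 = 961.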