llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[Flang] TSVC s115: compiler doesn't vectorize the loop considering an initial value of do-variable might overflow #110609

Open yus3710-fj opened 1 month ago

yus3710-fj commented 1 month ago

Flang can't vectorize the loop in s115 of TSVC while Clang can vectorize the loop written in C.

```c
int s115()
{
    init( "s115 ");
    for (int j = 0; j < LEN2; j++) {
        for (int i = j+1; i < LEN2; i++) {
            a[i] -= aa[j][i] * a[j];
        }
    }
    dummy(a, b, c, d, e, aa, bb, cc, 0.);
    return 0;
}
```

```console
$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   10 |                         for (int i = j+1; i < LEN2; i++) {
      |                         ^
```

If `j+1` overflows, the accesses to `a(i)` and `a(j)` may overlap, so vectorization is prevented. IIRC, compilers don't have to consider that case.

llvmbot commented 1 month ago

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)

Flang can't vectorize the loop in `s115` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.

* Fortran

```fortran
! Fortran version
      subroutine s115 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s115 ')
      do 10 j = 1,n
         do 20 i = j+1, n
            a(i) = a(i) - aa(i,j) * a(j)
  20     continue
  10  continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
```

```console
$ flang-new -v -O3 -flang-experimental-integer-overflow s115.f -S -Rpass=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s115.f
```

* C

```c
// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];
int s115()
{
    init( "s115 ");
    for (int j = 0; j < LEN2; j++) {
        for (int i = j+1; i < LEN2; i++) {
            a[i] -= aa[j][i] * a[j];
        }
    }
    dummy(a, b, c, d, e, aa, bb, cc, 0.);
    return 0;
}
```

```console
$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   10 |                         for (int i = j+1; i < LEN2; i++) {
      |                         ^
```

If `j+1` overflows, the accesses to `a(i)` and `a(j)` may overlap, so vectorization is prevented. IIRC, compilers don't have to consider that case.