[Flang] TSVC s115: compiler doesn't vectorize the loop considering an initial value of do-variable might overflow

llvm / llvm-project


Flang can't vectorize the loop in `s115` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.

* Fortran

```fortran
!     Fortran version
      subroutine s115 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)

      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)

      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s115 ')
      do 10 j = 1,n
         do 20 i = j+1, n
            a(i) = a(i) - aa(i,j) * a(j)
   20    continue
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
```

```console
$ flang-new -v -O3 -flang-experimental-integer-overflow s115.f -S -Rpass=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s115.f
```


* C

```c
// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];

int s115() {
	init( "s115 ");
	for (int j = 0; j < LEN2; j++) {
		for (int i = j+1; i < LEN2; i++) {
			a[i] -= aa[j][i] * a[j];
		}
	}
	dummy(a, b, c, d, e, aa, bb, cc, 0.);
	return 0;
}
```

```console
$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   10 |                         for (int i = j+1; i < LEN2; i++) {
      |                         ^
```

If `j+1` overflows, the accesses to `a(i)` and `a(j)` may overlap, so vectorization is prevented. IIRC, compilers don't have to consider this case.
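
To make the overlap concrete, here is a minimal sketch in C that models the inner loop bounds under the assumption that `j+1` wraps around in two's complement. It is only an illustration of the arithmetic, not a description of how LLVM's dependence analysis works, and the helper names `wrapping_succ` and `inner_loop_may_touch_aj` are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: j + 1 with two's-complement wraparound.
   Written as an explicit case split so the model itself never
   performs a signed overflow. */
static int32_t wrapping_succ(int32_t j) {
    return (j == INT32_MAX) ? INT32_MIN : j + 1;
}

/* Hypothetical helper: models `do i = j+1, n`.
   The inner loop visits every i in [first, n]; the store to a(i)
   can hit the element read as a(j) only if j lies in that range. */
static bool inner_loop_may_touch_aj(int32_t j, int32_t n) {
    int32_t first = wrapping_succ(j);
    return first <= j && j <= n;
}
```

Under this model, `inner_loop_may_touch_aj(j, n)` is false whenever `j+1` does not wrap, because the first index is then strictly greater than `j`. It becomes true only for `j == n == INT32_MAX`, where the wrapped start index is `INT32_MIN` and the range of `i` covers `j`.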

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)


[Flang] TSVC s115: compiler doesn't vectorize the loop considering an initial value of do-variable might overflow #110609