llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Failure to beneficially vectorize 'copyable' elements in integer binary ops #30135

Open RKSimon opened 7 years ago

RKSimon commented 7 years ago
Bugzilla Link: 30787
Version: trunk
OS: Windows NT
Depends On: llvm/llvm-project#34845, llvm/llvm-project#33967
Blocks: llvm/llvm-project#15513
CC: @alexey-bataev, @rotateright

Extended Description

We successfully vectorize:

// clang -O3 -march=btver2

void add0(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
add0(int*, int const*):
        vmovdqu xmm0, xmmword ptr [rsi]
        vpaddd  xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vmovdqu xmmword ptr [rdi], xmm0
        ret

But we fail to do so if one or more elements simplify to a plain copy (here, the + 0 in the first lane). If the cost model says vectorization would still be beneficial, we should vectorize such cases:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ + 0;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}
add1(int*, int const*):
        mov     ecx, dword ptr [rsi + 4]
        mov     eax, dword ptr [rsi]
        mov     edx, dword ptr [rsi + 8]
        inc     ecx
        mov     dword ptr [rdi], eax
        add     edx, 2
        mov     dword ptr [rdi + 4], ecx
        mov     ecx, dword ptr [rsi + 12]
        mov     dword ptr [rdi + 8], edx
        add     ecx, 3
        mov     dword ptr [rdi + 12], ecx
        ret
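
To make the missed opportunity concrete: the copied lane can be read as an add of the identity constant 0, which turns add1 into the same 4-wide add as add0 with a different constant vector. The sketch below spells that out in portable C. The name add1_as_vector_add and the addend table are hypothetical and only illustrate what a vectorized form would compute; this is not what the SLP vectorizer currently emits.

```c
/* Scalar form from the report: lane 0 folds to a plain copy. */
void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ + 0;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

/* Hypothetical equivalent: treat the copy as "+ 0" so that all four
   lanes share one opcode and one constant vector {0, 1, 2, 3}. */
void add1_as_vector_add(int * __restrict dst, const int * __restrict src) {
  static const int addend[4] = {0, 1, 2, 3};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[i] + addend[i];
}
```

Both functions store the same four values, so treating the copy as an add with the identity operand would not change the result; whether it is profitable remains a cost-model decision.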

The same applies to SUB/MUL/SHL/LSHR/ASHR, and perhaps DIV/REM, whenever one lane's operand is the identity for that operation. Possibly the -ffast-math float FADD/FSUB/FMUL/FDIV operations could be handled as well.
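
As a rough sketch of what such cases might look like for other opcodes, the functions below follow the add1 pattern with lane 0 using the identity operand (0 for SUB and SHL, 1 for MUL). The function names sub1, mul1 and shl1 are hypothetical and are not taken from the linked examples; unsigned is used for the shift so that left-shifting stays well defined.

```c
/* Hypothetical SUB analogue: "- 0" in lane 0 folds to a copy. */
void sub1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ - 0;
  *dst++ = *src++ - 1;
  *dst++ = *src++ - 2;
  *dst++ = *src++ - 3;
}

/* Hypothetical MUL analogue: "* 1" in lane 0 folds to a copy. */
void mul1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ * 1;
  *dst++ = *src++ * 2;
  *dst++ = *src++ * 3;
  *dst++ = *src++ * 4;
}

/* Hypothetical SHL analogue: "<< 0" in lane 0 folds to a copy. */
void shl1(unsigned * __restrict dst, const unsigned * __restrict src) {
  *dst++ = *src++ << 0;
  *dst++ = *src++ << 1;
  *dst++ = *src++ << 2;
  *dst++ = *src++ << 3;
}
```

In each case three lanes perform the real operation and one is a copy, so the same cost-model question as for add1 applies.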

Further examples can be found here: https://godbolt.org/g/ueIYiF

zmodem commented 2 years ago

mentioned in issue llvm/llvm-project#34845

JonPsson commented 2 years ago

mentioned in issue llvm/llvm-project#33967

RKSimon commented 7 years ago

Candidate Patch: https://reviews.llvm.org/D28907

RKSimon commented 7 years ago

assigned to @alexey-bataev