llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[SLPVectorizer] clang fails to vectorize a loop with mixed sub/add #64982

Open vfdff opened 1 year ago

vfdff commented 1 year ago
vfdff commented 9 months ago
llvmbot commented 9 months ago

@llvm/issue-subscribers-backend-aarch64

Author: Allen (vfdff)

* test: https://godbolt.org/z/11TbEx119

```
#include <stdint.h>

void sub4x4_dct(int16_t d[16], int16_t dct[16], uint8_t *pix1, uint8_t *pix2 )
{
    int16_t tmp[16];

    /* First pass: butterfly over each row of d, stored transposed into tmp. */
    for( int i = 0; i < 4; i++ )
    {
        int s03 = d[i*4+0] + d[i*4+3];
        int s12 = d[i*4+1] + d[i*4+2];
        int d03 = d[i*4+0] - d[i*4+3];
        int d12 = d[i*4+1] - d[i*4+2];

        tmp[0*4+i] =   s03 +   s12;
        tmp[1*4+i] = 2*d03 +   d12;
        tmp[2*4+i] =   s03 -   s12;
        tmp[3*4+i] =   d03 - 2*d12;
    }

    /* Second pass: the same butterfly over each row of tmp, written to dct. */
    for( int i = 0; i < 4; i++ )
    {
        int s03 = tmp[i*4+0] + tmp[i*4+3];
        int s12 = tmp[i*4+1] + tmp[i*4+2];
        int d03 = tmp[i*4+0] - tmp[i*4+3];
        int d12 = tmp[i*4+1] - tmp[i*4+2];

        dct[i*4+0] =   s03 +   s12;
        dct[i*4+1] = 2*d03 +   d12;
        dct[i*4+2] =   s03 - s12;
        dct[i*4+3] =   d03 - 2*d12;
    }
}
```
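To make the "mixed sub/add" shape concrete, here is a minimal sketch of one way a vectorizer can handle a bundle that alternates between add and sub. It computes a full-width add and a full-width sub on the same operands, then blends the lanes. This is only an illustration for one row of the test case. The helper names (`butterfly_scalar`, `butterfly_blend`, `check_blend_matches_scalar`) are hypothetical, and the code is not a description of what clang actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: the four intermediate values of one row of the test case,
   in the order {s03, s12, d03, d12}. */
static void butterfly_scalar(const int16_t row[4], int out[4]) {
    out[0] = row[0] + row[3]; /* s03 */
    out[1] = row[1] + row[2]; /* s12 */
    out[2] = row[0] - row[3]; /* d03 */
    out[3] = row[1] - row[2]; /* d12 */
}

/* Lane-wise sketch: each small loop stands in for one 4-wide vector operation. */
static void butterfly_blend(const int16_t row[4], int out[4]) {
    /* Operand "vectors": lanes 0-1 feed the sums, lanes 2-3 feed the differences. */
    int a[4] = { row[0], row[1], row[0], row[1] };
    int b[4] = { row[3], row[2], row[3], row[2] };
    int add[4], sub[4];

    for (int l = 0; l < 4; l++) add[l] = a[l] + b[l]; /* one vector add */
    for (int l = 0; l < 4; l++) sub[l] = a[l] - b[l]; /* one vector sub */

    /* Blend: take lanes 0-1 from the add result and lanes 2-3 from the sub result. */
    for (int l = 0; l < 4; l++) out[l] = (l < 2) ? add[l] : sub[l];
}

/* Returns 1 if both forms agree on a few sample rows. */
static int check_blend_matches_scalar(void) {
    const int16_t rows[3][4] = { {1, 2, 3, 4}, {-7, 0, 5, 9}, {100, -50, 25, -12} };
    for (int r = 0; r < 3; r++) {
        int ref[4], got[4];
        butterfly_scalar(rows[r], ref);
        butterfly_blend(rows[r], got);
        for (int l = 0; l < 4; l++)
            if (ref[l] != got[l]) return 0;
    }
    return 1;
}
```

The blend form does twice the arithmetic and needs the operands shuffled into place first, so whether it wins depends on how the cost model prices those extra steps.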
vfdff commented 9 months ago

This seems to be a cost-model issue (commit d827865e9). SLP vectorization is generated when we increase the cost of fadd (x86 currently sets a cost of 2 for double fadd):

+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2900,7 +2900,7 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
         (Ty->getScalarType()->isBFloatTy() && !ST->hasBF16()))
       return 2 * LT.first;
     if (!Ty->getScalarType()->isFP128Ty())
-      return LT.first;
+      return 2 * LT.first;
vfdff commented 9 months ago

Alternatively, adding -aarch64-insert-extract-base-cost=1 also enables it on AArch64: https://godbolt.org/z/eras8WG91
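Both knobs mentioned in these comments (the arithmetic cost and the insert/extract base cost) plausibly feed the same profitability comparison. Here is a toy sketch of why either change can flip the decision. Every number and name in it (`toy_costs`, `toy_slp_delta`, the 4 lanes, the 2 lane moves) is a made-up assumption for illustration. It is not LLVM's real cost table or its actual formula.

```c
#include <assert.h>

/* Toy model only: hypothetical costs, not the values LLVM uses. */
struct toy_costs {
    int arith;          /* cost of one add/sub, scalar or one legal vector op */
    int insert_extract; /* cost of moving one lane into or out of a vector */
};

/* Cost difference (vector form minus scalar form) for packing `lanes` scalar
   operations into one vector operation that needs `moves` lane moves.
   A negative result means the vector form is estimated to be cheaper. */
static int toy_slp_delta(struct toy_costs c, int lanes, int moves) {
    int scalar_cost = lanes * c.arith;
    int vector_cost = c.arith + moves * c.insert_extract;
    return vector_cost - scalar_cost;
}

/* Vectorize only when the estimate says the vector form is strictly cheaper. */
static int toy_slp_profitable(struct toy_costs c, int lanes, int moves) {
    return toy_slp_delta(c, lanes, moves) < 0;
}

/* Walks through the three scenarios discussed in the thread, with toy numbers. */
static void toy_demo(void) {
    struct toy_costs baseline     = { 1, 2 }; /* vector 1 + 2*2 = 5 vs scalar 4 */
    struct toy_costs doubled_arith = { 2, 2 }; /* vector 2 + 2*2 = 6 vs scalar 8 */
    struct toy_costs cheap_moves  = { 1, 1 }; /* vector 1 + 2*1 = 3 vs scalar 4 */

    assert(!toy_slp_profitable(baseline, 4, 2));
    assert(toy_slp_profitable(doubled_arith, 4, 2));
    assert(toy_slp_profitable(cheap_moves, 4, 2));
}
```

In this toy setup, doubling the arithmetic cost widens the gap between four scalar operations and one vector operation. Lowering the lane-move cost shrinks the overhead side instead. Either change alone is enough to push the estimate below zero.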

vfdff commented 4 months ago

GCC has recorded a new improvement idea for this case at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138.