llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[RISCV] splat(uadd.sat) is not vectorized #100751

Open Fros1er opened 2 months ago

Fros1er commented 2 months ago

Vector version of #65072:

define <vscale x 4 x i8> @nxv4i8(i8 %x, i8 %y) {
  %a = call i8 @llvm.uadd.sat.i8(i8 %x, i8 %y)
  %head.x = insertelement <vscale x 4 x i8> poison, i8 %a, i32 0
  %splat.x = shufflevector <vscale x 4 x i8> %head.x, <vscale x 4 x i8> poison, <vscale x 4 x i32> zeroinitializer
  ret <vscale x 4 x i8> %splat.x
}

With llc -mtriple=riscv64 -mattr=+v:

nxv4i8:                                 # @nxv4i8
        andi    a1, a1, 255
        andi    a0, a0, 255
        add     a0, a0, a1
        li      a1, 255
        bltu    a0, a1, .LBB0_2 # should use vsaddu.vx
        li      a0, 255
.LBB0_2:
        vsetvli a1, zero, e8, mf2, ta, ma
        vmv.v.x v8, a0
        ret
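For readers less familiar with the scalar sequence above, here is a minimal Python sketch of what it computes, one instruction per line. The function name `scalar_uadd_sat_sequence` is hypothetical, and the model assumes (as the two `andi` instructions suggest) that the upper bits of `a0` and `a1` are not guaranteed to be zero on entry.

```python
def scalar_uadd_sat_sequence(a0, a1):
    """Model of the scalar instructions emitted before the splat.

    a0 and a1 stand for the argument registers; only their low 8 bits
    carry the i8 arguments (assumption: upper bits may hold garbage).
    """
    a1 &= 255          # andi a1, a1, 255
    a0 &= 255          # andi a0, a0, 255
    a0 = a0 + a1       # add  a0, a0, a1
    a1 = 255           # li   a1, 255
    if not (a0 < a1):  # bltu a0, a1, .LBB0_2 skips the clamp when a0 < 255
        a0 = 255       # li   a0, 255
    return a0          # the value that vmv.v.x then broadcasts


if __name__ == "__main__":
    print(scalar_uadd_sat_sequence(200, 100))
    print(scalar_uadd_sat_sequence(1, 2))
```

The branch is the part the issue asks to replace with a single `vsaddu.vx`.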

This could instead be lowered to a vector saturating add:

nxv4i8:                                 # @nxv4i8
        vsetvli a2, zero, e8, mf2, ta, ma
        vmv.v.x v8, a0
        vsaddu.vx       v8, v8, a1
        ret
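The rewrite is valid because a saturating add applied to every lane of a splat gives the same result as splatting the scalar saturating add. The following Python sketch checks that exhaustively for i8. All function names are hypothetical, and the lane count of 4 is only a stand-in for `<vscale x 4 x i8>` with vscale = 1.

```python
I8_MAX = 0xFF  # largest unsigned 8-bit value


def uadd_sat_i8(x, y):
    """Unsigned saturating add on 8-bit values (models llvm.uadd.sat on i8)."""
    return min((x & I8_MAX) + (y & I8_MAX), I8_MAX)


def splat(value, lanes):
    """Broadcast one scalar into every lane (models vmv.v.x)."""
    return [value] * lanes


def scalar_then_splat(x, y, lanes):
    """Current lowering: saturating add on scalars, then broadcast."""
    return splat(uadd_sat_i8(x, y), lanes)


def splat_then_vector_add(x, y, lanes):
    """Proposed lowering: broadcast x, then a vsaddu.vx-style add of scalar y."""
    return [uadd_sat_i8(lane, y) for lane in splat(x, lanes)]


def lowerings_agree(lanes=4):
    """Compare both lowerings for every pair of i8 inputs."""
    return all(
        scalar_then_splat(x, y, lanes) == splat_then_vector_add(x, y, lanes)
        for x in range(256)
        for y in range(256)
    )


if __name__ == "__main__":
    print(lowerings_agree())
```

This only models the arithmetic. It says nothing about which sequence is faster on a given core.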

Btw, is it possible to use RVV to optimize the simple version returning i8?

define i8 @nxv4i8(i8 %x, i8 %y) {
  %v = call i8 @llvm.uadd.sat.i8(i8 %x, i8 %y)
  ret i8 %v
}

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-risc-v

Author: Froster (Fros1er)

topperc commented 2 months ago

> Btw, is it possible to use RVV to optimize the simple version returning i8?
>
> define i8 @nxv4i8(i8 %x, i8 %y) {
>   %v = call i8 @llvm.uadd.sat.i8(i8 %x, i8 %y)
>   ret i8 %v
> }

It's possible, but I'm not sure it will be efficient on all CPUs.

Fros1er commented 2 months ago

May I self assign this?

topperc commented 2 months ago

> May I self assign this?

Yes