howjmay / neon2rvv

A translator from ARM NEON intrinsics to a RISC-V Vector Extension (RVV) implementation
MIT License

abs implementation in rvv #399

Open Ag-Cu opened 5 months ago

Ag-Cu commented 5 months ago

I looked at your implementation for RVV, which looks like this:

vabs_s16:                               # @vabs_s16
        vsetivli        zero, 4, e16, m1, ta, ma
        vsra.vi v9, v8, 15
        vxor.vv v8, v8, v9
        vsub.vv v8, v8, v9
        ret

So why don't we just use two instructions, vrsub and vmax, to implement abs?
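
For reference, the three-instruction sequence above is the usual branchless absolute value: build a sign mask, flip the bits when the input is negative, then subtract the mask to add the missing 1. Below is a minimal scalar sketch of one 16-bit lane in plain C. The function names are hypothetical and are not part of neon2rvv, and the sketch assumes a two's-complement target where converting an out-of-range value to int16_t wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of vsra/vxor/vsub.
 * Arithmetic is done on unsigned values so wraparound is well defined. */
static int16_t abs_sra_xor_sub_s16(int16_t x)
{
    /* vsra.vi v9, v8, 15: all ones if x is negative, all zeros otherwise. */
    uint16_t mask = (x < 0) ? 0xFFFFu : 0x0000u;
    /* vxor.vv v8, v8, v9: flips every bit when x is negative. */
    uint16_t flipped = (uint16_t)((uint16_t)x ^ mask);
    /* vsub.vv v8, v8, v9: subtracting -1 adds 1, completing the negation. */
    uint16_t result = (uint16_t)(flipped - mask);
    /* Assumes the conversion back to int16_t wraps (two's complement). */
    return (int16_t)result;
}

/* Small self-check of the model on a few representative inputs. */
static void check_abs_sra_xor_sub_s16(void)
{
    assert(abs_sra_xor_sub_s16(7) == 7);
    assert(abs_sra_xor_sub_s16(-5) == 5);
    assert(abs_sra_xor_sub_s16(0) == 0);
}
```

Calling check_abs_sra_xor_sub_s16() from a test harness exercises the model. It only describes the per-lane arithmetic and says nothing about instruction latency or throughput on real hardware.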

howjmay commented 5 months ago

Hi @Ag-Cu! Could you please provide an implementation for the idea you mentioned?

Ag-Cu commented 5 months ago

Sorry, I just saw it. Sure, here is my implementation using handwritten asm, and it works well:

.macro abs d0, s0, t0
    vrsub.vi    \t0, \s0, 0
    vmax.vv     \d0, \s0, \t0
.endm

I am not sure which version performs better, but this one does take fewer instructions.

howjmay commented 5 months ago

I think it should be good. My only concern right now is how this implementation behaves on overflow. In other words, for abs_s8, what is the result when -128 is given?
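
One way to approach the overflow question is to model both sequences on a single 8-bit lane and compare them over every possible input. The sketch below is plain scalar C, not RVV code. The function names are hypothetical, and it assumes that vrsub.vi wraps on overflow and that the target is two's complement. Under those assumptions, 0 - (-128) wraps back to -128, so the signed maximum of -128 and -128 is -128, which is the same value the vsra/vxor/vsub sequence from the top of the thread produces. This also matches the non-saturating NEON vabs_s8, which returns -128 for -128 (only vqabs_s8 saturates to 127).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 8-bit lane of: vrsub.vi t, s, 0 ; vmax.vv d, s, t */
static int8_t abs_rsub_max_s8(int8_t x)
{
    /* vrsub.vi: 0 - x with wraparound, so negating -128 yields -128 again. */
    int8_t neg = (int8_t)(uint8_t)(0u - (uint8_t)x);
    /* vmax.vv: signed maximum of the input and its negation. */
    return (x > neg) ? x : neg;
}

/* Hypothetical model of one 8-bit lane of: vsra.vi 7 ; vxor.vv ; vsub.vv */
static int8_t abs_sra_xor_sub_s8(int8_t x)
{
    uint8_t mask = (x < 0) ? 0xFFu : 0x00u;
    uint8_t flipped = (uint8_t)((uint8_t)x ^ mask);
    return (int8_t)(uint8_t)(flipped - mask);
}

/* Compares both models on all 256 inputs; returns how many were checked. */
static int check_abs_models_s8(void)
{
    int checked = 0;
    for (int v = INT8_MIN; v <= INT8_MAX; ++v) {
        assert(abs_rsub_max_s8((int8_t)v) == abs_sra_xor_sub_s8((int8_t)v));
        ++checked;
    }
    /* The overflow case asked about above: -128 maps to -128. */
    assert(abs_rsub_max_s8(INT8_MIN) == INT8_MIN);
    return checked;
}
```

This only shows that the two scalar models agree, including at -128. Confirming the behavior of the actual instructions would still require running the asm on hardware or a simulator.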