Open Ag-Cu opened 5 months ago
Hi @Ag-Cu! Could you please provide an implementation for the idea you mentioned?
Sorry, I just saw it. Sure, here is my implementation using handwritten asm, and it works well:
.macro abs d0, s0, t0
    vrsub.vi \t0, \s0, 0      # t0 = 0 - s0 (negate each element)
    vmax.vv \d0, \s0, \t0     # d0 = max(s0, -s0), signed
.endm
I am not sure which version performs better, but this one does use fewer instructions.
I think it should be good. My only concern right now is how this implementation behaves when overflow happens. In other words, for abs_s8, what is the result when -128 is given?
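To make the overflow question concrete, here is a minimal scalar sketch in C of what the two instructions do to a single 8-bit lane. The helper name abs_s8_lane is hypothetical and only models the vrsub.vi / vmax.vv pair; it is not code from the project, and it assumes the usual two's-complement wraparound when converting back to int8_t.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 8-bit lane of the macro:
 * vrsub.vi computes 0 - x with wraparound, then vmax.vv takes the
 * signed maximum of x and that negated value. */
static int8_t abs_s8_lane(int8_t x) {
    /* Negate in unsigned arithmetic so the wrap modulo 256 is well defined.
     * The cast back to int8_t is assumed to wrap (two's complement). */
    int8_t neg = (int8_t)(uint8_t)(0u - (uint8_t)x);
    return x > neg ? x : neg;
}
```

Under this model, abs_s8_lane(-127) gives 127, but abs_s8_lane(-128) gives -128, because 0 - (-128) wraps back to -128 and the maximum of the two equal values is still -128. So the two-instruction version wraps on overflow rather than saturating to 127; whether that is the behavior abs_s8 is expected to have is the point to confirm.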
I see your implementation for rvv, like this:
So why don't we just use two instructions, vrsub and vmax, to implement abs?