ucb-bar / hwacha

Microarchitecture implementation of the decoupled vector-fetch accelerator
http://hwacha.org/

Feature like SAXPY but with divide #36

Open mbelda opened 2 years ago

mbelda commented 2 years ago

Hello,

I am currently optimizing some code to run on Hwacha and have the following scenario:

for (int i = 0; i < n; i++) {
    out[i] = vec1[i] * const_float / vec2[i] + const_float2;
}

I know I can do the vector division on Hwacha, but it would be nice to have an operation that performs the multiplication by a scalar, and an operation like SAXPY but with a divide. For example, with a SADPY I could do the following:

mul_scalar_vec_hwacha(out, vec1, const_float)
sadpy(out, vec2, const_float2)

Is there any instruction in the ISA that performs this? I have been looking for one but can't find any.

Thanks in advance!

a0u commented 2 years ago

Single-precision scalar-vector multiply:

vfmul.s.vs vv1, vv0, vs0

Most Hwacha vector compute instructions support using shared (scalar) registers for the rs1/rs2/rs3 operands. The .vs suffix shown above is optional; the assembler can select the correct variant based on the register names.

Page 15 of the Hwacha ISA manual explains the instruction encoding:

When the d flag at bit 63 is set, register rd is interpreted as a vector register (vd). When it is cleared, register rd is interpreted as a shared register (vs). Similarly, the s1 flag (bit 62), the s2 flag (bit 61), and the s3 flag (bit 60) indicates whether rs1, rs2, and rs3 refers to a vector register or a shared register respectively.
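To make the quoted flag rule concrete, here is a minimal C sketch that reads the four flags out of a 64-bit instruction word. Only the bit positions (63, 62, 61, 60) come from the manual excerpt above; the struct and function names are hypothetical and are not part of any Hwacha toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: true = vector register, false = shared register. */
struct operand_flags {
    bool d;   /* bit 63: rd  */
    bool s1;  /* bit 62: rs1 */
    bool s2;  /* bit 61: rs2 */
    bool s3;  /* bit 60: rs3 */
};

/* Return the value of a single bit of a 64-bit word. */
static inline bool get_bit(uint64_t word, unsigned pos)
{
    return ((word >> pos) & 1u) != 0;
}

/* Extract the d/s1/s2/s3 flags from an instruction word (illustrative only). */
struct operand_flags decode_operand_flags(uint64_t insn)
{
    struct operand_flags f;
    f.d  = get_bit(insn, 63);
    f.s1 = get_bit(insn, 62);
    f.s2 = get_bit(insn, 61);
    f.s3 = get_bit(insn, 60);
    return f;
}
```

Going by that rule, the vv1, vv0, vs0 form shown earlier would presumably have d and s1 set (vector rd and rs1) and s2 cleared (shared rs2).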

There are currently no instructions for fused floating-point divide and add.
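Without a fused divide-add, the loop from the question can presumably be split into three separate element-wise steps: the scalar-vector multiply shown above, a vector divide, and an add of the second constant. The following is a plain C reference sketch of that decomposition, not Hwacha code; the function name is hypothetical and the variable names follow the original loop.

```c
#include <stddef.h>

/* Hypothetical scalar reference for out[i] = vec1[i] * const_float / vec2[i] + const_float2,
 * written as three separate steps instead of one fused operation. */
void scale_div_add(float *out, const float *vec1, const float *vec2,
                   float const_float, float const_float2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = vec1[i] * const_float;  /* step 1: scalar-vector multiply */
        t = t / vec2[i];                  /* step 2: element-wise divide */
        out[i] = t + const_float2;        /* step 3: add the scalar offset */
    }
}
```

Each step maps to one element-wise operation, so the intermediate value can stay in a register between steps rather than being written back to memory.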