Feature like SAXPY but with divide

ucb-bar / hwacha

Microarchitecture implementation of the decoupled vector-fetch accelerator

Single-precision scalar-vector multiply:

vfmul.s.vs vv1, vv0, vs0

Most Hwacha vector compute instructions support using shared (scalar) registers for the rs1/rs2/rs3 operands. The .vs suffix shown above is optional; the assembler can select the correct variant based on the register names.

Page 15 of the Hwacha ISA manual explains the instruction encoding:

When the d flag at bit 63 is set, register rd is interpreted as a vector register (vd). When it is cleared, register rd is interpreted as a shared register (vs). Similarly, the s1 flag (bit 62), the s2 flag (bit 61), and the s3 flag (bit 60) indicate whether rs1, rs2, and rs3, respectively, refer to a vector register or a shared register.
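
To make the flag layout concrete, here is a minimal Python sketch that reads the four flag bits out of a 64-bit instruction word. Only the bit positions (63 down to 60) come from the passage quoted above; the function name `decode_operand_flags` and the returned dictionary layout are hypothetical, and the sketch ignores every other field of the encoding.

```python
# Bit positions taken from the quoted ISA manual passage.
D_BIT, S1_BIT, S2_BIT, S3_BIT = 63, 62, 61, 60


def decode_operand_flags(inst: int) -> dict:
    """Report whether each register field names a vector or a shared register.

    Hypothetical helper for illustration: it inspects only bits 63..60
    and does not decode opcodes or register numbers.
    """
    def kind(bit: int) -> str:
        # Flag set -> vector register; flag cleared -> shared register.
        return "vector" if (inst >> bit) & 1 else "shared"

    return {
        "rd": kind(D_BIT),
        "rs1": kind(S1_BIT),
        "rs2": kind(S2_BIT),
        "rs3": kind(S3_BIT),
    }


# Flags matching the operand kinds of "vfmul.s.vs vv1, vv0, vs0":
# vector rd, vector rs1, shared rs2. All other bits are left at zero,
# so this is not a real instruction encoding.
flags_only = (1 << D_BIT) | (1 << S1_BIT)
print(decode_operand_flags(flags_only))
```

The rs3 field is unused by a two-source instruction such as the multiply above, so its flag is reported here only because the sketch decodes all four bits unconditionally.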

There are currently no instructions for fused floating-point divide and add.
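
Without a fused instruction, the computation has to be split into a divide followed by a separate add. The Python reference below sketches those element-wise semantics, assuming the requested operation is y[i] = x[i] / a + y[i] by analogy with SAXPY; the issue text here does not spell out the exact form, and the function name `div_then_add` is hypothetical.

```python
def div_then_add(a: float, x: list, y: list) -> list:
    """Element-wise reference for the assumed operation y[i] = x[i] / a + y[i].

    The two steps are kept separate to mirror issuing a divide
    followed by an add, rather than one fused operation.
    """
    result = []
    for xi, yi in zip(x, y):
        quotient = xi / a              # step 1: vector divided by shared scalar
        result.append(quotient + yi)   # step 2: separate vector add
    return result


print(div_then_add(2.0, [2.0, 4.0], [1.0, 1.0]))  # [2.0, 3.0]
```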

Feature like SAXPY but with divide #36