riscv / riscv-p-spec

RISC-V Packed SIMD Extension
https://jira.riscv.org/browse/RVG-129
Creative Commons Attribution 4.0 International

Non-destructive ternary shuffling #144

Open jnk0le opened 2 years ago

jnk0le commented 2 years ago

This would be in addition to xperm8 and xperm4 from Zbkx.

The proposal is to add ternary shuffles similar to the PULP pv.shuffle2.{b,h} instructions, but with the "byte/reg select" bits for each element placed in that element's own lane, for easier data-dependent shuffling and symmetry with xperm8. The downside is that the mask can no longer be loaded with a single addi on RV32 (it is loop-invariant anyway).

It should also zero out the result element if its index is out of bounds, i.e. if any bit above the reg select bit is set (similar to xperm8).

[image]
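The behaviour described above can be sketched as a small C reference model for the byte variant on RV32. This is only an illustration under stated assumptions, not a proposed definition: the function name `shuffle2_b_model`, the operand order, and the polarity of the reg select bit (0 picks the first source, 1 picks the second) are all made up here. Each byte of `idx` is treated as one index: bits [1:0] select the byte, bit 2 selects the register, and any higher bit being set zeroes that result byte, in the same way xperm8 returns zero for out-of-range indices.

```c
#include <stdint.h>

/*
 * Hypothetical RV32 model of the proposed non-destructive byte shuffle.
 * a, b : the two data sources (both left unmodified)
 * idx  : one index per byte lane
 *        bits [1:0] = byte select
 *        bit  2     = reg select (assumed: 0 -> a, 1 -> b)
 *        bits [7:3] = must be zero, otherwise the result byte is 0
 */
static uint32_t shuffle2_b_model(uint32_t a, uint32_t b, uint32_t idx)
{
    /* View both sources as one 8-byte table: bytes 0-3 from a, 4-7 from b. */
    uint64_t table = ((uint64_t)b << 32) | a;
    uint32_t result = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint32_t sel = (idx >> (8 * lane)) & 0xFF;
        if (sel < 8) {
            /* In range: pick the selected byte from the 8-byte table. */
            uint32_t byte = (uint32_t)(table >> (8 * sel)) & 0xFF;
            result |= byte << (8 * lane);
        }
        /* Out of range: the lane stays zero. */
    }
    return result;
}
```

With a = 0x13121110, b = 0x17161514 and idx = 0x04008007, this model returns 0x14100017: lane 0 takes byte 3 of b, lane 1 is zeroed because 0x80 is out of bounds, lane 2 takes byte 0 of a, and lane 3 takes byte 0 of b.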

These shuffles have been shown to be significantly useful in convolutions and matmul, but in some cases they suffer from extra moves due to their destructive form (e.g. listing 1 in [1], table 5.9 in [2]).

Because of this, I think it needs to be an R4-type ternary instruction, similar to cmix (aka BPICK) or the funnel shifts. Otherwise there would need to be 2 variants of the same instruction (which doesn't solve the issues in [1] and [2] though):

[1] - https://arxiv.org/pdf/2004.11690.pdf
[2] - https://webthesis.biblio.polito.it/18144/1/tesi.pdf

jnk0le commented 2 years ago

The shuffles in the RI5CY datasheet have masks aligned to lanes. Fig. 6 (from https://arxiv.org/pdf/1608.08376.pdf) seems to be wrong. [image]