Open HanKuanChen opened 4 years ago
@HanKuanChen Two thoughts: it would be interesting to compare with a vectorized prefix_sum using the current spec (i.e., without vpresum.vs). Demonstrating this complexity might help strengthen the argument for inclusion of this instruction. Additionally, you might also consider explaining how vpresum.vs could improve the performance of real-world applications.
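For reference, one common way to do this without vpresum.vs is a log-step scan: slide the vector up by 1, 2, 4, ... elements and add after each slide, so a register of VL elements needs about log2(VL) slide-and-add pairs plus a carry from the previous strip. The C below is only a scalar model of that data flow, not RVV code and not something taken from this thread; `MAX_VL`, `scan_one_register`, and `prefix_sum_logstep` are hypothetical names, and `vlmax` stands in for whatever vector length the hardware grants.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VL 64 /* hypothetical upper bound on elements per vector register */

/* Models a log-step inclusive scan inside one vector register.
 * Each pass stands for one slide-up by `shift` followed by one vector add. */
static void scan_one_register(int32_t *v, size_t vl)
{
    int32_t slid[MAX_VL];
    for (size_t shift = 1; shift < vl; shift <<= 1) {
        /* Slide up: slid[i] = v[i - shift]. The low `shift` elements act as
         * zero here; a real slide-up leaves them unchanged, so the
         * destination register would be zeroed beforehand. */
        for (size_t i = 0; i < vl; i++)
            slid[i] = (i >= shift) ? v[i - shift] : 0;
        /* Vector add of the slid copy into the running result. */
        for (size_t i = 0; i < vl; i++)
            v[i] += slid[i];
    }
}

/* Strip-mined prefix sum: scan each strip, then add the carry (the last
 * output of the previous strip) to every element of the current strip. */
void prefix_sum_logstep(const int32_t *in, int32_t *out, size_t n, size_t vlmax)
{
    if (vlmax == 0 || vlmax > MAX_VL)
        vlmax = MAX_VL; /* keep the model inside its scratch buffer */

    int32_t carry = 0;
    for (size_t base = 0; base < n; base += vlmax) {
        size_t vl = (n - base < vlmax) ? (n - base) : vlmax;
        for (size_t i = 0; i < vl; i++)
            out[base + i] = in[base + i];
        scan_one_register(out + base, vl);
        for (size_t i = 0; i < vl; i++)
            out[base + i] += carry;
        carry = out[base + vl - 1];
    }
}
```

With an input of 1 through 10 and `vlmax` set to 4, the model processes strips of 4, 4, and 2 elements. The point of the sketch is the operation count per strip, which is what a single scan instruction would be compared against.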
A prefix sum is the running total of a sequence: out[i] = in[0] + in[1] + ... + in[i].
For scalar code, it is
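A minimal sketch of the scalar form, assuming 32-bit integer elements and an inclusive sum (matching the vd[i] semantics given for the instruction). The function name `prefix_sum_scalar` is hypothetical and not taken from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive prefix sum: out[i] = in[0] + in[1] + ... + in[i].
 * Every iteration depends on the accumulator from the previous one,
 * which is what makes the loop hard to vectorize directly. */
void prefix_sum_scalar(const int32_t *in, int32_t *out, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}
```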
However, if we have
vpresum.vs vd, vs2, vs1, vm # vd[i] = sum(vs1[0], vs2[0~i])
we could have
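A rough C model of how a strip-mined loop might use the proposed instruction, assuming the semantics in the comment above: vs1[0] supplies the carry-in and each strip needs a single scan operation. This is a software emulation, not RVV assembly; `vpresum_vs_model`, `prefix_sum_vpresum`, and the `vlmax` parameter are hypothetical, and the mask operand vm is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of the proposed vpresum.vs, unmasked case only:
 * vd[i] = vs1[0] + vs2[0] + ... + vs2[i] for i in [0, vl). */
static void vpresum_vs_model(int32_t *vd, const int32_t *vs2,
                             int32_t vs1_0, size_t vl)
{
    int32_t acc = vs1_0;
    for (size_t i = 0; i < vl; i++) {
        acc += vs2[i];
        vd[i] = acc;
    }
}

/* Strip-mined prefix sum: one modeled vpresum.vs per strip, with the last
 * output element of each strip carried into the next strip as vs1[0]. */
void prefix_sum_vpresum(const int32_t *in, int32_t *out, size_t n, size_t vlmax)
{
    if (vlmax == 0)
        vlmax = 1; /* avoid a zero-length strip in this model */

    int32_t carry = 0;
    for (size_t base = 0; base < n; base += vlmax) {
        size_t vl = (n - base < vlmax) ? (n - base) : vlmax;
        vpresum_vs_model(out + base, in + base, carry, vl);
        carry = out[base + vl - 1];
    }
}
```

In this model the per-strip work is one scan plus extracting the last element for the carry, with no separate slide-and-add passes.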