riscv / riscv-v-spec

Working draft of the proposed RISC-V V vector extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International
953 stars 271 forks source link

Shall we have a prefix sum instruction? #353

Open HanKuanChen opened 4 years ago

HanKuanChen commented 4 years ago

A prefix sum is

void prefix_sum(int32_t *src, size_t size, int32_t *des);

des[0] = src[0]
des[1] = src[0] + src[1]
des[2] = src[0] + src[1] + src[2]
...
des[size - 1] = src[0] + src[1] + src[2] + ... + src[size - 1]

For scalar code, it is

    mv sum, x0
    slli size, size, 2
    add src_end, src, size
    beq src, src_end, end
loop:
    lw input, 0(src)
    add sum, sum, input
    sw sum, 0(des)
    addi src, src, 4
    addi des, des, 4
    bne src, src_end, loop
end:
    ret

However, if we have vpresum.vs vd, vs2, vs1, vm # vd[i] = sum(vs1[0], vs2[0~i]), we could have

    beqz size, end
    vsetvli x0, size, e32, m8
    vmv.s.x vsum, x0
loop:
    vsetvli current_vl, size, e32, m8
    vlw.v vsrc, (src)
    vpresum.vs vdes, vsrc, vsum
    vsw.v vdes, (des)
    sub size, size, current_vl
    slli current_vl, current_vl, 2
    add src, src, current_vl
    add des, des, current_vl
    lw pre_sum, -4(des)
    vmv.s.x vsum, pre_sum
    bnez size, begin
end:
    ret
nick-knight commented 4 years ago

@HanKuanChen Two thoughts: it would be interesting to compare with a vectorized prefix_sum using the current spec (i.e., without vpresum.vs). Demonstrating this complexity might help strengthen the argument for inclusion of this instruction. Additionally, you might also consider explaining how vpresum.vs could improve the performance of real-world applications.