riscvarchive / riscv-v-spec

Working draft of the proposed RISC-V V vector extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

mask op scalar destination update when vl=0 #587

Closed David-Horner closed 3 years ago

David-Horner commented 4 years ago

For reduction instructions:

The standard activity for vl=0 is: no operation is performed and the destination register is not updated.

However, the vfirst instruction finds the lowest-numbered active element of the source mask vector that has the value 1 and writes that element’s index to a GPR. If no active element has the value 1, -1 is written to the GPR.

My take is if vl=0 then -1 is written to the GPR.

For vpopc my take is if vl=0 then 0 is written to the GPR.

Frequently the values from these mask ops are used for loop and flow control. Leaving them unchanged is thus dangerous, because the previous state has no relevance to this iteration of the loop.

Although this may be obvious, specifying the behaviour when vl=0 will avoid confusion.
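To illustrate the hazard, here is a minimal Python sketch (not taken from the spec) of a loop that accumulates vpopc results across strips. The function name `popcount_strips`, the flag `write_when_vl_zero`, and the register variable `rd` are hypothetical and exist only for this illustration.

```python
def popcount_strips(strips, write_when_vl_zero):
    """Sum vpopc-style counts over (mask_bits, vl) strips.

    write_when_vl_zero selects between the two candidate behaviours:
    True  -> the scalar destination is written (with 0) when vl=0
    False -> the scalar destination is left unchanged when vl=0
    """
    total = 0
    rd = 0  # hypothetical scalar destination, reused in every iteration
    for mask_bits, vl in strips:
        if vl > 0 or write_when_vl_zero:
            # Count set bits among elements 0..vl-1 only.
            rd = sum(1 for i in range(vl) if mask_bits[i])
        # Otherwise rd still holds the count from the previous strip.
        total += rd
    return total


# Second strip has vl=0, so it should contribute nothing.
strips = [([1, 1, 0, 1], 4), ([1, 1, 0, 1], 0)]
```

With these strips, the behaviour that writes 0 when vl=0 yields a total of 3, while the behaviour that leaves the register unchanged counts the first strip twice and yields 6.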

aswaterman commented 4 years ago

This already follows from the spec, because if vl=0, then no active elements have the value 1, coupled with the fact that instructions that write a scalar integer or floating-point register do so even when vstart ≥ vl.

A non-normative note might be appropriate.
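The reasoning above can be sketched as a small Python model. This is only an illustration under stated assumptions: vstart is taken to be 0, mask registers are modelled as lists of 0/1 values, and the helper names (`active_elements`, `vfirst_m`, `vpopc_m`) are chosen for this sketch rather than taken from any reference implementation.

```python
def active_elements(vl, vm=1, v0=None):
    """Yield indices of active elements: i < vl and not masked off.

    vm=1 means unmasked; vm=0 means element i is active only if v0[i] is 1.
    """
    for i in range(vl):
        if vm == 1 or v0[i]:
            yield i


def vfirst_m(vs2, vl, vm=1, v0=None):
    """Index of the lowest-numbered active element of vs2 that is 1, else -1."""
    for i in active_elements(vl, vm, v0):
        if vs2[i]:
            return i
    return -1  # no active element has the value 1


def vpopc_m(vs2, vl, vm=1, v0=None):
    """Number of active elements of vs2 that are 1."""
    return sum(1 for i in active_elements(vl, vm, v0) if vs2[i])
```

When vl=0 the set of active elements is empty, so in this model `vfirst_m` returns -1 and `vpopc_m` returns 0 without any special case for vl=0.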

David-Horner commented 4 years ago

On 2020-10-21 6:33 p.m., swallach wrote:

i have not totally been following this discussion. but at convex we handled this very simply

if Vl = 0, no vector operation was executed, and the vector instruction was executed and sequential operation proceeded.

to the best of my knowledge this never came up as an issue

https://github.com/riscv/riscv-v-spec/issues/587#issuecomment-711087236

To clarify, Andrew's reading of the spec has the vstart >= vl behaviour superseding the behaviour implied by vl=0.

Thus some vector instructions are executed even when vl=0. vfirst and vpopc are two of them.

billhuffman commented 4 years ago

Andrew's comment doesn't seem to me to supersede vl=0. It just says that the scalar write, like all other scalar writes, is done even when vl=0, and even when vstart>=vl. Is that, in some way, superseding vl=0?

David-Horner commented 4 years ago

@David-Horner: ... superseding vl=0 implied behaviour.

@billhuffman: Andrew's comment doesn't seem to me to supersede vl=0... Is that, in some way, superseding vl=0?

Yes and no. Yes, as in "superseding vl=0 implied behaviour." No, vl=0 itself is not changed, and its stated behaviour is not changed; just the standalone implied behaviour of the spec.

from the spec section 3.3:

As a consequence, when vl=0, no elements are updated in the destination vector register group, regardless of vstart. Instructions that write a scalar integer or floating-point register do so even when vstart ≥ vl.

The vl=0 mandated behaviour [regardless of vstart] effectively means that [in most cases] the vector instruction can be treated as a nop. The vstart>=vl mandate makes it clear that there may indeed be an exception for that vl=0 nop behaviour. Thus Andrew's reading is that vstart >= vl supersedes the vl=0 implied behaviour that vector instructions can be treated as nops.
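The interaction of the two clauses can be sketched in Python as follows. This is a simplified model, not spec text: the helper names are hypothetical, and other architectural state an instruction may touch (such as vstart itself) is deliberately ignored.

```python
def elements_written(vl, vstart=0):
    """Indices of destination vector elements that may be updated.

    Only elements with vstart <= i < vl are candidates, so the list is
    empty whenever vl=0 or vstart >= vl.
    """
    return list(range(vstart, vl))


def behaves_as_nop(writes_scalar, vl, vstart=0):
    """True if, in this model, the instruction changes no destination.

    writes_scalar marks instructions whose destination is a scalar
    integer or floating-point register; per the quoted section 3.3 text
    that write happens even when vstart >= vl.
    """
    if writes_scalar:
        return False  # scalar destination is always written
    return not elements_written(vl, vstart)
```

In this model a vector-destination instruction with vl=0 behaves as a nop for any vstart, while a scalar-destination instruction such as vfirst or vpopc never does.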

Andrew suggests a non-normative note might be appropriate. I agree. But perhaps the normative text also needs to be fixed.

We have "... when vl=0 ... regardless of vstart." and " ... when vstart ≥ vl."

I think it is partly because both clauses state the behaviour in the presence of conditions of both vl and vstart that this interaction of behaviours is not immediately clear to everyone. vl = 0 is unusual. vstart>=vl is also unusual.

There are obviously some points of confusion, if not contention, possible with the text as it stands.

A further implication is that a performance counter of executed vector instructions would also count those instructions processed when vl=0. However, subset counts for when vl>0 and for when vstart>vl might also be defined and seen as beneficial.
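As a rough sketch of what such counters could look like, the Python below tallies a trace of (vl, vstart) pairs, one per executed vector instruction. The counter names and the trace format are assumptions made for this illustration; no such counters are defined in the spec text quoted here.

```python
from collections import Counter


def count_vector_instructions(trace):
    """Tally executed vector instructions from (vl, vstart) pairs.

    "executed" counts every instruction, including those with vl=0.
    "vl_nonzero" and "vstart_gt_vl" are the hypothetical subset counts.
    """
    counters = Counter()
    for vl, vstart in trace:
        counters["executed"] += 1
        if vl > 0:
            counters["vl_nonzero"] += 1
        if vstart > vl:
            counters["vstart_gt_vl"] += 1
    return counters


# Example trace: two instructions with vl=0, one with vstart > vl.
trace = [(8, 0), (0, 0), (4, 6), (0, 0)]
counts = count_vector_instructions(trace)
```

For this trace the model reports 4 executed instructions, of which 2 have vl>0 and 1 has vstart>vl.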