riscv-non-isa / rvv-intrinsic-doc

https://jira.riscv.org/browse/RVG-153
BSD 3-Clause "New" or "Revised" License

Fractional LMUL #15

Closed ebahapo closed 2 years ago

ebahapo commented 4 years ago

We have made good progress, but I'm afraid that the release 0.9 of the V spec is coming down fast and methinks that the most radical change that it introduces is the new values of LMUL.

Please, share your thoughts about it here.

kito-cheng commented 4 years ago

Proposal for type system and API:

Vector Types for Fractional LMUL:

v{TYPE}{SEW}m{LMUL}_t

Changes:

Vector Tuple Types for Fractional LMUL:

v{TYPE}{SEW}m{LMUL}x{NF}_t
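To make the two patterns above concrete, here is a minimal sketch that expands them into type names. The helper names are hypothetical, and it assumes TYPE is one of `int`, `uint` or `float` and that a fractional LMUL is spelled with an `f` prefix (`f2` for LMUL=1/2), matching the `mf8 | mf4 | mf2` suffixes in the naming rules of this proposal.

```c
#include <stdio.h>

/* Hypothetical helper: expands v{TYPE}{SEW}m{LMUL}_t.
 * `lmul` is the LMUL part without the leading 'm',
 * e.g. "f2" for LMUL=1/2 or "4" for LMUL=4 (assumed spelling). */
int rvv_type_name(char *buf, size_t n, const char *type,
                  unsigned sew, const char *lmul)
{
    return snprintf(buf, n, "v%s%um%s_t", type, sew, lmul);
}

/* Hypothetical helper: expands v{TYPE}{SEW}m{LMUL}x{NF}_t,
 * where nf is the number of fields in the tuple. */
int rvv_tuple_type_name(char *buf, size_t n, const char *type,
                        unsigned sew, const char *lmul, unsigned nf)
{
    return snprintf(buf, n, "v%s%um%sx%u_t", type, sew, lmul, nf);
}
```

Under these assumptions, `rvv_type_name(buf, sizeof buf, "int", 32, "f2")` yields `vint32mf2_t`, and the tuple variant with `nf = 3` yields `vint32mf2x3_t`.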

Changes:

Changes to Intrinsic API Naming Rules:

INTRINSIC ::= MNEMONIC '_' RET_TYPE
MNEMONIC ::= Instruction name in v-ext specification. Replace '.' with '_'.
RET_TYPE ::= SEW LMUL
SEW ::= ( i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | f16 | f32 | f64 )
LMUL ::= ( mf8 | mf4 | mf2 | m1 | m2 | m4 | m8 )
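The grammar above can be read as a simple string transformation. The following is only a sketch of that reading, with a hypothetical helper name: it replaces each '.' in the instruction name with '_' and appends '_' followed by the SEW and LMUL tokens.

```c
#include <stdio.h>

/* Hypothetical helper: builds MNEMONIC '_' SEW LMUL from the grammar.
 * insn is the instruction name as written in the V spec (e.g. "vadd.vv"),
 * sew is a token such as "i32", lmul a token such as "mf2".
 * Returns 0 on success, -1 if buf is too small. */
int rvv_intrinsic_name(char *buf, size_t n, const char *insn,
                       const char *sew, const char *lmul)
{
    size_t i = 0;

    /* MNEMONIC: copy the instruction name, replacing '.' with '_'. */
    for (; insn[i] != '\0'; i++) {
        if (i + 1 >= n)
            return -1;
        buf[i] = (insn[i] == '.') ? '_' : insn[i];
    }

    /* '_' RET_TYPE, where RET_TYPE is SEW followed by LMUL. */
    int w = snprintf(buf + i, n - i, "_%s%s", sew, lmul);
    return (w < 0 || (size_t)w >= n - i) ? -1 : 0;
}
```

With this reading, `vadd.vv` at `i32` and `mf2` becomes `vadd_vv_i32mf2`.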

Changes:

Issue for Fractional LMUL

rdolbeau commented 4 years ago

Seems OK to me; I like the idea of extending ELF for such requirements. It might be useful for extensions in general (i.e. have an ELF attribute for V and for some properties of V, but also for B, ...).

Hsiangkai commented 4 years ago

It looks good to me.

David-Horner commented 4 years ago

@kito-cheng What is NF? it is not immediately apparent from

Fractional LMUL is not the only disruptive change.

LMUL no longer stripes vertically; SLEN determines a horizontal interleave instead. As a result:

Previously the element order was shuffled, but only when LMUL>1. Now, even at LMUL=1, the element length affects element content.

The element length and VLEN/SLEN determine the alignment structure. Thus, if VLEN/SLEN > 1, the component bytes of elements are no longer in in-memory order. Load MAXLV bytes into a register, and the half-words read from the register will have every other byte from memory in their upper and lower halves. The same holds for words: none of their bytes will come from consecutive locations in memory.
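The effect can be illustrated with a toy model. This is a sketch under an assumption, not spec text: the register is split into VLEN/SLEN sections, and consecutive elements are dealt round-robin to the sections. All function names are hypothetical.

```c
#include <stddef.h>

/* Toy model (assumed mapping): position inside the register of byte b of
 * element i, for elements of w bytes. Element i goes to section
 * i % (VLEN/SLEN), at slot i / (VLEN/SLEN) within that section. */
size_t phys_byte(size_t i, size_t b, size_t w,
                 size_t vlen_bytes, size_t slen_bytes)
{
    size_t sections = vlen_bytes / slen_bytes;
    size_t section = i % sections;
    size_t slot = i / sections;
    return section * slen_bytes + slot * w + b;
}

/* Fill a register from memory using 1-byte elements. */
void load_bytes(unsigned char *reg, const unsigned char *mem,
                size_t vlen_bytes, size_t slen_bytes)
{
    for (size_t i = 0; i < vlen_bytes; i++)
        reg[phys_byte(i, 0, 1, vlen_bytes, slen_bytes)] = mem[i];
}

/* Read byte b of element i when the same register is viewed with
 * elements of w bytes. */
unsigned char read_elem_byte(const unsigned char *reg, size_t i, size_t b,
                             size_t w, size_t vlen_bytes, size_t slen_bytes)
{
    return reg[phys_byte(i, b, w, vlen_bytes, slen_bytes)];
}
```

In this model, with VLEN = 16 bytes, SLEN = 8 bytes and `mem[i] = i`, half-word 0 is made of memory bytes 0 and 2, and word 0 of memory bytes 0, 2, 4 and 6. With SLEN = VLEN the mapping is the identity and in-memory order is preserved.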

The good and the bad of this is that most initial implementations are expected to have VLEN=SLEN. Those that do have SLEN<VLEN may well jump to SLEN = 1/4 VLEN or 1/8 VLEN, as it is expected that only the higher-performance, larger-VLEN implementations will need to limit SLEN due to wiring issues. So SLEN = 1/2 VLEN, which is a nice match for register-pair processing (e.g. complex numbers), is going to be rare.

But all the code needs to accommodate the in-register format not matching the in-memory format. There are suggestions on how to mitigate this in hardware. These intrinsics should be prepared to differentiate between structures/operations that are in-memory-order agnostic and those that rely on in-memory order. The good news is that most operations are in-memory-order agnostic, e.g. all single-width arithmetic. Even most mixed-width operations are not going to care. But any sub-element component manipulation will need to be aware and careful.

Finally, I have noticed discussions about matching of masks under a given element length and LMUL with another element length or LMUL. Given that it is a definite concern and apparently at least moderately frequent in real code situations, you should know of a proposal for mask support that is ordinal based. Regardless of Element Length or LMUL the nth mask bit applies to the nth vector element. In all cases, a single bit is used to store the mask value. The issue is #448 in the riscv/riscv-v-spec github.
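A minimal sketch of what ordinal-based indexing means in practice follows. The function names are hypothetical. The second function is included only for contrast and assumes a layout in which the distance between mask bits (`stride`) depends on element length and LMUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Ordinal layout as described above: element n uses mask bit n,
 * regardless of element length or LMUL. */
int mask_bit_ordinal(const uint8_t *mask, size_t n)
{
    return (mask[n / 8] >> (n % 8)) & 1;
}

/* For contrast (assumed layout): element n's mask bit sits at bit
 * n * stride, where stride varies with element length and LMUL. */
int mask_bit_strided(const uint8_t *mask, size_t n, size_t stride)
{
    size_t bit = n * stride;
    return (mask[bit / 8] >> (bit % 8)) & 1;
}
```

With the ordinal form, a mask produced under one element length and LMUL can be consumed under another without any conversion, because the lookup takes no such parameter. With the strided form, the same register contents select different elements whenever `stride` changes.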

David-Horner commented 4 years ago

Well, not so finally apparently.

Another thing to mention:

Because LMUL no longer does vertical striping but horizontal interleave, each physical register has the same characteristics. Physical registers are filled consecutively. This means that register grouping by powers of 2 is no longer a constraint, so LMUL can take on all values between 1 and 8. This is good for intrinsics, which can use a value of, say, 6, freeing up a register pair for two mask registers or a further m2 variable. A second vsetvl[i] instruction with a limiting AVL is (currently) necessary, but as mentioned elsewhere in these comments it can be a low-cost operation, and the tradeoff is definitely worth it in some scenarios. (I will also be proposing an option for LMUL of 3, 5, 6 or 7 based on ideas in riscv/riscv-v-spec GitHub issue #418, although that targeted the 0.8 structure.)
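The limiting-AVL arithmetic can be sketched as follows. The helper names are hypothetical, and the sketch assumes, as stated above, that the physical registers of a group are filled consecutively, so capping the element count at nregs * VLEN / SEW leaves the remaining registers of the group untouched.

```c
#include <stddef.h>

/* Number of SEW-bit elements that fit in nregs whole registers
 * of vlen_bits each. */
size_t elems_in_regs(size_t nregs, size_t vlen_bits, size_t sew_bits)
{
    return nregs * vlen_bits / sew_bits;
}

/* AVL to request so that at most nregs registers of the group are
 * written (assumes registers are filled consecutively). */
size_t limited_avl(size_t avl, size_t nregs,
                   size_t vlen_bits, size_t sew_bits)
{
    size_t cap = elems_in_regs(nregs, vlen_bits, sew_bits);
    return avl < cap ? avl : cap;
}
```

For example, with VLEN = 128 and SEW = 32, a full LMUL=8 group holds 32 elements, while limiting to 6 registers caps the count at 24, leaving the last two registers of the group free.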

kito-cheng commented 4 years ago

@kito-cheng What is NF? it is not immediately apparent from

NF means NFIELDS, the term used for segment load/store. Vector tuple types are used by the segment load/store intrinsic API.

You can see this issue for more detail: https://github.com/sifive/rvv-intrinsic-doc/issues/11

eopXD commented 2 years ago

Fractional LMUL is now defined and implemented in RVV intrinsic. Closing this issue.