riscv / riscv-v-spec

Working draft of the proposed RISC-V V vector extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

A simple Ordinal based mask encoding #435

Closed David-Horner closed 4 years ago

David-Horner commented 4 years ago

Background:

With Version 0.9-draft-20200424, all LMUL levels share the same (relative) element mapping within a physical register for a given SEW. It is relative because the absolute element address varies for each physical register, but the allocation pattern is the same. In other words, SEW determines the positions of elements within a physical register, independent of any LMUL setting.

Thus LMUL>1 no longer provides internal format information, only the number of physical registers within a register group; and LMUL<1 only provides a limit on vl for use on single physical registers.

This is applicable to mask as well. LMUL as previously known is no more. See comment below for details.

There is a simple mapping from SEW ordinal to 2 * SEW ordinal that can retain the relationship between them.

Consider that the ordinal numbers for the SEW-width elements in the first SLEN are 0, 4, 8, 12 (SLEN = 4 SEW, with VLEN = 4 SLEN). When widened, the vertical SLEN in the two registers holds the 2 * SEW elements with the same ordinals (0, 4, 8, 12): 0 and 4 in vd; 8 and 12 in vd+1.
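The example above can be checked with a small sketch. It assumes the SEW dependent interleave places ordinal i of a register in SLEN section i mod (VLEN/SLEN), which is consistent with the ordinals quoted above but is an assumption, not spec text. The function name `interleaved_layout` and the concrete sizes (VLEN=128, SLEN=32) are hypothetical and chosen only to give SLEN = 4 SEW and VLEN = 4 SLEN.

```python
def interleaved_layout(vlen, slen, sew, num_regs=1):
    """Return layout[reg][section]: the element ordinals held in each SLEN section.

    Assumption (not taken from the spec text): within a register, successive
    ordinals are dealt round-robin across the VLEN/SLEN sections.
    """
    sections = vlen // slen
    per_reg = vlen // sew          # elements held by one physical register
    layout = []
    for reg in range(num_regs):
        reg_sections = [[] for _ in range(sections)]
        for j in range(per_reg):
            ordinal = reg * per_reg + j
            reg_sections[j % sections].append(ordinal)
        layout.append(reg_sections)
    return layout


# SEW=8, SLEN=32 (= 4 SEW), VLEN=128 (= 4 SLEN): one source register.
narrow = interleaved_layout(vlen=128, slen=32, sew=8)
# Widened to SEW=16 across two registers (vd and vd+1).
wide = interleaved_layout(vlen=128, slen=32, sew=16, num_regs=2)

print("first SLEN, SEW=8:      ", narrow[0][0])
print("first SLEN, SEW=16 vd:  ", wide[0][0])
print("first SLEN, SEW=16 vd+1:", wide[1][0])
```

Under these assumptions the first SLEN section holds the same four ordinals before and after widening, split two per register.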

To provide register group expansion of 8, all that is required is one byte (8 bits) in the mask register for each ordinal possible in the SLEN. Each of the bits will represent a specific physical register in the register group, e.g. bit 0 represents relative physical register 0, bit 1 relative physical register 1, etc. Each successive physical register advances the level 1 ordinal number by VLEN/SEW, so the higher bits represent the higher ordinal values.

Consider the SEW=8 case. Since SEW is 8 bits, every byte in the mask register represents an ordinal offset (the one assigned at physical register level 0), and each bit represents the advance by VLEN/SEW for each additional physical register level.

When the data is SEW=16, the maximum number of ordinals that can exist in 8 physical registers is halved. Thus 1/2 of the masks can remain usable, the low four bits of every byte that represent the lower ordinal numbers. The upper half are unused.

Ditto for SEW=32, where 2 bits are active. At SEW=64 a single bit is active. At SEW=128, only half the bytes are active: those that represent the lower ordinals. Ditto for larger values of SEW until SEW = VLEN, where only a single bit (bit zero in byte zero of SLEN group 0) is active.

Certainly there are other ways to construct the mask mapping, but none of them needs to be, nor should be, LMUL dependent.
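A minimal sketch of one reading of this encoding, for the simplest case only. It assumes SLEN=VLEN, a register group of 8, and that an ordinal keeps the same mask bit regardless of SEW: byte index = ordinal mod (VLEN/8), bit index = ordinal div (VLEN/8). The names `mask_position`, `active_mask_bits` and `summarize` are hypothetical, and the mapping is an interpretation of the description above rather than a settled layout.

```python
def mask_position(ordinal, vlen):
    """Map an element ordinal to (byte index, bit index) in the mask register.

    Assumption: SLEN=VLEN, and the mapping is fixed by the SEW=8 base case,
    where one register holds VLEN/8 elements (one mask byte per element).
    """
    base = vlen // 8               # ordinals per register at SEW=8
    return ordinal % base, ordinal // base


def active_mask_bits(sew, vlen, group_regs=8):
    """Set of (byte, bit) positions needed for all ordinals at this SEW."""
    count = group_regs * vlen // sew
    return {mask_position(i, vlen) for i in range(count)}


def summarize(sew, vlen):
    """Return (number of mask bytes used, sorted bit indices used)."""
    bits = active_mask_bits(sew, vlen)
    bytes_used = {byte for byte, _ in bits}
    bit_indices = sorted({bit for _, bit in bits})
    return len(bytes_used), bit_indices


VLEN = 128
for sew in (8, 16, 32, 64, 128):
    n_bytes, bit_indices = summarize(sew, VLEN)
    print(f"SEW={sew:3}: {n_bytes:2} bytes active, bits {bit_indices}")
```

Under this reading the SEW=16, 32, 64 and 128 cases come out as described: four, two and one active bits per byte, then half the bytes. Because the position depends only on the ordinal, a mask written at one SEW is read back at the same ordinals at another.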

David-Horner commented 4 years ago

LMUL previously had these two distinct purposes:

Notably, LMUL=1 had a flat data format (in effect there was no striping). This is analogous to SLEN=VLEN for the SEW dependent interleave. Incidentally, LMUL=1 had a register group size of 1. This was not a hard requirement: allowing 2 physical registers in the register group for LMUL=1 would allow widening by vertical striping to 4 physical registers. No problem, except for an additional 1 bit of state and slightly more circuitry. As best I can tell, this possibility did not see any discussion on its merits before it was (effectively) dismissed, and likely for OK reasons: "Do we need LMUL=1 multipliers when we have up to 8 at LMUL=8? It could serve dual purpose". LMUL>1 is dual purpose, and that is partly the nature of the vertical striping used to implement widening operations.

Once widening operations have changed the format of the stored data, you need to track that; hence LMUL level 2. Further, vertical striping widening compounds: the format is morphed again, and so LMUL level 4 needs to be tracked. We allowed this one more time, and so LMUL=8 came into existence.

We can conceive of ternary operations that take three sources and create a result triple the source size (3 SEW sources to a SEW * 3 destination). The result with vertical striping would have a register set of 3 physical registers, but the TRiPLe level (LTRPL) would only be 1. If the operation were applied again using LTRPL=1 sources, the register group size would be 9 but the level would be 2. We could confuse the concepts by redefining LTRPL to be encoded as the number of registers in the register group: 1, 3 and 9 (and even 27). Same information conveyed, but more readable for programmers?

SEW dependent interleave does not compound. Widening is isomorphic (in the geological sense, not strictly mathematical). A double-widened SEW operation's result looks like any other SEW * 2 register's contents. This is obvious when one considers widening from the so-called "LMUL=1/2" structure: the source is a (potentially) half-filled physical register with each SLEN chunk half filled, and the result is a (potentially) fully filled physical register.

Is there a mapping from SEW ordinal to 2 * SEW ordinal that can retain the mapping between them? Emphatically, yes, and it does not require any concept of compound levels. (LMUL no longer has relevance here either.)

A note about:

When the data is SEW=16, the maximum number of ordinals that can exist in 8 physical registers is halved. Thus 1/2 of the masks can remain usable, the low four bits of every byte that represent the lower ordinal numbers. The upper half are unused. Note: The upper half are used if register groups greater than 8 are allowed.

With the death of LMUL as structure level identifier and mask designator, it is now relegated to limiting vl by restricting available register groups to sizes of 1/8, 1/4, 1/2, 1, 2, 4 and 8 physical registers. And even that can be relaxed (see #418), removing the last vestige of the original LMUL.

kasanovic commented 4 years ago

First, LMUL isn't dead. See my comments on #418.

I can see that a new mask layout is being proposed here, but it is very hard to decipher.

I'd propose you try and use Heilmeier to structure your proposals: https://www.darpa.mil/work-with-us/heilmeier-catechism Specifically: What is the problem you're trying to solve, what's wrong with what's there now? What is the solution you're proposing? How is it better? How is it worse?

For LMUL/interleaving issues, I'd suggest using concrete examples with layouts as in spec to help others follow your meaning.

David-Horner commented 4 years ago

Q. What is the problem you're trying to solve? A. 1) Lack of understanding of SEW dependent mapping and its implications.

Q. What's wrong with what's there now? A. 1) The implications of SEW dependent mapping have proven to be non-obvious. It was not immediately apparent that “only SEW is important to remain the same to hide differences in striping parameter SLEN” (D42ffe2); instead it was assumed to be LMUL dependent. It was not immediately apparent that in-register and in-memory structures did not align (#434, “Should SLEN=VLEN be an extension?”). This was not a problem under LMUL striping. Data structure independence from LMUL is now understood, but we have not investigated the extent to which mask structure can be made independent of LMUL, or whether an optimal encoding can be found.

Q. What is the solution you're proposing? A. Determine whether a mask mapping can be defined that does not explicitly use LMUL. An ordinal based mapping appears to be possible, so examine such an approach in the hope of obtaining further insight and eliminating LMUL baggage.

Q. How is it better? A. Not known yet.

Q. How is it worse? A. Not known yet.

kasanovic commented 4 years ago

Could you please reopen when there is a concrete mask layout proposal.

David-Horner commented 4 years ago

Resolved through #448

Q. What is the problem? A. Ordinal mapping for all combinations of SEW and LMUL is absent. Ordinal mapping is a desirable trait. So much so that v0.8 (and 0.7 before it) advocated keeping LMUL to SEW ratio the same, in part so that it would retain ordinal mapping of mask values.

Q. What's wrong with what's there now? A. A change in LMUL will invalidate the ordinal mapping of masks for a given SEW register group. E.g. changing SEW size by 1/2 will cause every second mask bit to be ignored. see** Whereas this may be a useful characteristic that could be exploited by software, the every-other-bit mapping varies by SLEN (see*), making it much less attractive, and the kind of feature previously discouraged in defining the RVV design. Granted, this mapping was quite functional under the previous vertical striped LMUL design, but it is unnecessarily complicated under the SEW dependent interleave format. see**

see**

Ignoring every second bit is apparent from the definition of MLEN = SEW/LMUL (#443). Consider LMUL=1 and SEW=16. MLEN matches the SEW of 2 bytes. Set all mask bits to one with vmseq.vv v0,v0,v0. Now every two bytes have the lowest bit set. Now set SEW=8 (using vsetvli). The mask at this SEW level now alternates between set and clear.

see***

However, the alternating element bytes are consecutive only if VLEN=SLEN. For VLEN = 2 SLEN the ordinal numbers of adjacent bytes advance by 2, so every 4th element is set (by 8 for VLEN = 4 SLEN, etc.). If we now also halve LMUL, we should be back to MLEN = 2 * SEW, and we are.

see****

Note from the MLEN formula above that LMUL chunks of MLEN fit into a SEW chunk. The comparatively simple VLEN=SLEN mask address calculation is mask_LSB_index(i) = i * MLEN for the ordinal number i. (Note also that for LMUL=1 the defined SEW chunk housed all MLEN with the same SEW/LMUL ratio. This is the base configuration for SEW elements.)

It is apparent that consecutive ordinals are mapped into consecutive MLEN chunks. This previously had physical meaning, as consecutive vertically striped elements were in successive registers. The set of MLEN chunks (LMUL of them) contained within the base SEW element designated which physical register the corresponding element was in. That physical meaning is no longer relevant.
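The alternating-mask effect described above can be reproduced with a short sketch of the existing layout for the VLEN=SLEN case, using only MLEN = SEW/LMUL and mask_LSB_index(i) = i * MLEN. The helper names and the choice of VLEN=128 are hypothetical, and the mask register is modelled simply as the set of bit positions that are set.

```python
from fractions import Fraction


def mlen(sew, lmul):
    """MLEN = SEW / LMUL, in bits (LMUL may be fractional)."""
    value = Fraction(sew) / Fraction(lmul)
    if value.denominator != 1:
        raise ValueError("SEW/LMUL must be a whole number of bits")
    return int(value)


def mask_lsb_index(i, sew, lmul):
    """Bit position of the mask bit for ordinal i (VLEN=SLEN case only)."""
    return i * mlen(sew, lmul)


def write_all_ones(sew, lmul, vl):
    """Model a compare that sets the mask bit of every element 0..vl-1."""
    return {mask_lsb_index(i, sew, lmul) for i in range(vl)}


def read_mask(set_bits, sew, lmul, vl):
    """Mask values seen by elements 0..vl-1 under a new SEW/LMUL setting."""
    return [int(mask_lsb_index(i, sew, lmul) in set_bits) for i in range(vl)]


VLEN = 128  # assumed for illustration

# Write the mask at SEW=16, LMUL=1 (MLEN=16).
bits = write_all_ones(sew=16, lmul=1, vl=VLEN // 16)

# Read it back at SEW=8, LMUL=1 (MLEN=8): set and clear alternate.
same_lmul = read_mask(bits, sew=8, lmul=1, vl=VLEN // 8)

# Halve LMUL as well (MLEN = 8 / (1/2) = 16): the ordinal mapping is restored.
half_lmul = read_mask(bits, sew=8, lmul=Fraction(1, 2), vl=VLEN // 16)

print(same_lmul)
print(half_lmul)
```

The mask only lines up again when the SEW/LMUL ratio is preserved, which is the dependence on LMUL that the proposal aims to remove.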

The mapping to physical registers of MLEN chunks aligned to the base configuration SEW (i.e. when LMUL=1) is dependent upon the chunk's ordinal within the base SEW, i, SLEN, VLEN, and two of MLEN, SEW and LMUL.

Q. What is the solution you're proposing? A. A simple Ordinal based mask encoding

Q. How is it better? A. It is much simpler. It retains ordinal mapping of mask values for all the values last set that are addressable by the new LMUL and SEW, even when the SEW/LMUL ratio changes.

Q. How is it worse? A. It is a potentially substantive mind shift.

I believe the original description conveys all the relevant characteristics, however, I am working on the suggested illustrations.

Hopefully in the light of the answers to the suggested questions, the original description will be less obtuse.