Closed hanna-kruppe closed 5 years ago
There is a further problem with leaving vsew at 8 bits: you only have 8 named (quad) registers of size 32b. So, the main motivation for increasing vsew would be to get access to more registers, e.g., setting vsew to 16b gives access to 16 named (double) registers of size 32b.
The way the RISC-V spec handles different data sizes is actually quite clunky and encumbering for both software and hardware. The software ABI has a problem because, as you say, it is unclear how vsew will be managed across function calls.
A much cleaner way is to make data layout options and restrictions on the hardware explicit, and to encode operand sizes for each instruction as part of the ISA encoding rather than as a fixed mode. This does away with all ABI problems (except for managing register names across function calls, but that's a different issue).
Here's how such an idea would work: there is no vsew. Each operand uses a 2b encoding to represent the operand size: 00=8b, 01=16b, 10=32b, 11=64b. This drops support for quad, but I believe only floating-point will use that data size (or RV128, which hasn't been defined).
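A minimal sketch of the 2b size field described above. The function name `decode_size_bits` is hypothetical and only illustrates the mapping; it is not taken from any spec.

```python
# Hypothetical decoder for the proposed 2b per-operand size field.
# Mapping from the comment above: 00=8b, 01=16b, 10=32b, 11=64b.

def decode_size_bits(field: int) -> int:
    """Return the element width in bits for a 2-bit size field."""
    if not 0 <= field <= 0b11:
        raise ValueError("size field must fit in 2 bits")
    # Each step of the field doubles the width, starting from 8 bits.
    return 8 << field


for field in range(4):
    print(f"{field:02b} -> {decode_size_bits(field)}b")
```

With three operands (dst, src1, src2), an instruction would carry three such fields, 6b in total.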
The ISA can allow a maximum of, say, 16 named registers V0 to V15. Why 16 and not 32? Personally I have seldom seen the need for 32 named registers (not saying it's not important, just that I don't think I've used that many in my vector programming), and this saves us 3b in the instruction encoding (1b each for dst, src1, src2). Those 3b partly offset the 6b needed for the operand-size encoding (2b each for dst, src1, src2). Thus, the net cost is 3b.
We can still have a mode which reduces the number of named registers to 8 or 4, thus allowing longer maxvl. The same scheme could be used as currently, where dropping to 8 registers uses register names v0, v2, v4, etc and dropping to 4 registers uses v0, v4, etc.
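To illustrate the naming scheme above, here is a rough sketch. It assumes 16 architectural names and that reduced modes simply keep every 2nd or 4th name; `usable_registers` is a hypothetical helper, not part of any spec.

```python
# Hypothetical helper: which register names remain usable when the
# number of named registers is reduced to allow a longer maxvl.
# Assumes 16 names (v0..v15) in the full configuration.

def usable_registers(count: int, total: int = 16) -> list[str]:
    """List the register names available when only `count` are named."""
    if count <= 0 or total % count != 0:
        raise ValueError("count must evenly divide the total register count")
    stride = total // count  # 1 for 16 regs, 2 for 8 regs, 4 for 4 regs
    return [f"v{i}" for i in range(0, total, stride)]


print(usable_registers(16))  # v0, v1, ..., v15
print(usable_registers(8))   # v0, v2, ..., v14
print(usable_registers(4))   # v0, v4, v8, v12
```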
We also specify how data originally written as 8b elements is read out when it is accessed at another element size. Using a logical packed data format is very straightforward.
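One possible reading of a "logical packed data format", as a sketch: elements are stored contiguously and wider accesses simply group adjacent bytes. The little-endian byte order and the helper name `reinterpret` are assumptions for illustration, not something this comment specifies.

```python
# Sketch of a packed layout: the same register bytes viewed at different
# element widths. Little-endian grouping is an assumption here.

def reinterpret(raw: bytes, width_bits: int) -> list[int]:
    """View `raw` as a list of unsigned elements of `width_bits` each."""
    if width_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported element width")
    step = width_bits // 8
    if len(raw) % step != 0:
        raise ValueError("register length is not a multiple of the element size")
    return [int.from_bytes(raw[i:i + step], "little")
            for i in range(0, len(raw), step)]


# Eight bytes written as 8b elements 0x01..0x08.
register = bytes(range(1, 9))
for width in (8, 16, 32, 64):
    print(width, [hex(x) for x in reinterpret(register, width)])
```

Because the bytes never move, software can predict what a wider read returns without the contents having to be zeroed.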
This keeps the software and hardware both very simple.
Finally, we need VADDU and VSUBU instructions. When src operand sizes are smaller than the destination, the smaller element sizes must be extended using zero-extension or sign-extension (VADDU, VADD). Perhaps some other instructions also need this signed-type information?
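A small sketch of why the signedness matters when sources are narrower than the destination. The names `extend` and `widening_add` are hypothetical, and the exact semantics of VADD/VADDU beyond "sign-extend vs. zero-extend the sources" are assumed.

```python
# Sketch: adding two narrow source elements into a wider destination.
# signed=True models the assumed VADD behaviour (sign-extension),
# signed=False models the assumed VADDU behaviour (zero-extension).

def extend(value: int, src_bits: int, signed: bool) -> int:
    """Interpret the low `src_bits` of `value` as signed or unsigned."""
    value &= (1 << src_bits) - 1
    if signed and value >> (src_bits - 1):
        value -= 1 << src_bits  # top bit set: negative in two's complement
    return value


def widening_add(a: int, b: int, src_bits: int, dst_bits: int, signed: bool) -> int:
    """Add two src_bits-wide elements, returning the dst_bits-wide bit pattern."""
    total = extend(a, src_bits, signed) + extend(b, src_bits, signed)
    return total & ((1 << dst_bits) - 1)


# The 8b pattern 0xFF is -1 when signed and 255 when unsigned.
print(hex(widening_add(0xFF, 0x01, 8, 16, signed=True)))   # 0x0
print(hex(widening_add(0xFF, 0x01, 8, 16, signed=False)))  # 0x100
```

The same source bits give different 16b results, so the instruction has to say which extension it wants.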
The proposals for moving data widths from register configuration into the opcodes, and all that entails, are already being discussed elsewhere, so let's keep this issue focused on how to manage vsew under the assumption that we keep it.
@rkruppe where is this being discussed?
I believe this is part of the same discussion -- the decision of moving operand sizes into opcodes would be partly motivated by how simple/hard it is to solve this problem you are presenting. There should at least be a cross-reference between the discussions.
I am referring to the thread on the task group mailing list started by Ken Dockser. It's unfortunate that said mailing list isn't available to the public but if & when that discussion moves to this repo it should get its own issue (or several).
> I believe this is part of the same discussion -- the decision of moving operand sizes into opcodes would be partly motivated by how simple/hard it is to solve this problem you are presenting. There should at least be a cross-reference between the discussions.
Fair, but I disagree about the amount of overlap/synergy. There are a large number of acceptable options here that do not involve redesigning the ISA to the degree you propose. To give just a few examples off the top of my head:
I agree that the solutions you list above are all possible. None of them is needed if we follow the change I am proposing. I am not sure that the proposal by Ken is exactly in line with what I am proposing -- he is actually proposing more extensive changes (not just encoding of element size into the instruction encoding) but exactly what he's proposing still isn't clear.
My proposal is very succinct. We could start a new issue to discuss it if you want. The title of this issue is "how to set vsew, if at all", and my proposed response is "not at all, get rid of it". I'm sorry if I'm hijacking your thread by taking this beyond your intended scope, but my proposal is one valid solution. Others may wish to discuss the various solutions you are proposing as well.
From what I see, changing vsew realigns all of the named vector registers. Essentially, each named vector register is a pointer into an address space of the vector data store; changing vsew reassigns all of these address pointers. This is difficult to implement in hardware, as it requires a few shifters to adjust the position of the address bits (and insert 0s into the LSBs); these shifters need to support shifting by 0 (byte) to 4 (quad) to 6 (quadword register, vsew=doubleword) bit positions; the shift amount is determined by vsew and by whether the register is configured as a single, double or quad. This logic is fairly complex (but fortunately it is static for a given vconfig state). Because of differences in possible implementations, the current spec also requires zeroing all data elements because these pointers get reassigned, making it difficult for software to understand the data layout.
What I'm proposing may involve more work in redesigning the ISA, but that would only be "done once" and result in a lower cost to all implementations. Each named vector register points to the same fixed address in the vector data store. The only thing that changes is how element data is interpreted within that vector data store: is it treated as bytes, halfwords, words, or doublewords? It loses the ability to access 128b and 256b elements, unless there is a demand to add that somehow. Even when the number of named vector registers is halved (and maxvl is extended), it merely removes references to intermediate access points (through the odd-numbered vector registers), so the named vector register pointers still all stay fixed. Since the data layout changes are readily transparent to software, there is no need to zero the contents when the access element size changes.
Indeed, Ken's proposal seems like a more comprehensive overhaul, but I think it has a lot in common with your proposal: once the major step of removing vsew and measuring vector registers in bits rather than elements is done, things like combining adjacent registers into larger ones seem relatively minor.
> My proposal is very succinct. We could start a new issue to discuss it if you want. The title of this issue is "how to set vsew, if at all", and my proposed response is "not at all, get rid of it". I'm sorry if I'm hijacking your thread by taking this beyond your intended scope, but my proposal is one valid solution. Others may wish to discuss the various solutions you are proposing as well.
It's true that it would resolve this issue. The reason I want to discuss it separately (though please do cross-reference if you open a new issue for it) is a procedural one: removing vsew entirely is a big change that generates lots of discussion points. If that happens in the same thread as bikeshedding over how we could make vconfig more useful in a vsew world, the former discussion will likely overshadow the latter. That would be fine if there already was consensus that ditching vsew is the right solution to this and other problems, but I don't see that being the case.
Ok, I've opened issue #53. All discussion on removing vsew should move there. I'll email Ken and hopefully he can participate on GitHub as well.
Moved to #53, so closing this one.
[Pulling out a tangent from #19 to discuss it here]
In the current proposal vconfig doesn't have spare bits for vemaxw, so you either have to set it separately with a CSR instruction or arrange such that it's already at a known value and work with that. As @colinschmidt said about the latter option,
However, there's a tension here: smaller vsew means having to use up multiple registers and using an otherwise rather unnatural configuration, while larger vsew of course penalizes code handling narrower elements. There's also the question of how vsew is kept at e.g. 8 bit -- would this be part of the ABI? Or is it only within a single function (meaning it only helps if you change vcfg multiple times in a function)?
Of course, there's always the option of not using vconfig and doing a load-immediate and writing the vcfg CSR normally. However, since vregcfg is already 12 bits on its own and vsew bumps that up to 15 bits total (not even accounting for vtypeen), this will often require three instructions.
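The "three instructions" estimate can be checked with a rough sketch. It assumes the usual RISC-V constant materialization (a single `addi` for values that fit a signed 12-bit immediate, otherwise `lui` plus an optional `addi`) followed by one CSR write; the function names and the sample configuration values are made up for illustration.

```python
# Sketch: how many instructions a plain "load immediate + CSR write"
# needs for a 15-bit configuration value. Assumes non-negative values
# below 2**31 and the standard lui/addi split of a constant.

def li_instruction_count(value: int) -> int:
    """Instructions needed to place `value` in a register."""
    if not 0 <= value < 2**31:
        raise ValueError("sketch only covers non-negative 31-bit values")
    if value <= 2047:
        return 1  # fits addi's signed 12-bit immediate
    # Sign-extended low 12 bits: the part addi would have to add after lui.
    low = ((value & 0xFFF) ^ 0x800) - 0x800
    return 1 if low == 0 else 2  # lui alone, or lui + addi


def csr_write_cost(value: int) -> int:
    """Constant materialization plus one CSR write instruction."""
    return li_instruction_count(value) + 1


for cfg in (0x0123, 0x7000, 0x7ABC):  # made-up example values
    print(hex(cfg), csr_write_cost(cfg))
```

Only values that fit in 12 bits or that have all-zero low 12 bits get away with two instructions, which is why a typical 15-bit value costs three.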