Open kdockser opened 2 years ago
The original text is being literal when it says "which of the first vl elements of vector register group vs2 should be extracted and packed": at most vl elements will be copied, and possibly fewer. (I'm not opposed to rewriting this sentence to be clearer, of course, but I don't agree that it's ambiguous.)
According to section 3.5 Vector length Register vl
The vl register holds an unsigned integer specifying the number of elements to be updated with results from a vector instruction
Are you saying that vcompress doesn't follow this rule and in this case vl applies to how many of the sources are being evaluated to have TRUE masks?
Yes. The vcompress definition slyly indicates this is overridden by indicating that the set of tail elements is different for vcompress than for other instructions ("The remaining elements of vd are treated as tail elements").
So, for example, if vl=5 and the first 5 masks are all F and the remainder are all T, everything is treated as being part of the Tail? If so, that doesn't seem very useful when the most likely use case is compressing the elements with the first 5 T masks.
That's a correct interpretation of the spec.
If the application vector length is actually 5, then pulling from elements 5..VLMAX-1 is not actually terribly useful, since they'll just contain garbage.
You can more or less obtain the operation you described by setting vl to max, then ignoring destination elements past element 4.
According to section 3.5 Vector length Register vl
The vl register holds an unsigned integer specifying the number of elements to be updated with results from a vector instruction
Are you saying that vcompress doesn't follow this rule and in this case vl applies to how many of the sources are being evaluated to have TRUE masks?
Other instructions also override the original definition of vl
--- reductions, for example. The meanings of "active", "body", etc., may also change in these cases.
Thanks Nick. While I agree that other instructions override the standard behavior, they do so with a clear statement indicating this deviation. In this case, the deviation is not clearly called out. It would help to have an explanation as to why there is a deviation (e.g., what is the use case). Also, SAIL code, or at least pseudo code would be invaluable to clarifying the intended behavior.
This instruction is more like vrgather than vadd in that it is collecting valid data which may lie beyond vl. In my example of vl=5, this vrgather could be followed by a store instruction that is intended to store the 5 compressed elements to contiguous memory. X86, for example, has a single instruction that compresses the elements based on the flags and then writes (just) them to memory; it would be nice if RISC-V could do this in two instructions without having to change the vl (or update the masks) between the two.
6.5 Vector Compress Instruction has the following text:
Presumably vl is used to specify how many elements in vd are updated. However, this sentence can easily be misinterpreted as saying that the first vl elements of vector register group vs2 are the only candidates to be moved to vd. I propose that the sentence be rewritten as something along the lines of: