dzaima opened 2 weeks ago
Since this is very much a question about the architecture and a particular architectural design choice that was made (and not about some mistake or ambiguity in the arch spec), it should instead be posted to the tech-vector-ext@lists.riscv.org and/or sig-vector@lists.riscv.org email lists.
The vector specification requires that, when `vstart≥vl` (which includes `vl=0`), nearly all operations leave tail elements undisturbed, even if tail-agnostic is set.

This seems to me like a rather odd requirement, especially considering that it forces `vl=0` to be very special for register-renaming implementations. (I am not a hardware designer, but I did notice one repo having a good number of commits fixing `vl=0` handling.)

Is there some significant benefit to software from this (as compared to allowing tail elements to be replaced with all-1s if `ta` is set and the hardware so wants)? I can't come up with any for the general `vstart≥vl` case, considering that `vstart` is intended to be non-zero only when resuming a previously interrupted instruction, which could already have trashed the tail.

There are some cases that software could somewhat reasonably rely on (primarily just to work around those instructions not working as desired at `vl=0`, but whatever), namely reductions and `vmv.s.x`. Those are perhaps too late to relax, but, if desired, I feel like it wouldn't be too unreasonable to relax everything else even now (especially considering that software has already been rather severely misled on RVV in a different aspect).

Of note is that the C/C++ RVV intrinsics have their own relaxed behavior on agnostic elements, which I believe means that they would be unaffected by the change, even for reductions and `vmv.s.x` (those two don't even take a destination input outside of the explicit `tu` variants).

(Reductions still having a false dependency on their destination wouldn't be particularly nice, but it's not catastrophic, considering that the common `vd==vs1` usage isn't affected; `vmv.s.x` is worse off, though. Perhaps an option would be allowing those (or just `vmv.s.x`) to set the first element to either the old value or the newly calculated one. That would keep all existing hardware compliant while allowing unconditional register renaming when `ta` is set in future designs, without affecting any software use case I can think of. Anyway, I'm not one to request a spec change (that'd be up to those actually making OoO vector hardware, if they have design constraints where this is actually problematic); my primary question is really just what reason there is for the strictness in the first place.)
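To make the rule being questioned concrete, here is a toy model of how one unmasked instruction updates its destination register. It is a sketch under stated assumptions, not spec text or simulator code: `update_dest`, `vmv_s_x` and the `relaxed` flag are hypothetical names, element values are plain Python ints, `-1` stands in for "all bits set", and masking, LMUL and SEW are ignored. The `relaxed` flag models the hypothetical variant discussed above in which `vstart≥vl` gets no special treatment.

```python
# Toy model (hypothetical, simplified): how one unmasked RVV instruction
# updates vd. Not taken from the spec or any simulator; -1 stands for an
# element with all bits set, and masking/LMUL/SEW are ignored.

ALL_ONES = -1  # stand-in for an "all-1s" agnostic element


def update_dest(old, results, vstart, vl, ta, relaxed=False):
    """Return the new contents of vd (a list of length VLMAX).

    old     -- previous contents of vd
    results -- freshly computed value for every element index
    ta      -- True for tail-agnostic, False for tail-undisturbed
    relaxed -- hypothetical variant: vstart >= vl is not special-cased,
               so the tail may be filled under ta as in the normal case
    """
    if vstart >= vl and not relaxed:
        # Current rule: there are no body elements, and no tail element is
        # updated with agnostic values either, so vd is returned unchanged.
        return list(old)
    new = list(old)  # elements below vstart keep their old values
    for i in range(vstart, vl):
        new[i] = results[i]  # body elements
    if ta:
        # Tail-agnostic: hardware may keep the old value OR write all-1s.
        # This model always picks all-1s to show the permitted extreme.
        for i in range(vl, len(old)):
            new[i] = ALL_ONES
    return new


def vmv_s_x(old, x, vl, ta, relaxed=False):
    """vmv.s.x: element 0 is the only possible body element; the rest is tail."""
    return update_dest(old, [x] * len(old), vstart=0, vl=min(vl, 1),
                       ta=ta, relaxed=relaxed)


if __name__ == "__main__":
    old = [5, 6, 7, 8]
    print(vmv_s_x(old, 42, vl=1, ta=True))                # [42, -1, -1, -1]
    print(vmv_s_x(old, 42, vl=0, ta=True))                # [5, 6, 7, 8]
    print(vmv_s_x(old, 42, vl=0, ta=True, relaxed=True))  # [-1, -1, -1, -1]
```

The second call shows why `vl=0` is awkward for a renaming core: even with `ta` set, the result depends entirely on the old `vd`, so the old physical register must be read. In the model's `relaxed` variant, a `vstart=0`, `ta` instruction never depends on the old value. The third call also shows why relaxing `vmv.s.x` is risky: software that uses `vl=0` to skip the element-0 write would see that element clobbered.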