riscv / riscv-isa-manual

RISC-V Instruction Set Manual
https://riscv.org/
Creative Commons Attribution 4.0 International

Reason for vstart≥vl requiring undisturbed tail elements even with `ta` vtype #1715

Open dzaima opened 2 weeks ago

dzaima commented 2 weeks ago

The vector specification requires that, when vstart≥vl (which includes vl=0), nearly all operations leave tail elements undisturbed, even if tail-agnostic is set.
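To make the rule concrete, here is a rough Python model of how an unmasked instruction's destination gets written. It is only a sketch: `write_dest`, `hw_overwrites_tail` and the 8-bit `ALL_ONES` value are hypothetical names made up for illustration, and masking, LMUL and the per-instruction exceptions are ignored.

```python
ALL_ONES = 0xFF  # assumption: 8-bit elements, so "all 1s" is 0xFF


def write_dest(old, results, vstart, vl, tail_agnostic, hw_overwrites_tail=True):
    """Model the destination register after an unmasked vector instruction.

    old     -- destination elements before the instruction
    results -- per-element results the instruction would compute
    hw_overwrites_tail -- hypothetical knob: whether this implementation
                          chooses to write all-1s to agnostic tail elements
    """
    # The rule in question: with vstart >= vl (including vl == 0) nothing is
    # updated, not even agnostic tail elements.
    if vstart >= vl:
        return list(old)

    new = list(old)
    for i in range(len(old)):
        if i < vstart:
            continue                 # prestart elements stay undisturbed
        elif i < vl:
            new[i] = results[i]      # body elements receive the result
        elif tail_agnostic and hw_overwrites_tail:
            new[i] = ALL_ONES        # tail: agnostic may be overwritten with 1s
    return new


old = [1, 2, 3, 4]
results = [10, 20, 30, 40]
print(write_dest(old, results, vstart=0, vl=2, tail_agnostic=True))  # [10, 20, 255, 255]
print(write_dest(old, results, vstart=0, vl=0, tail_agnostic=True))  # [1, 2, 3, 4]
```

The early return is the special case being asked about: without it, a renaming implementation could treat every `ta` write as a full overwrite of the destination.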

This seems to me like a rather odd requirement, especially considering that it forces vl=0 to be very special for register-renaming implementations. (I am not a hardware designer, but I did notice one repo with a fair number of commits fixing vl=0 behavior.)

Is there some significant benefit to software from this (as compared to allowing tail elements to be replaced with all-1s under ta, if the hardware so wants)? I can't come up with any for the general vstart≥vl case, considering that vstart is intended to be non-zero only when resuming a previously interrupted instruction, which could already have clobbered the tail.

There are some cases that software could somewhat reasonably rely on (primarily just to work around those instructions not working as desired at vl=0), namely reductions and vmv.s.x, and those are perhaps too late to relax. But, if desired, I feel like it wouldn't be too unreasonable to relax everything else even now (especially considering that software has already been rather severely misled on RVV in a different aspect).

Of note is that the C/C++ RVV intrinsics have their own relaxed behavior on agnostic elements, which I believe means that they would be unaffected by the change, even reductions and vmv.s.x (those two don't even have a destination input outside of explicit tu).

(Reductions still having a false dependency on their destination wouldn't be particularly nice, but not catastrophic, considering that the common vd==vs1 usage isn't affected; vmv.s.x is worse off, though. Perhaps an option would be to allow those (or just vmv.s.x) to set the first element to either the old value or the newly calculated one. That would keep all existing hardware compliant and allow unconditional register renaming under ta in the future, without affecting any software use cases that I can think of. Anyway, I'm not one to request a spec change (that would be for those actually making OoO vector hardware, if they have design conditions where this is actually problematic); my primary question is really just what the reason for the strictness was in the first place.)
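As a sketch of what that relaxation would mean for vmv.s.x, the set of permitted values for element 0 could be written down like this. The function name and the `relaxed` flag are hypothetical, and only element 0 is modelled, not the tail.

```python
def vmv_s_x_elem0_allowed(old0, new0, vstart, vl, relaxed):
    """Return the set of values element 0 of vd may hold after vmv.s.x.

    old0    -- element 0 of vd before the instruction
    new0    -- the scalar value that vmv.s.x would write
    relaxed -- hypothetical flag: True models the relaxation floated above,
               False models the current rule
    """
    if vstart < vl:
        return {new0}            # normal case: element 0 is written
    if relaxed:
        return {old0, new0}      # proposal: either outcome is compliant
    return {old0}                # current rule: destination is not updated


# Under the current rule, vl=0 must preserve the old value.
print(vmv_s_x_elem0_allowed(old0=7, new0=42, vstart=0, vl=0, relaxed=False))
# Under the relaxation, hardware that always writes would also comply.
print(vmv_s_x_elem0_allowed(old0=7, new0=42, vstart=0, vl=0, relaxed=True))
```

Since the old value remains in the allowed set, anything compliant today would remain compliant under the relaxed rule.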

gfavor commented 2 weeks ago

Since this question is very much a question about the architecture and a particular architectural design choice that was made (and not a question about some mistake or ambiguity in the arch spec), this question should instead be posted to the tech-vector-ext@lists.riscv.org and/or sig-vector@lists.riscv.org email lists.