riscv / riscv-v-spec

Working draft of the proposed RISC-V V vector extension
https://jira.riscv.org/browse/RVG-122
Creative Commons Attribution 4.0 International

Allow vector element processing to effectively proceed in parallel, not strictly from 0 to vl, for most ops #534

Open David-Horner opened 4 years ago

David-Horner commented 4 years ago

Most vector operations yield the same results if the elements are processed in any order.
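A minimal plain-Python model (not RVV intrinsics) of why order does not matter for an elementwise op: each destination element depends only on the same-index source elements, so visiting indices in any permutation gives the same vector. The function name `vadd` and the shuffled order are illustrative assumptions.

```python
import random

def vadd(vs1, vs2, order):
    """Model an elementwise vector add, visiting element indices in `order`."""
    vd = [None] * len(vs1)
    for i in order:
        # Element i reads only element i of each source, so no other
        # element's result can influence it.
        vd[i] = vs1[i] + vs2[i]
    return vd

vl = 8
vs1 = list(range(vl))
vs2 = [10 * x for x in range(vl)]

in_order = vadd(vs1, vs2, range(vl))
shuffled = list(range(vl))
random.Random(0).shuffle(shuffled)   # an arbitrary, "parallel-like" order
any_order = vadd(vs1, vs2, shuffled)

assert in_order == any_order
```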

With an opaque vstart (#532), additional restart information can be encoded in the XLEN bits of vstart, allowing an implementation to resume from a variety of partially completed states.
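One hypothetical way an opaque vstart could carry more than a single index, sketched in Python: the low bits hold the lowest unfinished element, and the remaining bits of an assumed 64-bit XLEN hold a bitmap of elements beyond it that already completed. `INDEX_BITS`, `WINDOW`, `encode_vstart` and `decode_vstart` are illustrative names, not part of any spec proposal.

```python
XLEN = 64
INDEX_BITS = 16                  # hypothetical: enough for VLMAX <= 2**16
WINDOW = XLEN - INDEX_BITS       # completion bits for the next 48 elements

def encode_vstart(first_pending, done):
    """Pack the lowest unfinished element index into the low INDEX_BITS and a
    bitmap of already-finished elements above it into the remaining bits."""
    if not 0 <= first_pending < (1 << INDEX_BITS):
        raise ValueError("element index does not fit in INDEX_BITS")
    bitmap = 0
    for idx in done:
        offset = idx - first_pending - 1
        if not 0 <= offset < WINDOW:
            raise ValueError("finished element lies outside the encodable window")
        bitmap |= 1 << offset
    return (bitmap << INDEX_BITS) | first_pending

def decode_vstart(value):
    """Recover (first_pending, set_of_finished_elements) from an encoded value."""
    first_pending = value & ((1 << INDEX_BITS) - 1)
    bitmap = value >> INDEX_BITS
    done = {first_pending + 1 + k for k in range(WINDOW) if (bitmap >> k) & 1}
    return first_pending, done
```

With no out-of-order completions, `encode_vstart(i, set())` is just `i`, so this layout stays compatible with an ordinary element-index vstart; completion state outside the 48-element window cannot be represented.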

Explicitly relax the constraint that elements are processed in order from 0 to vl, effectively allowing parallel processing.

Load operations in particular can benefit from this relaxation. Element loads can proceed opportunistically, using any cache entry that is already present without waiting for other requests to complete, and requests to the cache subsystem can be issued in whatever order is optimal. See the discussions in #502 and #504.
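A simplified Python model of the opportunistic load described above, assuming one address per element and a hypothetical set `resident` of cached addresses: hits are serviced first and misses afterwards, yet every element still lands in its own slot, so the destination matches an in-order load.

```python
def unit_stride_load(memory, base, vl, resident):
    """Model a unit-stride load (one address per element, for simplicity) that
    services cache hits first and misses afterwards."""
    hits = [i for i in range(vl) if base + i in resident]
    misses = [i for i in range(vl) if base + i not in resident]
    completion_order = hits + misses      # generally not 0, 1, ..., vl-1
    vd = [None] * vl
    for i in completion_order:
        vd[i] = memory[base + i]          # each element lands in its own slot
    return vd, completion_order

memory = {addr: addr * addr for addr in range(64)}
vd, order = unit_stride_load(memory, base=10, vl=6, resident={12, 14})
```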

Stores, by contrast, are affected by processing order. Equally notable, stores already come in both ordered and unordered variants.
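A small Python model of why store order is observable: in a scatter where two elements target the same address, the last element written wins, so different processing orders leave different memory contents. `indexed_store` and the data are illustrative assumptions.

```python
def indexed_store(memory, indices, values, order):
    """Scatter values[i] to memory[indices[i]], visiting elements in `order`."""
    mem = dict(memory)
    for i in order:
        mem[indices[i]] = values[i]   # a later write to the same address wins
    return mem

indices = [0, 1, 0]     # elements 0 and 2 target the same address
values = [10, 20, 30]

in_order = indexed_store({}, indices, values, [0, 1, 2])
reverse_order = indexed_store({}, indices, values, [2, 1, 0])
```

Here `in_order[0]` ends up as 30 while `reverse_order[0]` ends up as 10; addresses written by only one element are unaffected.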

Some of the rules on overlap between destination and source registers are premised on 0-to-vl processing order. It remains valid to enforce these constraints, since simple implementations may elect to always process from 0 to vl and should not be penalized for an optimization made by other implementations.
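An illustrative Python model of why overlap interacts with order. It treats the register file as one flat array of elements (a simplification: real RVV overlap rules are stated in terms of registers and register groups) and computes vd[i] = vs[i] * 2 with vd starting one element above vs. Ascending and descending orders then produce different results. The function and data are hypothetical.

```python
def op_on_flat_regfile(regfile, src, dst, vl, order):
    """Model vd[i] = vs[i] * 2 where vd and vs are windows into one flat
    register file starting at element offsets dst and src."""
    rf = list(regfile)
    for i in order:
        # If dst overlaps src, this read may see a value written earlier
        # in the same operation, depending on the visiting order.
        rf[dst + i] = rf[src + i] * 2
    return rf

regfile = [1, 2, 3, 4, 5]
ascending = op_on_flat_regfile(regfile, src=0, dst=1, vl=4, order=range(4))
descending = op_on_flat_regfile(regfile, src=0, dst=1, vl=4, order=range(3, -1, -1))
```

Processing upward propagates freshly written values (`[1, 2, 4, 8, 16]`), while processing downward reads only original sources (`[1, 2, 4, 6, 8]`).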

kasanovic commented 3 years ago

In base v1.0 we require vstart to report the faulting element, as this is needed to simplify error handling and reporting, but we reserved the other values (>VLMAX) for future use. In general, restart will require more state than fits in the bits of vstart, so the utility and generality of this approach is unclear. Marking as a post-v1.0 issue.
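A rough counting sketch in Python of the point about restart state, assuming that with arbitrary-order processing any subset of elements may already be complete and that this is tracked with one bit per element. In-order restart needs only enough bits to name one element index; arbitrary completion state grows with VLMAX and can exceed an assumed 64-bit XLEN.

```python
def in_order_restart_bits(vlmax):
    """Bits needed to name one restart element index in 0..vlmax-1."""
    return (vlmax - 1).bit_length()

def any_order_restart_bits(vlmax):
    """Worst-case bits if any subset of elements may already be complete:
    one completion bit per element (2**vlmax possible states)."""
    return vlmax

XLEN = 64
for vlmax in (16, 64, 65536):
    fits = any_order_restart_bits(vlmax) <= XLEN
    print(vlmax, in_order_restart_bits(vlmax), any_order_restart_bits(vlmax), fits)
```

For a large VLMAX such as 65536, naming an element takes 16 bits while a per-element completion map takes 65536 bits, far beyond XLEN.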