riscv-non-isa / rvv-intrinsic-doc

https://jira.riscv.org/browse/RVG-153
BSD 3-Clause "New" or "Revised" License
287 stars 90 forks source link

Polishing to the specification #271

Closed eopXD closed 1 year ago

dzaima commented 1 year ago

Do I understand correctly that the intent of a __riscv_vsetvl_* intrinsic is to be exactly min(avl, VLMAX), and thus be impossible to map directly to a vsetvli rd, rs1, ... instruction without some additional computation of the min()?

If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly to vsetvli: https://godbolt.org/z/bzYG3vxzM.

For example, on a VLEN=128 implementation (i.e. VLMAX=4 for e32m1), a wrap(6) call of the LLVM/GCC output would result in a vsetvli instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce a vl of either 3 or 4, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)" of section 6.3 of the V spec, and thus wrap(6) could equal 3, while this document specifies min(avl,VLMAX) = min(6,4) = 4.

eopXD commented 1 year ago

Do I understand correctly that the intent of a __riscv_vsetvl_* intrinsic is to be exactly min(avl, VLMAX), and thus be impossible to map directly to a vsetvli rd, rs1, ... instruction without some additional computation of the min()?

If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly to vsetvli: https://godbolt.org/z/bzYG3vxzM.

For example, on a VLEN=128 implementation (i.e. VLMAX=4 for e32m1), a wrap(6) call of the LLVM/GCC output would result in a vsetvli instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce a vl of either 3 or 4, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)" of section 6.3 of the V spec, and thus wrap(6) could equal 3, while this document specifies min(avl,VLMAX) = min(6,4) = 4.

Ah, yes. You have found the hidden piece of the intrinsic implementation. However if you see this snippet https://godbolt.org/z/Wx4TPq7fs, the compiler is not respecting the element width and length multiplier provided by vsetvl. The vsetvl intrinsic is not intended to map to an exact instruction, the compiler is responsible to insert the appropriate vsetvl instruction.

topperc commented 1 year ago

Do I understand correctly that the intent of a __riscv_vsetvl_* intrinsic is to be exactly min(avl, VLMAX), and thus be impossible to map directly to a vsetvli rd, rs1, ... instruction without some additional computation of the min()? If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly to vsetvli: https://godbolt.org/z/bzYG3vxzM. For example, on a VLEN=128 implementation (i.e. VLMAX=4 for e32m1), a wrap(6) call of the LLVM/GCC output would result in a vsetvli instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce a vl of either 3 or 4, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)" of section 6.3 of the V spec, and thus wrap(6) could equal 3, while this document specifies min(avl,VLMAX) = min(6,4) = 4.

Ah, yes. You have found the hidden piece of the intrinsic implementation. However if you see this snippet https://godbolt.org/z/Wx4TPq7fs, the compiler is not respecting the element width and length multiplier provided by vsetvl. The vsetvl intrinsic is not intended to map to an exact instruction, the compiler is responsible to insert the appropriate vsetvl instruction.

The compiler is still required respect the VLMAX implied by the SEW and LMUL. This means the compiler must respect the ratio between SEW and LMUL given to the intrinsic. The intrinsic must return the value the hardware vsetvli intruction would return for that VLMAX which is not always min(avl, VLMAX).

dzaima commented 1 year ago

Right, I am not saying that __riscv_vsetvl_e32m1 should necessarily map to vsetvli rd, rs1, e32, m1; indeed, vsetvli rd, rs1, e32, m1 and vsetvli rd, rs1, e64, m2 will result in the same rd value, and thus a compiler can choose to use those interchangeably for calculating rd.

What I am saying is that __riscv_vsetvl_e32m1(avl), if specified to be min(avl, VLMAX), can not be implemented as vsetvli rd, x[avl], e32, m1 (or ..., e64, m1, etc), even if the compiler wanted to do so, because the behavior of that vsetvli instruction is not guaranteed to be min(avl, VLMAX), as per the aforementioned section 6.3. of the V extension 1.0 specification.

eopXD commented 1 year ago

@nick-knight @dzaima @topperc Just made an update to description associated with "control of vl" and pseudo intrinsics vsetvl and vsetvlmax. Please check again, thank you.

eopXD commented 1 year ago

Ping.

eopXD commented 1 year ago

Ping for more review comments, if not, the pull request will be merged one week later since we have converged to what this pull request is proposing.

eopXD commented 1 year ago

Merging this PR now. Thanks all people that helped to work this through!