Closed eopXD closed 1 year ago
Do I understand correctly that the intent of a
__riscv_vsetvl_*
intrinsic is to be exactlymin(avl, VLMAX)
, and thus be impossible to map directly to avsetvli rd, rs1, ...
instruction without some additional computation of themin()
?If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly to
vsetvli
: https://godbolt.org/z/bzYG3vxzM.For example, on a VLEN=128 implementation (i.e.
VLMAX=4
for e32m1), awrap(6)
call of the LLVM/GCC output would result in avsetvli
instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce avl
of either3
or4
, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX
ifAVL < (2 * VLMAX)
" of section 6.3 of the V spec, and thuswrap(6)
could equal3
, while this document specifiesmin(avl,VLMAX) = min(6,4) = 4
.
Ah, yes. You have found the hidden piece of the intrinsic implementation. However if you see this snippet https://godbolt.org/z/Wx4TPq7fs, the compiler is not respecting the element width and length multiplier provided by vsetvl
. The vsetvl intrinsic is not intended to map to an exact instruction, the compiler is responsible to insert the appropriate vsetvl instruction.
Do I understand correctly that the intent of a
__riscv_vsetvl_*
intrinsic is to be exactlymin(avl, VLMAX)
, and thus be impossible to map directly to avsetvli rd, rs1, ...
instruction without some additional computation of themin()
? If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly tovsetvli
: https://godbolt.org/z/bzYG3vxzM. For example, on a VLEN=128 implementation (i.e.VLMAX=4
for e32m1), awrap(6)
call of the LLVM/GCC output would result in avsetvli
instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce avl
of either3
or4
, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX
ifAVL < (2 * VLMAX)
" of section 6.3 of the V spec, and thuswrap(6)
could equal3
, while this document specifiesmin(avl,VLMAX) = min(6,4) = 4
.Ah, yes. You have found the hidden piece of the intrinsic implementation. However if you see this snippet https://godbolt.org/z/Wx4TPq7fs, the compiler is not respecting the element width and length multiplier provided by
vsetvl
. The vsetvl intrinsic is not intended to map to an exact instruction, the compiler is responsible to insert the appropriate vsetvl instruction.
The compiler is still required respect the VLMAX implied by the SEW and LMUL. This means the compiler must respect the ratio between SEW and LMUL given to the intrinsic. The intrinsic must return the value the hardware vsetvli intruction would return for that VLMAX which is not always min(avl, VLMAX).
Right, I am not saying that __riscv_vsetvl_e32m1
should necessarily map to vsetvli rd, rs1, e32, m1
; indeed, vsetvli rd, rs1, e32, m1
and vsetvli rd, rs1, e64, m2
will result in the same rd
value, and thus a compiler can choose to use those interchangeably for calculating rd
.
What I am saying is that __riscv_vsetvl_e32m1(avl)
, if specified to be min(avl, VLMAX)
, can not be implemented as vsetvli rd, x[avl], e32, m1
(or ..., e64, m1
, etc), even if the compiler wanted to do so, because the behavior of that vsetvli
instruction is not guaranteed to be min(avl, VLMAX)
, as per the aforementioned section 6.3. of the V extension 1.0 specification.
@nick-knight @dzaima @topperc Just made an update to description associated with "control of vl
" and pseudo intrinsics vsetvl
and vsetvlmax
. Please check again, thank you.
Ping.
Ping for more review comments, if not, the pull request will be merged one week later since we have converged to what this pull request is proposing.
Merging this PR now. Thanks all people that helped to work this through!
Do I understand correctly that the intent of a
__riscv_vsetvl_*
intrinsic is to be exactlymin(avl, VLMAX)
, and thus be impossible to map directly to avsetvli rd, rs1, ...
instruction without some additional computation of themin()
?If that's the case, then the current LLVM and GCC implementations are non-conforming, as they do map directly to
vsetvli
: https://godbolt.org/z/bzYG3vxzM.For example, on a VLEN=128 implementation (i.e.
VLMAX=4
for e32m1), awrap(6)
call of the LLVM/GCC output would result in avsetvli
instruction that is allowed to, on a RVV1.0-compliant RISC-V implementation, produce avl
of either3
or4
, as per "ceil(AVL / 2) ≤ vl ≤ VLMAX
ifAVL < (2 * VLMAX)
" of section 6.3 of the V spec, and thuswrap(6)
could equal3
, while this document specifiesmin(avl,VLMAX) = min(6,4) = 4
.