Open kito-cheng opened 1 week ago
@llvm/issue-subscribers-backend-risc-v
Author: Kito Cheng (kito-cheng)
cc: @topperc @BeMg @asb @preames @lukel97
Also cc some non LLVM folks since this is same situation for GCC @palmer-dabbelt @JeffreyALaw
The case for GCC, GCC will using stack rather than callee save reg in this case, so I use some inline asm trick to force GCC to use that:
#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));
vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
register vint32m1_t x asm("v24") = b;
bar();
asm ("#xx %0"::"vr"(x) );
return x;
}
I may have missed something here, why would vmv<N>r.v
depend on vtype? I didn't notice this before actually.
Here is a note from the SPEC:
Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.
I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.
I may have missed something here, why would
vmv<N>r.v
depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.
I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.
Commit log from https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 say: The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.
I may have missed something here, why would
vmv<N>r.v
depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.
I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.
Commit log from https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 say:
The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.
Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.
Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.
I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation
Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.
I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation
The change https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 was committed after the ratification of RVV 1.0, so it is not a mandatory request but a supplementary explanation (and I think it is not following the intuition).
I confirmed that Spacemit-X60 on K1 doesn't follow this:
bytedance@k1:~$ cat vmv.c
int main() {
asm("li a0, 0x8000000000000000");
asm("vsetvl a0, a0, a0");
asm("vmv1r.v v0, v1"); // No SIGILL.
asm("vadd.vv v0, v1, v2"); // SIGILL here.
return 0;
}
bytedance@k1:~$ gcc -march=rv64gcv vmv.c -o vmv -g
bytedance@k1:~$ gdb vmv
GNU gdb (Ubuntu 14.0.50.20230907-0ubuntu1-bb1) 14.0.50.20230907-git
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmv...
(gdb) r
Starting program: /home/bytedance/vmv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
main () at vmv.c:5
5 asm("vadd.vv v0, v1, v2");
(gdb) p /x $vtype
$1 = 0x8000000000000000
I haven't checked XuanTie C908 on K230 yet, but I 99.99% believe that it is the same as it is an "old" core.
Summary of the change above matches my understanding, I'd previously written this up here: https://github.com/preames/public-notes/blob/master/riscv-spec-minutia.rst#id8
And yes, this change is extremely problematic. We can work around in SW, but ouch.
Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst
Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst
Thanks for the summary! I think there should be option that we revert the change and reword that paragraph so that it clearly doesn't depend on vtype.
The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.
The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.
I digged the commit history.
For vill
part:
For vmv<N>r.v
part (there are too many related commits, I just paste important changes here):
ignoring the current settings of the vl and vtype register
from the beginning.operate as if SEW=8 and
vl=VLMAX, regardless of current settings in vtype and vl
: https://github.com/riscvarchive/riscv-v-spec/commit/548232c1d88df152899f4b0af082e5f4a76426a9EEW=SEW
. From what I can tell, these sentences were just saying some implementations can treat vd==vs2
cases as HINTs.My understanding is, it was clearly that vtype
should be ignored since the beginning; then somehow the SEW=8
clarification started to make it vague (but I still think, people should interpret it as ignoring
); the rearrange HINTs change made vagueness worse, it should be just a note for potential implementations, but somehow it led some people to think it may be a strong requirement.
IMO, there does exist some vagueness/contradiction in the spec, and please clarify it clearly/formally based on the semantics of whole register move.
Confirmed that C908 on K230 will also not trap on vmv<N>r.v
:
[root@canaan ~ ]#./vmv
[ 157.105988] vmv[171]: unhandled signal 4 code 0x1 at 0x000000000001054a in vmv[10000+69000]
[ 157.114396] CPU: 0 PID: 171 Comm: vmv Not tainted 5.10.4+ #1
[ 157.120060] epc: 000000000001054a ra : 00000000000105c6 sp : 0000003fffb28ba0
[ 157.127199] gp : 000000000007f518 tp : 0000000000085760 t0 : 0000000000000001
[ 157.134423] t1 : 000000000007ed28 t2 : 000000000000006a s0 : 0000003fffb28bb0
[ 157.141647] s1 : 0000000000000001 a0 : 0000000000000000 a1 : 0000003fffb28d58
[ 157.148869] a2 : 0000003fffb28d68 a3 : 0000000000000000 a4 : 0000003fffb28bd8
[ 157.156092] a5 : 0000000000010536 a6 : 000000000006afe0 a7 : 0000000000000001
[ 157.163315] s2 : 0000003fffb28d58 s3 : 0000000000000001 s4 : 0000003fffb28d68
[ 157.170538] s5 : 0000000000000001 s6 : 0000000000010536 s7 : 0000000000010258
[ 157.177762] s8 : 0000000000000000 s9 : 0000000000159fc0 s10: 0000000000156760
[ 157.184984] s11: 0000000000000001 t3 : 2f2f2f2f2f2f2f2f t4 : 000000000007d9e8
[ 157.192207] t5 : 0000000000084260 t6 : 0000000000079a30
[ 157.197519] status: 8000000200004620 badaddr: 0000000002110057 cause: 0000000000000002
Illegal instruction
# objdump -d vmv >vmv.s
0000000000010536 <main>:
10536: 1141 add sp,sp,-16
10538: e422 sd s0,8(sp)
1053a: 0800 add s0,sp,16
1053c: fff0051b addw a0,zero,-1
10540: 157e sll a0,a0,0x3f
10542: 80a57557 vsetvl a0,a0,a0
10546: 9e103057 vmv1r.v v0,v1
1054a: 02110057 vadd.vv v0,v1,v2
1054e: 4781 li a5,0
10550: 853e mv a0,a5
10552: 6422 ld s0,8(sp)
10554: 0141 add sp,sp,16
10556: 8082 ret
Much discussion about the ISA spec in this thread surrounds a particular non-normative note. Ultimately, non-normative text isn't relevant to this discussion, because, well, it isn't normative. The reason that text was changed post-ratification is that (a) it is valid to change non-normative text at any time, since by definition it doesn't affect the normative content, and (b) in this case, it contradicted the normative text, and the normative text always wins.
As Craig points out, the ratified normative text in the spec says that the instructions do depend on SEW (for e.g. vstart), hence they depend on vtype. This definition was in place long before to ratification.
Given the opportunity, I'm sure we'd revisit this ISA choice, but it is what it is. The fact that multiple implementations don't trap these instructions doesn't change the story. The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.
The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.
Please be specific here which part of the spec says this?
The Whole-Register Moves section says that the instructions' behavior depends on SEW. The Vector Type Illegal section says that an attempt to execute an instruction that depends on vtype when vill=1 raises an illegal-instruction exception.
(I had written earlier that the behavior is reserved, but my recollection was wrong; the spec makes the stronger statement that an exception must be raised.)
The Whole-Register Moves section says that the instructions' behavior depends on SEW.
Even for raified spec, it says:
The instructions operate as if EEW=SEW, EMUL = NREG, effective length evl= EMUL * VLEN/SEW.
I'm not a native English speaker, is as if
a strong requirement?
”As if” isn’t a phrase used to suggest optionality; it’s a phrase used to establish an equivalence.
qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.
qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.
This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.
AFAIK, several cores have to change their implementations because of this change recently. At least, XiangShan has already been misled because of this apparent mistake: https://github.com/OpenXiangShan/NEMU/pull/511. The impact on software would be much bigger.
Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype
? Would it make RISC-V much more competitive than X86/AArch64? Is this a state-of-the-art architecture innovation? How will you handle these products which are already on the market?
Or, let me ask this in another way: What benefit would we have if whole register move depends on
vtype
?
The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.
How will you handle these products which are already on the market?
Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.
This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.
How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?
Or, let me ask this in another way: What benefit would we have if whole register move depends on
vtype
?The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.
How will you handle these products which are already on the market?
Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.
Sorry I don't mean to make this discussion argumentative, my apologies. So you do agree there does exist a way to define a new extension to fix this mistake, right? Can we start it right now? WDYT? cc @aswaterman
This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.
How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?
Well, I think it is because we can make sure vstart won't be non-zero. Of cource, this may not match the spec. The formatted generated execution for current gem5 implementation:
Fault Vmv1r_vMicro::execute(ExecContext *xc,
trace::InstRecord *traceData) const {
// TODO: Check register alignment.
// TODO: If vd is equal to vs2 the instruction is an architectural NOP.
MISA misa = xc->readMiscReg(MISCREG_ISA);
STATUS status = xc->readMiscReg(MISCREG_STATUS);
if (!misa.rvv || status.vs == VPUStatus::OFF) {
return std::make_shared<IllegalInstFault>("RVV is disabled or VPU is off",
machInst);
}
status.vs = VPUStatus::DIRTY;
xc->setMiscReg(MISCREG_STATUS, status);
/* Vars for Vs2*/ /* End vars for Vs2 */
uint64_t VlenbBits = 0;
RiscvISAInst::PCState __parserAutoPCState;
set(__parserAutoPCState, xc->pcState());
auto &tmp_d0 =
*(RiscvISAInst::VecRegContainer *)xc->getWritableRegOperand(this, 0);
auto Vd = tmp_d0.as<uint64_t>();
RiscvISAInst::VecRegContainer tmp_s0;
xc->getRegOperand(this, 0, &tmp_s0);
auto Vs2 = tmp_s0.as<uint64_t>();
VlenbBits = __parserAutoPCState.vlenb();
uint32_t vlen = VlenbBits * 8;
for (size_t i = 0; i < (vlen / 64); i++) {
Vd[i] = Vs2[i];
}
if (traceData) {
traceData->setData(vecRegClass, &tmp_d0);
};
return NoFault;
}
Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst
Regarding "Option 1 - Change the ABI", have you/we considered the possibility of requiring the ABI to preserve VILL=0 except for syscalls? I.e., effectively mandate that after a syscall you need to do a vsetivli
to restore vill to zero (to not change the kernel or, even if it's changed, be compatible with the already released Linux kernel versions 6.5+).
requiring the ABI to preserve VILL=0 except for syscalls
That would mean that existing software/libraries doing direct syscalls (maybe incl. glibc?) would cease to follow the ABI; not much worse than existing kernels doing so I'd think.
And handling that in userspace would be very messy, requiring conditionally running the vsetivli
after a syscall if RVV is present (which leads to a mild mess, considering that it forces the first syscall to be a test for RVV support, which'd then have to immediately initialize the branch condition for running the vsetivli for itself and future syscalls).
Unfortunately, whole register move instructions depend on
vtype
*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any validvsetvli
instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within
printf
. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls.
An alternative approach is to address this issue on the toolchain side by requiring at least one valid
vsetvli
instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies thatVTYPE
is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing avsetvli
instruction right after the function call may be necessary.Testcase:
Generated asm with
clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv
:And the compiler could emits code like below to fix this issue:
NOTE: We have hit this issue within our internal spec run.
*1
That change[1] is made AFTER 1.0... [1] https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1[2] https://github.com/torvalds/linux/commit/9657e9b7d2538dc73c24947aa00a8525dfb8062c