llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.09k stars 12k forks source link

Lack of vsetvli after function call for whole register move #114518

Open kito-cheng opened 1 week ago

kito-cheng commented 1 week ago

Unfortunately, whole register move instructions depend on vtype*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid vsetvli instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program.

However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within printf. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023.

I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls.

An alternative approach is to address this issue on the toolchain side by requiring at least one valid vsetvli instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that VTYPE is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing a vsetvli instruction right after the function call may be necessary.

Testcase:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a) {
    bar();  // We never know bar will call system call inside or not.
    return a;
}

Generated asm with clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv:

...
        .type   foo,@function
        .variant_cc     foo
foo:                                    # @foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vmv1r.v v24, v8
        call    bar
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

And the compiler could emits code like below to fix this issue:

...
        .type   foo,@function
        .variant_cc     foo
foo:                                    # @foo
# %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
        vsetivli x0, 0, e8, m1, ta, ma   # Need vsetvli to make VILL=0 here
        vmv1r.v v24, v8
        call    bar
        vsetivli x0, 0, e8, m1, ta, ma  # Need vsetvli to make VILL=0 here
        vmv1r.v v8, v24
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        addi    sp, sp, 16
        ret
...

NOTE: We have hit this issue within our internal spec run.

*1 That change[1] is made AFTER 1.0... [1] https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1
[2] https://github.com/torvalds/linux/commit/9657e9b7d2538dc73c24947aa00a8525dfb8062c

llvmbot commented 1 week ago

@llvm/issue-subscribers-backend-risc-v

Author: Kito Cheng (kito-cheng)

Unfortunately, whole register move instructions depend on `vtype`*1, which means they will cause an illegal instruction exception if VILL=1. This is generally not a problem, as VILL is set to 0 after any valid `vsetvli` instruction, so it’s usually safe unless the user executes a whole vector register move very early in the program. However, the situation changed after the Linux kernel applied a patch[2] that sets VILL=1 after any system call. So, if we try to execute a whole register move after a system call, it will cause an illegal instruction exception. This can be difficult to detect, as the system call may not be invoked immediately; it might be deeply nested in a call chain, such as within `printf`. Unfortunately, this change has already shipped with Linux kernel 6.5, which was released on August 28, 2023. I'm not sure if it's reasonable to ask the Linux kernel maintainers to fix this by keeping VILL consistent across system calls. An alternative approach is to address this issue on the toolchain side by requiring at least one valid `vsetvli` instruction before any whole register move. This might be an ugly workaround, but it’s probably the simplest way to resolve the issue. I also realized this might be a better solution since the psABI specifies that `VTYPE` is NOT preserved across function calls. This means we can’t guarantee that VILL is not 1 at the function entry, so placing a `vsetvli` instruction right after the function call may be necessary. Testcase: ```c #include <riscv_vector.h> void bar() __attribute__((riscv_vector_cc)); vint32m1_t foo(vint32m1_t a) { bar(); // We never know bar will call system call inside or not. return a; } ``` Generated asm with `clang -target riscv64-unknown-elf -S -O3 -march=rv64gcv`: ```asm ... .type foo,@function .variant_cc foo foo: # @foo # %bb.0: # %entry addi sp, sp, -16 sd ra, 8(sp) # 8-byte Folded Spill vmv1r.v v24, v8 call bar vmv1r.v v8, v24 ld ra, 8(sp) # 8-byte Folded Reload addi sp, sp, 16 ret ... ``` And the compiler could emits code like below to fix this issue: ```asm ... .type foo,@function .variant_cc foo foo: # @foo # %bb.0: # %entry addi sp, sp, -16 sd ra, 8(sp) # 8-byte Folded Spill vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here vmv1r.v v24, v8 call bar vsetivli x0, 0, e8, m1, ta, ma # Need vsetvli to make VILL=0 here vmv1r.v v8, v24 ld ra, 8(sp) # 8-byte Folded Reload addi sp, sp, 16 ret ... ``` NOTE: We have hit this issue within our internal spec run. `*1` That change[1] is made *AFTER* 1.0... [1] https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 [2] https://github.com/torvalds/linux/commit/9657e9b7d2538dc73c24947aa00a8525dfb8062c
kito-cheng commented 1 week ago

cc: @topperc @BeMg @asb @preames @lukel97

Also cc some non LLVM folks since this is same situation for GCC @palmer-dabbelt @JeffreyALaw

The case for GCC, GCC will using stack rather than callee save reg in this case, so I use some inline asm trick to force GCC to use that:

#include <riscv_vector.h>
void bar() __attribute__((riscv_vector_cc));

vint32m1_t foo(vint32m1_t a, vint32m1_t b) {
    register vint32m1_t x asm("v24") = b;
    bar();
    asm ("#xx %0"::"vr"(x) );
    return x;
}
wangpc-pp commented 1 week ago

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

kito-cheng commented 1 week ago

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

Commit log from https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 say: The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.

wangpc-pp commented 1 week ago

I may have missed something here, why would vmv<N>r.v depend on vtype? I didn't notice this before actually. Here is a note from the SPEC:

Note: These instructions are intended to aid compilers to shuffle vector registers without needing to know or change vl or vtype.

I don't see the necessariness of depending on vtype... maybe I missunderstand the spec or the spec is just being vague here.

Commit log from https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 say: The normative text says that vmv<nr>r.v "operates as though EEW=SEW", meaning that it _does_ depend on vtype.

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

kito-cheng commented 1 week ago

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation

wangpc-pp commented 1 week ago

Yeah, I saw that. I mean, is it really necessary from the perspective of semantics? I think it is a mistake and introduces unnecessary constraints.

I kinda agree with you but I guess SiFive is not the only one implement that semantics...also that's kinda spec conformance implementation

The change https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1 was committed after the ratification of RVV 1.0, so it is not a mandatory request but a supplementary explanation (and I think it is not following the intuition).

I confirmed that Spacemit-X60 on K1 doesn't follow this:

bytedance@k1:~$ cat vmv.c
int main() {
        asm("li a0, 0x8000000000000000");
        asm("vsetvl a0, a0, a0");
        asm("vmv1r.v v0, v1"); // No SIGILL.
        asm("vadd.vv v0, v1, v2"); // SIGILL here.
        return 0;
}
bytedance@k1:~$ gcc -march=rv64gcv vmv.c -o vmv -g
bytedance@k1:~$ gdb vmv
GNU gdb (Ubuntu 14.0.50.20230907-0ubuntu1-bb1) 14.0.50.20230907-git
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmv...
(gdb) r
Starting program: /home/bytedance/vmv
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
main () at vmv.c:5
5               asm("vadd.vv v0, v1, v2");
(gdb) p /x $vtype
$1 = 0x8000000000000000

I haven't checked XuanTie C908 on K230 yet, but I 99.99% believe that it is the same as it is an "old" core.

preames commented 1 week ago

Summary of the change above matches my understanding, I'd previously written this up here: https://github.com/preames/public-notes/blob/master/riscv-spec-minutia.rst#id8

And yes, this change is extremely problematic. We can work around in SW, but ouch.

preames commented 1 week ago

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

wangpc-pp commented 1 week ago

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

Thanks for the summary! I think there should be option that we revert the change and reword that paragraph so that it clearly doesn't depend on vtype.

topperc commented 1 week ago

The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.

wangpc-pp commented 1 week ago

The change that added the EEW=SEW sentence to the whole register move section was added by Krste in October 2020 before ratification. It was done for hardware that rearranges data based on EEW. I'm on my phone so I can't easily dig up the commit right now.

I digged the commit history.

For vill part:

  1. The sentence of exceptions was added in https://github.com/riscvarchive/riscv-v-spec/commit/4e7222162498615c8e2a2a41ae3dc2a6f2164815.
  2. And then removed whole register move in https://github.com/riscvarchive/riscv-v-spec/commit/856fe5bd1cb135c39258e6ca941bf234ae63e1b1.

For vmv<N>r.v part (there are too many related commits, I just paste important changes here):

  1. They was introduced by https://github.com/riscvarchive/riscv-v-spec/commit/7b02297ff0efe91e3dd74af4685091dfdf67810f. Note that It was ignoring the current settings of the vl and vtype register from the beginning.
  2. A clarification that changed the description to operate as if SEW=8 andvl=VLMAX, regardless of current settings in vtype and vl: https://github.com/riscvarchive/riscv-v-spec/commit/548232c1d88df152899f4b0af082e5f4a76426a9
  3. Some more clarifications in https://github.com/riscvarchive/riscv-v-spec/pull/378 and its related commits. It didn't change a lot.
  4. Rearrange things were added in https://github.com/riscvarchive/riscv-v-spec/commit/214455947a0d96d161c2abeb6060c604384ac888 and then https://github.com/riscvarchive/riscv-v-spec/commit/f1d349d79d0a14ed0d085c469f9032f2eb38fa6d made it EEW=SEW. From what I can tell, these sentences were just saying some implementations can treat vd==vs2 cases as HINTs.

My understanding is, it was clearly that vtype should be ignored since the beginning; then somehow the SEW=8 clarification started to make it vague (but I still think, people should interpret it as ignoring); the rearrange HINTs change made vagueness worse, it should be just a note for potential implementations, but somehow it led some people to think it may be a strong requirement.

IMO, there does exist some vagueness/contradiction in the spec, and please clarify it clearly/formally based on the semantics of whole register move.

wangpc-pp commented 1 week ago

Confirmed that C908 on K230 will also not trap on vmv<N>r.v:

[root@canaan ~ ]#./vmv
[  157.105988] vmv[171]: unhandled signal 4 code 0x1 at 0x000000000001054a in vmv[10000+69000]
[  157.114396] CPU: 0 PID: 171 Comm: vmv Not tainted 5.10.4+ #1
[  157.120060] epc: 000000000001054a ra : 00000000000105c6 sp : 0000003fffb28ba0
[  157.127199]  gp : 000000000007f518 tp : 0000000000085760 t0 : 0000000000000001
[  157.134423]  t1 : 000000000007ed28 t2 : 000000000000006a s0 : 0000003fffb28bb0
[  157.141647]  s1 : 0000000000000001 a0 : 0000000000000000 a1 : 0000003fffb28d58
[  157.148869]  a2 : 0000003fffb28d68 a3 : 0000000000000000 a4 : 0000003fffb28bd8
[  157.156092]  a5 : 0000000000010536 a6 : 000000000006afe0 a7 : 0000000000000001
[  157.163315]  s2 : 0000003fffb28d58 s3 : 0000000000000001 s4 : 0000003fffb28d68
[  157.170538]  s5 : 0000000000000001 s6 : 0000000000010536 s7 : 0000000000010258
[  157.177762]  s8 : 0000000000000000 s9 : 0000000000159fc0 s10: 0000000000156760
[  157.184984]  s11: 0000000000000001 t3 : 2f2f2f2f2f2f2f2f t4 : 000000000007d9e8
[  157.192207]  t5 : 0000000000084260 t6 : 0000000000079a30
[  157.197519] status: 8000000200004620 badaddr: 0000000002110057 cause: 0000000000000002
Illegal instruction

# objdump -d vmv >vmv.s
0000000000010536 <main>:
   10536:       1141                    add     sp,sp,-16
   10538:       e422                    sd      s0,8(sp)
   1053a:       0800                    add     s0,sp,16
   1053c:       fff0051b                addw    a0,zero,-1
   10540:       157e                    sll     a0,a0,0x3f
   10542:       80a57557                vsetvl  a0,a0,a0
   10546:       9e103057                vmv1r.v v0,v1
   1054a:       02110057                vadd.vv v0,v1,v2
   1054e:       4781                    li      a5,0
   10550:       853e                    mv      a0,a5
   10552:       6422                    ld      s0,8(sp)
   10554:       0141                    add     sp,sp,16
   10556:       8082                    ret
aswaterman commented 1 week ago

Much discussion about the ISA spec in this thread surrounds a particular non-normative note. Ultimately, non-normative text isn't relevant to this discussion, because, well, it isn't normative. The reason that text was changed post-ratification is that (a) it is valid to change non-normative text at any time, since by definition it doesn't affect the normative content, and (b) in this case, it contradicted the normative text, and the normative text always wins.

As Craig points out, the ratified normative text in the spec says that the instructions do depend on SEW (for e.g. vstart), hence they depend on vtype. This definition was in place long before to ratification.

Given the opportunity, I'm sure we'd revisit this ISA choice, but it is what it is. The fact that multiple implementations don't trap these instructions doesn't change the story. The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.

wangpc-pp commented 1 week ago

The spec says the behavior in this case is reserved. Raising an illegal-instruction trap is valid behavior. Not raising an illegal-instruction trap is also valid behavior.

Please be specific here which part of the spec says this?

aswaterman commented 1 week ago

The Whole-Register Moves section says that the instructions' behavior depends on SEW. The Vector Type Illegal section says that an attempt to execute an instruction that depends on vtype when vill=1 raises an illegal-instruction exception.

(I had written earlier that the behavior is reserved, but my recollection was wrong; the spec makes the stronger statement that an exception must be raised.)

wangpc-pp commented 1 week ago

The Whole-Register Moves section says that the instructions' behavior depends on SEW.

Even for raified spec, it says:

The instructions operate as if EEW=SEW, EMUL = NREG, effective length evl= EMUL * VLEN/SEW.

I'm not a native English speaker, is as if a strong requirement?

aswaterman commented 1 week ago

”As if” isn’t a phrase used to suggest optionality; it’s a phrase used to establish an equivalence.

topperc commented 1 week ago

qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.

wangpc-pp commented 1 week ago

qemu believes whole register move depends on vsew and will generate SIGILL for if vtype.vill is set.

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

AFAIK, several cores have to change their implementations because of this change recently. At least, XiangShan has already been misled because of this apparent mistake: https://github.com/OpenXiangShan/NEMU/pull/511. The impact on software would be much bigger.

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype? Would it make RISC-V much more competitive than X86/AArch64? Is this a state-of-the-art architecture innovation? How will you handle these products which are already on the market?

topperc commented 1 week ago

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype?

The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.

How will you handle these products which are already on the market?

Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.

topperc commented 1 week ago

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?

wangpc-pp commented 1 week ago

Or, let me ask this in another way: What benefit would we have if whole register move depends on vtype?

The ratified spec already says that it depends on vtype. Non-normative text was not updated when the change was made. That was a mistake, but the normative text was updated. We cannot change the spec now without defining a new extension.

How will you handle these products which are already on the market?

Which products? SiFive cores implement the dependence on vtype and are used by customers. Just because they aren't available for retail doesn't mean they can be ignored.

Sorry I don't mean to make this discussion argumentative, my apologies. So you do agree there does exist a way to define a new extension to fix this mistake, right? Can we start it right now? WDYT? cc @aswaterman

wangpc-pp commented 1 week ago

This is not an evidence to support this mistake, I can also say that GEM5 won't trap on this. Also, this is a "chicken or egg" problem.

How does GEM5 or any of the CPUs that don't trap handle vstart!=0 for these instructions?

Well, I think it is because we can make sure vstart won't be non-zero. Of cource, this may not match the spec. The formatted generated execution for current gem5 implementation:

Fault Vmv1r_vMicro::execute(ExecContext *xc,
                            trace::InstRecord *traceData) const {
  // TODO: Check register alignment.
  // TODO: If vd is equal to vs2 the instruction is an architectural NOP.
  MISA misa = xc->readMiscReg(MISCREG_ISA);
  STATUS status = xc->readMiscReg(MISCREG_STATUS);

  if (!misa.rvv || status.vs == VPUStatus::OFF) {
    return std::make_shared<IllegalInstFault>("RVV is disabled or VPU is off",
                                              machInst);
  }

  status.vs = VPUStatus::DIRTY;
  xc->setMiscReg(MISCREG_STATUS, status);

  /* Vars for Vs2*/ /* End vars for Vs2 */
  uint64_t VlenbBits = 0;
  RiscvISAInst::PCState __parserAutoPCState;
  set(__parserAutoPCState, xc->pcState());
  auto &tmp_d0 =
      *(RiscvISAInst::VecRegContainer *)xc->getWritableRegOperand(this, 0);
  auto Vd = tmp_d0.as<uint64_t>();
  RiscvISAInst::VecRegContainer tmp_s0;
  xc->getRegOperand(this, 0, &tmp_s0);
  auto Vs2 = tmp_s0.as<uint64_t>();
  VlenbBits = __parserAutoPCState.vlenb();
  uint32_t vlen = VlenbBits * 8;
  for (size_t i = 0; i < (vlen / 64); i++) {
    Vd[i] = Vs2[i];
  }

  if (traceData) {
    traceData->setData(vecRegClass, &tmp_d0);
  };
  return NoFault;
}
luismarques commented 1 week ago

Since this is getting public discussion, I put my notes on this topic in a public location. This was mostly written previously based off internal discussion, but has been minor updated to include new information discussed in this ticket. See https://github.com/preames/public-notes/blob/master/riscv/whole-register-move-abi.rst

Regarding "Option 1 - Change the ABI", have you/we considered the possibility of requiring the ABI to preserve VILL=0 except for syscalls? I.e., effectively mandate that after a syscall you need to do a vsetivli to restore vill to zero (to not change the kernel or, even if it's changed, be compatible with the already released Linux kernel versions 6.5+).

dzaima commented 1 week ago

requiring the ABI to preserve VILL=0 except for syscalls

That would mean that existing software/libraries doing direct syscalls (maybe incl. glibc?) would cease to follow the ABI; not much worse than existing kernels doing so I'd think.

And handling that in userspace would be very messy, requiring conditionally running the vsetivli after a syscall if RVV is present (which leads to a mild mess, considering that it forces the first syscall to be a test for RVV support, which'd then have to immediately initialize the branch condition for running the vsetivli for itself and future syscalls).