zherczeg / sljit

Platform independent low-level JIT compiler

riscv: SIGBUS when running `test_simd1` in RVV 1.0 CPU #271

Open carenas opened 2 months ago

carenas commented 2 months ago

crashing in the first SIMD test with:

(gdb) x/16i $pc-32
   0x3ff7fd7ecc:    addi    t1,t1,1
   0x3ff7fd7ed0:    slli    t1,t1,0xc
   0x3ff7fd7ed4:    xori    t1,t1,-320
   0x3ff7fd7ed8:    vse32.v v29,(t1)
   0x3ff7fd7edc:    addi    a0,s0,-1001
   0x3ff7fd7ee0:    addi    a1,s0,1001
   0x3ff7fd7ee4:    vsetivli    t1,4,e32,m1,tu,mu
   0x3ff7fd7ee8:    addi    t1,a0,1211
=> 0x3ff7fd7eec:    vle32.v v2,(t1)
   0x3ff7fd7ef0:    vsetivli    t1,4,e32,m1,tu,mu
   0x3ff7fd7ef4:    addi    t1,a1,-771
   0x3ff7fd7ef8:    vse32.v v2,(t1)
   0x3ff7fd7efc:    li  a0,32
   0x3ff7fd7f00:    li  a1,36
   0x3ff7fd7f04:    vsetivli    t1,2,e64,m1,tu,mu
   0x3ff7fd7f08:    slli    t1,a0,0x3
(gdb) info reg t1
t1             0x3fffffeed2 274877902546

with the following CPU:

model name      : Spacemit(R) X60
isa             : rv64imafdcv_zicbom_zicboz_zicntr_zicond_zicsr_zifencei_zihintpause_zihpm_zfh_zfhmin_zca_zcd_zba_zbb_zbc_zbs_zkt_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvfhmin_zvkt_sscofpmf_sstc_svinval_svnapot_svpbmt
mmu             : sv39
mvendorid       : 0x710

carenas commented 2 months ago

the bug is triggered by case 5, so SLJIT_SIMD_MEM_ALIGNED_16 is probably not supported, at least on this CPU:

https://github.com/zherczeg/sljit/blob/f6326087b3404efb07c6d3deed97b3c3b8098c0c/test_src/sljitTestSimd.h#L142-L147
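
For reference, the arithmetic behind that guess can be checked against the gdb dump above, where `t1` holds 0x3fffffeed2 at the faulting `vle32.v`. This is a minimal sketch; the helper `is_aligned` is a made-up name and not part of sljit, and it assumes SLJIT_SIMD_MEM_ALIGNED_16 promises 16-bit (2-byte) alignment:

```c
#include <stdint.h>

/* Hypothetical helper, not part of sljit: returns 1 when addr is a
   multiple of bytes. bytes must be a power of two. */
static int is_aligned(uint64_t addr, uint64_t bytes)
{
    return (addr & (bytes - 1)) == 0;
}

/* Value of t1 at the faulting vle32.v, copied from the gdb output. */
static const uint64_t fault_addr = 0x3fffffeed2ULL;
```

`is_aligned(fault_addr, 2)` holds but `is_aligned(fault_addr, 4)` does not, so the address satisfies a 2-byte alignment hint while the 32-bit elements loaded by `vle32.v` are not naturally aligned.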

the documentation for RVV mentions:

> Implementations are allowed to raise a misaligned address exception on whole register loads and stores if the base address is not naturally aligned to the larger of the size of the encoded EEW in bytes (EEW/8) or the implementation’s smallest supported SEW size in bytes (SEWMIN/8).
>
> Note: Allowing misaligned exceptions to be raised based on non-alignment to the encoded EEW simplifies the implementation of these instructions. Some subset implementations might not support smaller SEW widths, so are allowed to report misaligned exceptions for the smallest supported SEW even if larger than encoded EEW. An extreme non-standard implementation might have SEWMIN>XLEN for example. Software environments can mandate the minimum alignment requirements to support an ABI.
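
The rule quoted above boils down to taking the larger of two byte sizes. A small sketch of that computation follows; the function name is hypothetical and only models the quoted sentence, not any sljit or kernel interface:

```c
/* Alignment in bytes below which an implementation is allowed to raise a
   misaligned address exception on a whole register load or store:
   max(EEW/8, SEWMIN/8). Both arguments are given in bits. */
static unsigned whole_reg_min_alignment(unsigned eew_bits, unsigned sewmin_bits)
{
    unsigned eew_bytes = eew_bits / 8;
    unsigned sewmin_bytes = sewmin_bits / 8;

    return eew_bytes > sewmin_bytes ? eew_bytes : sewmin_bytes;
}
```

With EEW=32 and SEWMIN=8 this gives 4 bytes, so a base address that is only 2-byte aligned is not guaranteed to work.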

The system is running Debian (but with a vendor kernel), so it is possible that other misaligned load exceptions are being masked (or could be masked).

zherczeg commented 2 months ago

Interesting limitations. I have never tried to code on real hardware; I have no access to any. The compiler can return with SLJIT_UNSUPPORTED if these limitations can be detected somehow.

carenas commented 2 months ago

FWIW, gcc 14.2.0 also triggers a Bus error, but the next version seems to default to NOT allowing misaligned loads unless they are requested.

zherczeg commented 2 months ago

I remember riscv was proud that misaligned memory support is always available.

Anyway, the test can be enhanced with more support[i] tests, and riscv could return with SLJIT_UNSUPPORTED for the unsupported forms, if this can be tested somehow.
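
One way to picture the "return unsupported for the unsupported forms" idea is a check that compares the promised alignment with the element size. This is only a sketch under assumptions: the enum values, the function name, and the `hw_misaligned_ok` flag are hypothetical stand-ins, not the real sljit error codes or SIMD flags.

```c
/* Hypothetical stand-ins; the real sljit error codes and flags differ. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_UNSUPPORTED = 1 };

/* elem_log2:        log2 of the element size in bytes (2 for 32-bit).
   align_log2:       log2 of the alignment promised by the caller in bytes
                     (0 for unaligned, 1 for a 16-bit alignment hint).
   hw_misaligned_ok: nonzero when the target is known to handle misaligned
                     vector accesses. */
static int simd_mem_check(int elem_log2, int align_log2, int hw_misaligned_ok)
{
    if (align_log2 >= elem_log2 || hw_misaligned_ok)
        return SKETCH_SUCCESS;

    return SKETCH_ERR_UNSUPPORTED;
}
```

For the failing case, 32-bit elements with a 16-bit alignment hint and no known hardware support, `simd_mem_check(2, 1, 0)` reports the form as unsupported, so the test could skip it instead of crashing.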

carenas commented 2 months ago

> I remember riscv was proud that misaligned memory support is always available.

Not sure if I would qualify it as "proud", but the Zicclsm extension that is mandatory for RVA20U64 profile CPUs says:

> Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance.

And at least for Linux, the hwprobe RISCV syscall (which might be useful to allow probing also for the vector case) exports the performance characteristics of misaligned access to user space (see RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF).
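
A rough sketch of such a probe is below. The numeric key and value constants are my assumptions about what the Linux UAPI header `<asm/hwprobe.h>` defines (including a vector counterpart of the scalar key that newer kernels appear to provide), so they should be verified against the kernel headers in use. The `SKETCH_*` names, `probe_misaligned` and `misaligned_allowed` are hypothetical. On anything other than Linux/RISC-V the probe just reports "unknown".

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__riscv)
#include <unistd.h>
#include <sys/syscall.h>
#endif

/* Assumed to mirror <asm/hwprobe.h>; verify before relying on them. */
#define SKETCH_KEY_MISALIGNED_SCALAR_PERF 9
#define SKETCH_KEY_MISALIGNED_VECTOR_PERF 10 /* newer kernels only */
#define SKETCH_MISALIGNED_UNKNOWN 0
#define SKETCH_MISALIGNED_FAST 3

/* Same layout as struct riscv_hwprobe: one key/value pair. */
struct sketch_hwprobe {
    int64_t key;
    uint64_t value;
};

/* Returns the value reported for key, or SKETCH_MISALIGNED_UNKNOWN when
   the syscall is unavailable, fails, or does not recognize the key. */
static uint64_t probe_misaligned(int64_t key)
{
#if defined(__linux__) && defined(__riscv) && defined(__NR_riscv_hwprobe)
    struct sketch_hwprobe pair = { key, 0 };

    /* A NULL CPU set of size 0 asks about all online CPUs. */
    if (syscall(__NR_riscv_hwprobe, &pair, (size_t)1, (size_t)0,
                (void *)NULL, 0U) == 0 && pair.key == key)
        return pair.value;
#else
    (void)key;
#endif
    return SKETCH_MISALIGNED_UNKNOWN;
}

/* Conservative policy: rely on misaligned accesses only when reported fast. */
static int misaligned_allowed(uint64_t perf)
{
    return perf == SKETCH_MISALIGNED_FAST;
}
```

Calling `misaligned_allowed(probe_misaligned(SKETCH_KEY_MISALIGNED_VECTOR_PERF))` at startup would keep misaligned SIMD forms disabled unless the kernel explicitly reports them as fast, which matches the "off by default" direction gcc seems to be taking.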

My suggestion was to follow gcc in disabling this by default, but what we are missing is a way to re-enable it at runtime. Re-enabling it at build time by leveraging gcc's notion of what the target can support would be nice, but that is not something that is exported now, unlike the other options we use. We could of course add an SLJIT-specific flag instead, but that doesn't seem flexible enough IMHO.