[RISCV] vmsge_vx intrinsics with a large immediate should do vmsgt(x,imm-1)

dzaima commented 1 week ago

The following two functions have identical behavior:

```c
#include <riscv_vector.h>
vbool32_t vmsgeu(vuint32m1_t op1, size_t vl) {
    return __riscv_vmsgeu_vx_u32m1_b32(op1, 100, vl);
}
vbool32_t vmsgtu(vuint32m1_t op1, size_t vl) {
    return __riscv_vmsgtu_vx_u32m1_b32(op1, 99, vl);
}
```

but clang generates different code for them, and the vmsgeu version is less efficient because it needs an extra vmnot.m:

```asm
vmsgeu:
        li      a1, 100
        vsetvli zero, a0, e32, m1, ta, ma
        vmsltu.vx       v8, v8, a1
        vmnot.m v0, v8
        ret

vmsgtu:
        li      a1, 99
        vsetvli zero, a0, e32, m1, ta, ma
        vmsgtu.vx       v0, v8, a1
        ret
```

https://riscv.godbolt.org/z/63rTTanMY

Additionally, for a dynamic value known not to be the minimum value of the given integer type, it would presumably be better to decrement it in a GPR than to negate the mask, e.g. https://riscv.godbolt.org/z/14qPd643P (especially if the decrement can be hoisted out of a loop).
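
As a rough illustration of the dynamic case, here is a scalar C model (not RVV code, and not what clang emits) of a loop that tests `x >= bound` for a runtime `bound`. The function names `count_geu` and `count_geu_via_gtu` are hypothetical. The second version decrements the bound once outside the loop and then uses `>`, with an explicit fallback for `bound == 0`, since `bound - 1` would wrap there; that fallback is an assumption added for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: count elements with x >= bound (one ">=" compare per element). */
static size_t count_geu(const uint32_t *xs, size_t n, uint32_t bound) {
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        hits += xs[i] >= bound;
    return hits;
}

/* Rewritten form: decrement the bound once, outside the loop, then use ">".
 * The rewrite is only valid when bound is not the minimum value (0 here),
 * so that case is handled separately. */
static size_t count_geu_via_gtu(const uint32_t *xs, size_t n, uint32_t bound) {
    if (bound == 0)
        return n; /* x >= 0 holds for every unsigned x */
    const uint32_t gt_bound = bound - 1; /* hoisted decrement */
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        hits += xs[i] > gt_bound;
    return hits;
}
```

Both functions should return the same count for any input; the point is only that the per-element work in the second loop is a single `>` compare, which is the shape that maps to `vmsgtu.vx` without a following mask negation.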

jacquesguan commented 1 week ago

The minimum value should be excluded in the signed case. I will create a PR to improve the selection.
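
To make the signed edge case concrete, here is a small scalar sketch in C using 8-bit values so that every combination can be checked. It is not the actual selection logic in LLVM; `sge_to_sgt_bound` and `sge_rewrite_is_exact` are hypothetical names. The helper refuses the rewrite when the bound is the minimum value, because `bound - 1` is not representable there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the bound for the ">" form of "x >= bound".
 * Returns false when the rewrite is not safe (bound is the minimum value). */
static bool sge_to_sgt_bound(int8_t bound, int8_t *gt_bound) {
    if (bound == INT8_MIN)
        return false; /* bound - 1 does not fit in int8_t */
    *gt_bound = (int8_t)(bound - 1);
    return true;
}

/* Check every x against every bound the helper accepts:
 * "x >= bound" must agree with "x > gt_bound". */
static bool sge_rewrite_is_exact(void) {
    for (int bound = INT8_MIN; bound <= INT8_MAX; ++bound) {
        int8_t gt_bound;
        if (!sge_to_sgt_bound((int8_t)bound, &gt_bound))
            continue; /* minimum value: rewrite skipped */
        for (int x = INT8_MIN; x <= INT8_MAX; ++x) {
            if ((x >= bound) != (x > gt_bound))
                return false;
        }
    }
    return true;
}
```

With the minimum excluded, the two comparisons agree for all remaining inputs in this model; the same argument should carry over to wider element types.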

llvm / llvm-project

[RISCV] vmsge_vx intrinsics with a large immediate should do vmsgt(x,imm-1) #114505