llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

RISC-V: RVV register allocation problem causes costly and unecessary spill #113489

Open camel-cdr opened 3 hours ago

camel-cdr commented 3 hours ago

Hi, I ran into a problem, where reordering a single RVV intrinsic without changing the program logic caused llvm to spill a vector register, suggesting that the register allocation has trouble reordering in this case:

#include <riscv_vector.h>

void test(int *out, const int *in, size_t n)
{
    for (size_t vl; n > 0; n -= vl, out += vl, in += vl) {
        vl = __riscv_vsetvl_e32m8(n);
        vint32m8_t v1 = __riscv_vle32_v_i32m8(in, vl);
        vint32m8_t v2 = __riscv_vadd(v1, v1, vl);
        vbool4_t mlt = __riscv_vmslt(v1, 0, vl);

#ifdef REORDER
    vint32m8_t v4 = __riscv_vmerge(v2, v1, mlt, vl);
#endif
        vint32m8_t v3 = __riscv_vadd(v1, 3, vl);
#ifndef REORDER
        vint32m8_t v4 = __riscv_vmerge(v2, v1, mlt, vl);
#endif

        vbool4_t mgt = __riscv_vmsgt(v1, 4, vl);
        v1 = __riscv_vadd_mu(__riscv_vmor(mlt, mgt, vl), v1, v3, v4, vl);

        __riscv_vse32(out, v1, vl);
    }
}

See also the godbolt link: https://godbolt.org/z/6vdf4vEjn

This example was adapted from real code, and minimized while still retaining the problematic behavior.

gcc manages to figure out the proper register allocation.

llvmbot commented 2 hours ago

@llvm/issue-subscribers-backend-risc-v

Author: Camel Coder (camel-cdr)

Hi, I ran into a problem, where reordering a single RVV intrinsic without changing the program logic caused llvm to spill a vector register, suggesting that the register allocation has trouble reordering in this case: ```c #include <riscv_vector.h> void test(int *out, const int *in, size_t n) { for (size_t vl; n > 0; n -= vl, out += vl, in += vl) { vl = __riscv_vsetvl_e32m8(n); vint32m8_t v1 = __riscv_vle32_v_i32m8(in, vl); vint32m8_t v2 = __riscv_vadd(v1, v1, vl); vbool4_t mlt = __riscv_vmslt(v1, 0, vl); #ifdef REORDER vint32m8_t v4 = __riscv_vmerge(v2, v1, mlt, vl); #endif vint32m8_t v3 = __riscv_vadd(v1, 3, vl); #ifndef REORDER vint32m8_t v4 = __riscv_vmerge(v2, v1, mlt, vl); #endif vbool4_t mgt = __riscv_vmsgt(v1, 4, vl); v1 = __riscv_vadd_mu(__riscv_vmor(mlt, mgt, vl), v1, v3, v4, vl); __riscv_vse32(out, v1, vl); } } ``` See also the godbolt link: https://godbolt.org/z/6vdf4vEjn This example was adapted from real code, and minimized while still retaining the problematic behavior. gcc manages to figure out the proper register allocation.