llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[RISCV] Selection of VL for splat constants causing vsetvli toggles #55615

Open preames opened 2 years ago

preames commented 2 years ago

I noticed that we are unconditionally treating splat constants as affecting all lanes in a vector register.

Example:

define void @vector_splat_toggle(double* %a, double* %b) {
entry:
  %addr = bitcast double* %a to <vscale x 1 x double>*
  tail call void @llvm.riscv.vse.nxv1f64.i64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double>* %addr, i64 4)
  %addr2 = bitcast double* %b to <vscale x 1 x double>*
  tail call void @llvm.riscv.vse.nxv1f64.i64(<vscale x 1 x double> zeroinitializer, <vscale x 1 x double>* %addr2, i64 4)
  ret void
}

; Function Attrs: nounwind writeonly
declare void @llvm.riscv.vse.nxv1f64.i64(<vscale x 1 x double>, <vscale x 1 x double>* nocapture, i64)

Repro command:

$ ./llc -march=riscv64 -mattr=+v < debug-splat.ll 

Key output:

    vsetvli a2, zero, e64, m1, ta, mu
    vmv.v.i v8, 0
    vsetivli    zero, 4, e64, m1, ta, mu
    vse64.v v8, (a0)
    vse64.v v8, (a1)

As you can see, we set up VL to be VLMAX for the splat even though only four lanes of the resulting value are used. We could have used AVL=4 throughout this example.
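To make the cost concrete, here is a deliberately naive C++ model of vsetvli insertion (not the actual RISCVInsertVSETVLI pass): it emits a new vsetvli whenever the required (AVL, SEW, LMUL) state changes. The `VConfig` struct, the `countVsetvlis` function, and the convention that AVL 0 stands for VLMAX (as `zero`/X0 does in the output above) are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified vector configuration required by one instruction.
// In this model an AVL of 0 stands for VLMAX (like passing X0 / `zero`).
struct VConfig {
  uint64_t avl;   // requested application vector length; 0 == VLMAX here
  unsigned sew;   // element width in bits (e.g. 64 for e64)
  unsigned lmul;  // integer register-group multiplier (e.g. 1 for m1)
  bool operator==(const VConfig &o) const {
    return avl == o.avl && sew == o.sew && lmul == o.lmul;
  }
};

// Naive forward pass: count one vsetvli each time the required
// configuration differs from the one currently in effect.
int countVsetvlis(const std::vector<VConfig> &ops) {
  std::optional<VConfig> current;  // no configuration at function entry
  int count = 0;
  for (const VConfig &op : ops) {
    if (!current || !(*current == op)) {
      ++count;       // a vsetvli/vsetivli is needed before this op
      current = op;  // it establishes the new configuration
    }
  }
  return count;
}
```

For the sequence in the example (a splat at VLMAX followed by two stores at AVL=4, all e64/m1), `countVsetvlis({{0, 64, 1}, {4, 64, 1}, {4, 64, 1}})` returns 2; if the splat also used AVL=4, the count drops to 1. The real pass is considerably more involved (for example, it tracks state across basic blocks), which this sketch ignores.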

This example is written with scalable vectors, but the same pattern also shows up in idiomatic fixed-length vector loops; for example, see @vector_init_vsetvli_fv in test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll.

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-risc-v

asi-sc commented 2 years ago

I'm working on this bug. It seems we can slightly modify RISCVDAGToDAGISel::PreprocessISelDAG so that it does not always use X0 (i.e. VLMAX) as the VL for splats, and instead chooses a more precise VL when all users of the node share a common VL.
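A minimal sketch of the heuristic proposed above, modeled outside of LLVM: keep VLMAX by default, but if every user of the splat consumes it with the same known VL, use that VL for the splat too. `SplatUser` and `pickSplatVL` are hypothetical names, not LLVM APIs. The sketch assumes users only read lanes below their VL; a user that reads tail lanes (for instance, one using the splat as a tail-undisturbed passthru) would have to be modeled as needing VLMAX. It also ignores SEW/LMUL compatibility, and the patch under review may take a different approach.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one user of a splat node.
struct SplatUser {
  std::optional<uint64_t> vl;  // nullopt: VL unknown, or the user needs VLMAX
};

// Returns the VL shared by all users, or std::nullopt meaning
// "keep the default VLMAX (X0) for the splat".
std::optional<uint64_t> pickSplatVL(const std::vector<SplatUser> &users) {
  std::optional<uint64_t> common;
  for (const SplatUser &u : users) {
    if (!u.vl)
      return std::nullopt;  // some user may read every lane
    if (common && *common != *u.vl)
      return std::nullopt;  // users disagree on VL
    common = u.vl;
  }
  return common;  // also nullopt when the splat has no users
}
```

In the issue's example both stores use VL=4, so the model picks 4 for the splat and the VLMAX vsetvli becomes unnecessary; with stores at different VLs it falls back to VLMAX.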

asi-sc commented 2 years ago

Created review https://reviews.llvm.org/D130895. @preames, could you please share where the example in this issue came from, if possible? I'd like to test my changes on it as well.