llvm / llvm-project


[AArch64][SVE] Error in masked_gather_v2f16 when enabling SVE for 128-bit target #56412

Open MattPD opened 2 years ago

MattPD commented 2 years ago

I'm encountering an ICE (internal compiler error) after removing the NEON preference in useSVEForFixedLengthVectors to enable SVE code generation for a 128-bit vector register size.

Context: useSVEForFixedLengthVectors (in "llvm/lib/Target/AArch64/AArch64Subtarget.h") currently requires a minimum SVE vector size of at least 256 bits:

bool useSVEForFixedLengthVectors() const {
  // Prefer NEON unless larger SVE registers are available.
  return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
}

As an experiment, I've relaxed it in two different ways, changing the return statement to either return hasSVE() && getMinSVEVectorSizeInBits() >= 128; or return hasSVE(); (both with the same effect).
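For illustration, the upstream predicate and the two relaxed variants can be modeled outside LLVM as below. This is only a sketch: MockSubtarget and the three free-function names are hypothetical stand-ins, not LLVM APIs, and the mock simply stores the two values that the real subtarget would compute.

```cpp
// Hypothetical stand-in for the two AArch64Subtarget queries used by the
// predicate; the real class derives these values from target features/flags.
struct MockSubtarget {
  bool HasSVE;
  unsigned MinSVEVectorSizeInBits;

  constexpr bool hasSVE() const { return HasSVE; }
  constexpr unsigned getMinSVEVectorSizeInBits() const {
    return MinSVEVectorSizeInBits;
  }
};

// Upstream behaviour quoted above: prefer NEON unless the minimum SVE
// register size is at least 256 bits.
constexpr bool useSVEUpstream(const MockSubtarget &ST) {
  return ST.hasSVE() && ST.getMinSVEVectorSizeInBits() >= 256;
}

// Experiment 1: lower the threshold to 128 bits.
constexpr bool useSVEThreshold128(const MockSubtarget &ST) {
  return ST.hasSVE() && ST.getMinSVEVectorSizeInBits() >= 128;
}

// Experiment 2: drop the size check entirely.
constexpr bool useSVEAlways(const MockSubtarget &ST) { return ST.hasSVE(); }
```

In this model, with SVE available and a 128-bit minimum, the upstream predicate returns false while both relaxed variants return true, which is consistent with the two changes having the same effect under the flags used here.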

I then ran the modified compiler against the AArch64 LIT tests and hit an ICE in the following test: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll

Compiling with either clang or llc and an assumed 128-bit SVE register size is sufficient to trigger the ICE:

clang -msve-vector-bits=128 -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
llc -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll

This function alone is sufficient to trigger the ICE:

target triple = "aarch64-unknown-linux-gnu"

define void @masked_gather_v2f16(<2 x half>* %a, <2 x half*>* %b) vscale_range(2,0) #0 {
  %cval = load <2 x half>, <2 x half>* %a
  %ptrs = load <2 x half*>, <2 x half*>* %b
  %mask = fcmp oeq <2 x half> %cval, zeroinitializer
  %vals = call <2 x half> @llvm.masked.gather.v2f16(<2 x half*> %ptrs, i32 8, <2 x i1> %mask, <2 x half> undef)
  store <2 x half> %vals, <2 x half>* %a
  ret void
}

declare <2 x half> @llvm.masked.gather.v2f16(<2 x half*>, i32, <2 x i1>, <2 x half>)

attributes #0 = { "target-features"="+sve" }

Removing the vscale_range(2,0) attribute or changing it to vscale_range(1,0) has no impact (i.e., the ICE still occurs).
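For reference, a rough sketch of how the vscale bounds relate to register width. It assumes the architectural rule that an SVE register is vscale x 128 bits wide, and it assumes that a max of 0 in vscale_range means "no upper bound". VScaleRange and the helper names are hypothetical and exist only for this example.

```cpp
// Assumption: an SVE register is vscale * 128 bits wide.
constexpr unsigned kSVEBitsPerVscale = 128;

// Hypothetical mirror of the two vscale_range operands.
struct VScaleRange {
  unsigned Min;
  unsigned Max; // 0 is treated as "no upper bound" in this sketch
};

// Smallest register width (in bits) that the range guarantees.
constexpr unsigned minVectorBits(VScaleRange R) {
  return R.Min * kSVEBitsPerVscale;
}

// True when the range pins vscale to exactly one value.
constexpr bool isFixedWidth(VScaleRange R) {
  return R.Max != 0 && R.Min == R.Max;
}
```

Under these assumptions, vscale_range(2,0) guarantees at least 256 bits and vscale_range(1,0) at least 128 bits, while the -mvscale-min=1 -mvscale-max=1 pair visible in the cc1 invocation of the crash output pins the width to exactly 128 bits.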

Here's the output from the ICE in question (note the cyclic pattern of function calls in SelectionDAG):

$ clang -msve-vector-bits=128 -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /llvm_build/bin/clang-15 -cc1 -triple aarch64-unknown-linux-gnu -emit-obj -mrelax-all --mrelax-relocations -disable-free -clear-ast-before-backend -main-file-name sve-fixed-length-masked-gather.masked_gather_v2f16.only.ll -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu neoverse-n2 -target-feature +v8.5a -target-feature +crc -target-feature +lse -target-feature +rdm -target-feature +crypto -target-feature +dotprod -target-feature +fp-armv8 -target-feature +neon -target-feature +fullfp16 -target-feature +ras -target-feature +sve -target-feature +sve2 -target-feature +sve2-bitperm -target-feature +rcpc -target-feature +mte -target-feature +ssbs -target-feature +sb -target-feature +bf16 -target-feature +i8mm -target-feature +fp16fml -target-feature +sm4 -target-feature +sha3 -target-feature +sha2 -target-feature +aes -target-abi aapcs -mvscale-max=1 -mvscale-min=1 -fallow-half-arguments-and-returns -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=/llvm_src -resource-dir /llvm_build/lib/clang/15.0.0 -fdebug-compilation-dir=/llvm_src -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/sve-fixed-length-masked-gather-a81266.o -x ir sve-fixed-length-masked-gather.masked_gather_v2f16.only.ll
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.masked_gather_v2f16.only.ll'.
3.      Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v2f16'
  #0 0x0000ffff784b83bc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /llvm_src/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3
  #1 0x0000ffff784b69f8 llvm::sys::RunSignalHandlers() /llvm_src/llvm-project/llvm/lib/Support/Signals.cpp:104:18
  #2 0x0000ffff784b6ba4 SignalHandler(int) /llvm_src/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
  #3 0x0000ffff7d9216c0 (linux-vdso.so.1+0x6c0)
  #4 0x0000ffff7792dd60 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:501:9
  #5 0x0000ffff7792dcf8 llvm::SDNode::getNodeId() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:713:34
  #6 0x0000ffff7792dcf8 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:572:31
  #7 0x0000ffff7792de08 llvm::SDValue::getNode() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:151:36
  #8 0x0000ffff7792de08 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
  #9 0x0000ffff7792dcf8 llvm::SDNode::getNodeId() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:713:34
 #10 0x0000ffff7792dcf8 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:572:31
 #11 0x0000ffff7792de08 llvm::SDValue::getNode() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:151:36
 #12 0x0000ffff7792de08 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
 #13 0x0000ffff7792dcf8 llvm::SDNode::getNodeId() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:713:34
 #14 0x0000ffff7792dcf8 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:572:31
 #15 0x0000ffff7792de08 llvm::SDValue::getNode() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:151:36
. . .
#500 0x0000ffff7792de08 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
#501 0x0000ffff7792dcf8 llvm::SDNode::getNodeId() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:713:34
#502 0x0000ffff7792dcf8 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:572:31
#503 0x0000ffff7792de08 llvm::SDValue::getNode() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:151:36
#504 0x0000ffff7792de08 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
#505 0x0000ffff7792dcf8 llvm::SDNode::getNodeId() const /llvm_src/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:713:34
#506 0x0000ffff7792dcf8 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /llvm_src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:572:31
clang-15: error: unable to execute command: Segmentation fault (core dumped)
clang-15: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 15.0.0 (https://github.com/llvm/llvm-project.git ac3e26bcffa29d3519f87be678ad09431a6bf6f2)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /llvm_build/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

I'm wondering whether SVE code generation for targets with 128-bit SVE registers is meant to be supported. If so, is modifying useSVEForFixedLengthVectors as above the proper way to go about it, or are there remaining checks that need to change, e.g., in AArch64TargetLowering::useSVEForFixedLengthVectorVT (https://github.com/llvm/llvm-project/blob/6f4773f06428d16cb4716e9d1ba590d8c2ff7596/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L5628-L5630)?

// Ensure NEON MVTs only belong to a single register class.
if (VT.getFixedSizeInBits() <= 128)

cc @paulwalker-arm @sdesmalen-arm @stevesuzuki-arm (in case it's relevant to https://github.com/halide/Halide/pull/6781)

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-aarch64

paulwalker-arm commented 2 years ago

I'll have to investigate to confirm the details, but in general SVE VLS code generation for 128-bit and smaller vectors is not well tested and is typically enabled only for a handful of (but not all) cases where SVE has a benefit over NEON. The lowering is controlled via useSVEForFixedLengthVectorVT, which takes a flag saying whether it should also return true for NEON-sized (i.e. 128- or 64-bit) vectors.
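As a rough illustration of that flag, here is a simplified model of the decision, reduced to bit widths. It is a sketch, not the real function: the name useSVEForFixedLengthVectorBits and its parameters are hypothetical, the real code inspects an EVT and performs further checks (element types, for example), and the final "fits in the minimum SVE size" check is an assumption. Only the `<= 128` check and the NEON-override flag come from this thread.

```cpp
// Simplified, hypothetical model of the lowering decision, using bit widths
// in place of LLVM's EVT.
constexpr bool useSVEForFixedLengthVectorBits(unsigned VTBits,
                                              unsigned MinSVEBits,
                                              bool SubtargetUsesSVE,
                                              bool OverrideNEON) {
  // Nothing is lowered to SVE unless the subtarget predicate allows it.
  if (!SubtargetUsesSVE)
    return false;

  // The flag opts NEON-sized (128- or 64-bit) vectors into SVE lowering.
  if (OverrideNEON && (VTBits == 128 || VTBits == 64))
    return true;

  // Ensure NEON MVTs only belong to a single register class.
  if (VTBits <= 128)
    return false;

  // Assumption: wider vectors must fit in the guaranteed SVE register size.
  return VTBits <= MinSVEBits;
}
```

In this model a 128-bit vector is accepted only when the flag is set, and a 32-bit vector such as <2 x half> is rejected either way. The model covers only this one predicate and says nothing about type legalisation.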

With that said, here we're talking about <2 x half> vectors, which are not currently considered type legal for NEON or SVE, so we're running into legalisation issues before we get to the phase that would lower to SVE. Looking at the backtrace, I'm wondering if we're running out of stack space due to infinite recursion?
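To make the suspected failure mode concrete, the sketch below shows how two mutually recursive walkers produce the alternating frame pattern seen in the backtrace when a node is reachable from its own operands. This is not LLVM's code and does not claim to identify the actual cause: Node, analyzeNode, analyzeValue, and walkTerminates are hypothetical, and the depth limit stands in for the point at which a real process would run out of stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type: each node lists the indices of its operands.
struct Node {
  std::vector<std::size_t> Operands;
};

// Forward declaration: the two walkers call each other.
bool analyzeNode(const std::vector<Node> &Graph, std::size_t N,
                 std::size_t Depth, std::size_t MaxDepth);

// Value-level step: hands the operand's node back to the node-level walker.
bool analyzeValue(const std::vector<Node> &Graph, std::size_t N,
                  std::size_t Depth, std::size_t MaxDepth) {
  return analyzeNode(Graph, N, Depth + 1, MaxDepth);
}

// Node-level step: visits every operand. Returns false once MaxDepth is
// exceeded, where an unguarded walk would keep pushing stack frames.
bool analyzeNode(const std::vector<Node> &Graph, std::size_t N,
                 std::size_t Depth, std::size_t MaxDepth) {
  assert(N < Graph.size() && "operand index out of range");
  if (Depth > MaxDepth)
    return false;
  for (std::size_t Op : Graph[N].Operands)
    if (!analyzeValue(Graph, Op, Depth, MaxDepth))
      return false;
  return true;
}

// True when the walk starting at Root finishes within MaxDepth levels.
bool walkTerminates(const std::vector<Node> &Graph, std::size_t Root,
                    std::size_t MaxDepth) {
  return analyzeNode(Graph, Root, 0, MaxDepth);
}
```

An acyclic chain of operands finishes well within the limit, whereas two nodes that reference each other hit the limit regardless of how large it is. Without the guard, the same walk would recurse until the stack is exhausted, which would surface as a segmentation fault like the one reported.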