llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[AArch64] Crash in aarch64 backend when compiling vsri/vcvtfxs2fp intrinsics in certain pattern #55417

Closed (Benjins closed this 2 years ago)

Benjins commented 2 years ago

Using ARM Neon intrinsics in a certain pattern causes the AArch64 backend to crash on what looks like an invalid DAG. In a Debug build, an assertion fires earlier in the process.

Reduced version of the original C++ that triggers the issue when compiled with clang for armv8 at -O1 (Godbolt link: https://godbolt.org/z/MEvPj1EWf ):

#include <arm_neon.h>
float64_t do_stuff(const double* dVals) {
    float64x1_t var0 = vld1_f64((const float64_t*)&dVals[0]);
    float64x1_t var1 = vrndi_f64(var0);
    float64_t var2 = vget_lane_f64(var1, 0);
    int64_t var3 = vcvtd_s64_f64(var2);
    int64_t var4 = vsrid_n_s64(var3, var3, 1);
    float64_t var5 = vcvtd_n_f64_s64(var4, 1);
    return var5;
}

which in a Release build gives:

fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.neon.vcvtfxs2fp


This can be reduced to the following IR (Godbolt link: https://godbolt.org/z/dGG6474P4 ):

; Function Attrs: argmemonly mustprogress nofree nosync nounwind readonly willreturn uwtable
define dso_local noundef double @do_stuff(ptr nocapture noundef readnone %iVals, ptr nocapture noundef readnone %fVals, ptr nocapture noundef readonly %dVals) local_unnamed_addr #0 {
entry:
  %arrayidx = getelementptr inbounds double, ptr %dVals, i64 16
  %0 = load <1 x double>, ptr %arrayidx, align 8
  %vrndi_v1.i = call <1 x double> @llvm.nearbyint.v1f64(<1 x double> %0) #3
  %vget_lane = extractelement <1 x double> %vrndi_v1.i, i64 0
  %vcvtd_s64_f64.i = call i64 @llvm.aarch64.neon.fcvtzs.i64.f64(double %vget_lane) #3
  %1 = insertelement <1 x i64> poison, i64 %vcvtd_s64_f64.i, i64 0
  %vsrid_n_s647 = call <1 x i64> @llvm.aarch64.neon.vsri.v1i64(<1 x i64> %1, <1 x i64> %1, i32 1)
  %2 = extractelement <1 x i64> %vsrid_n_s647, i64 0
  %vcvtd_n_f64_s64 = call double @llvm.aarch64.neon.vcvtfxs2fp.f64.i64(i64 %2, i32 1)
  ret double %vcvtd_n_f64_s64
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone willreturn
declare <1 x i64> @llvm.aarch64.neon.vsri.v1i64(<1 x i64>, <1 x i64>, i32) #1

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone willreturn
declare double @llvm.aarch64.neon.vcvtfxs2fp.f64.i64(i64, i32) #1

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
declare <1 x double> @llvm.nearbyint.v1f64(<1 x double>) #2

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone willreturn
declare i64 @llvm.aarch64.neon.fcvtzs.i64.f64(double) #1

The error about the vcvtfxs2fp intrinsic seems to be a downstream issue, since running llc in Debug mode gives the following assertion:

Assertion failed: Vec.getValueSizeInBits() == 128 && "unexpected vector size on extract_vector_elt!", file llvm-project\llvm\lib\Target\AArch64\AArch64ISelLowering.cpp, line 15025

at

>   llc.exe!tryCombineFixedPointConvert(llvm::SDNode * N, llvm::TargetLowering::DAGCombinerInfo & DCI, llvm::SelectionDAG & DAG) Line 15024 C++
    llc.exe!performIntrinsicCombine(llvm::SDNode * N, llvm::TargetLowering::DAGCombinerInfo & DCI, const llvm::AArch64Subtarget * Subtarget) Line 15999 C++
    llc.exe!llvm::AArch64TargetLowering::PerformDAGCombine(llvm::SDNode * N, llvm::TargetLowering::DAGCombinerInfo & DCI) Line 18773    C++
    llc.exe!`anonymous namespace'::DAGCombiner::combine(llvm::SDNode * N) Line 1787 C++
    llc.exe!`anonymous namespace'::DAGCombiner::Run(llvm::CombineLevel AtLevel) Line 1574   C++
    llc.exe!llvm::SelectionDAG::Combine(llvm::CombineLevel Level, llvm::AAResults * AA, llvm::CodeGenOpt::Level OptLevel) Line 24699    C++
    llc.exe!llvm::SelectionDAGISel::CodeGenAndEmitDAG() Line 917    C++

The DAG at that point:

SelectionDAG has 19 nodes:
  t8: v1i64 = BUILD_VECTOR Constant:i64<0>
    t0: ch = EntryToken
                  t19: f64 = fnearbyint ConstantFP:f64<0.000000e+00>
                t20: v1f64 = BUILD_VECTOR t19
              t5: f64 = extract_vector_elt t20, Constant:i64<0>
            t7: i64 = llvm.aarch64.neon.fcvtzs TargetConstant:i64<474>, t5
          t9: v1i64 = insert_vector_elt t8, t7, Constant:i64<0>
        t12: v1i64 = llvm.aarch64.neon.vsri TargetConstant:i64<630>, t8, t9, Constant:i32<1>
      t13: i64 = extract_vector_elt t12, Constant:i64<0>
    t15: f64 = llvm.aarch64.neon.vcvtfxs2fp TargetConstant:i64<626>, t13, Constant:i32<1>
  t17: ch,glue = CopyToReg t0, Register:f64 $d0, t15
  t18: ch = AArch64ISD::RET_FLAG t17, Register:f64 $d0, t17:1

which does initially appear to be invalid: the vector operand of extract_vector_elt at t13 (t12, a v1i64) is only 64 bits wide, not the 128 bits the assertion expects.
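To make the width mismatch concrete, here is a minimal standalone sketch of the kind of check involved. It does not use LLVM's API, and `VecType` and `isFullWidthNeonVector` are hypothetical names invented for illustration. It is not the code in tryCombineFixedPointConvert, nor a description of any fix. It only shows that a v1i64 operand fails a "must be 128 bits" precondition, and that a combine could decline such inputs instead of asserting.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector value type such as v1i64 or v2i64.
struct VecType {
    unsigned numElts;  // number of lanes
    unsigned eltBits;  // width of each lane in bits
    unsigned sizeInBits() const { return numElts * eltBits; }
};

// Hypothetical precondition: only full 128-bit vectors qualify.
// Returning false lets a caller skip the transformation for
// 64-bit vectors (for example v1i64) rather than hit an assertion.
inline bool isFullWidthNeonVector(VecType vec) {
    return vec.sizeInBits() == 128;
}
```

Under these assumptions, `isFullWidthNeonVector({1, 64})` is false for the v1i64 case from the DAG above, while `isFullWidthNeonVector({2, 64})` is true for a v2i64.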

I have verified that this still reproduces on the latest trunk (b1aed14bfea07508e4b9d864168c1ae6b5b5c665).

For context: this code was produced by a fuzzer to test codegen; it was not written by hand.
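For readers unfamiliar with the intrinsics in the reproducer, the following is a portable scalar sketch of what the sequence computes. It is based on my reading of the Arm descriptions of FCVTZS, SRI, and fixed-point SCVTF. The `model_*` helper names are hypothetical, and the code is an approximation for illustration, not a reference implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumed model of vcvtd_s64_f64 (FCVTZS): round toward zero,
// saturate on overflow, and map NaN to 0.
inline int64_t model_fcvtzs(double x) {
    if (std::isnan(x)) return 0;
    if (x >= 9223372036854775808.0) return std::numeric_limits<int64_t>::max();
    if (x <= -9223372036854775808.0) return std::numeric_limits<int64_t>::min();
    return static_cast<int64_t>(x);  // C++ conversion truncates toward zero
}

// Assumed model of vsrid_n_s64 (SRI): shift b right by n (logical) and
// insert it into a, keeping the top n bits of a unchanged.
inline int64_t model_sri(int64_t a, int64_t b, int n) {
    assert(n >= 1 && n <= 64);
    const uint64_t ua = static_cast<uint64_t>(a);
    const uint64_t ub = static_cast<uint64_t>(b);
    // Shifting a 64-bit value by 64 is undefined in C++, so handle it apart.
    const uint64_t mask = (n == 64) ? 0 : (~uint64_t{0} >> n);
    const uint64_t shifted = (n == 64) ? 0 : (ub >> n);
    return static_cast<int64_t>((ua & ~mask) | shifted);
}

// Assumed model of vcvtd_n_f64_s64 (fixed-point SCVTF): interpret a as a
// fixed-point number with n fractional bits.
inline double model_scvtf_fixed(int64_t a, int n) {
    assert(n >= 1 && n <= 64);
    return std::ldexp(static_cast<double>(a), -n);
}

// Scalar counterpart of do_stuff for a single input value.
inline double model_do_stuff(double d) {
    const double rounded = std::nearbyint(d);  // vrndi_f64: current rounding mode
    const int64_t i = model_fcvtzs(rounded);
    return model_scvtf_fixed(model_sri(i, i, 1), 1);
}
```

With this model, an input of 3.0 becomes the integer 3, the shift-and-insert produces 1, and the fixed-point conversion with one fractional bit yields 0.5.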

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-aarch64

davemgreen commented 2 years ago

Thanks for the report. It seemed to be an incorrect assumption in tryCombineFixedPointConvert, so I fixed it above. Let us know if/when you find any more.

Benjins commented 2 years ago

Yup, seems to be fixed now. Thanks!