[LoopVectorize][VPlan] Assertion `MinBWs.size() == NumProcessedRecipes && "some entries in MinBWs haven't been processed"' failed.

patrick-rivos commented 5 months ago

Reduced LLVM IR:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

define i32 @main() #0 {
entry:
  %conv21.us.i = sext i16 0 to i32
  br label %for.cond3.preheader.us.i

for.cond3.preheader.us.i:                         ; preds = %for.cond3.preheader.us.i, %entry
  %indvars.iv.i = phi i64 [ %indvars.iv.next.i, %for.cond3.preheader.us.i ], [ 0, %entry ]
  %add67.lcssa7984.us.i = phi i8 [ %2, %for.cond3.preheader.us.i ], [ 0, %entry ]
  %.conv21.us99.i = tail call i32 @llvm.smax.i32(i32 0, i32 %conv21.us.i)
  %cmp35.us100.i = icmp eq i32 %.conv21.us99.i, 0
  %conv36.us101.i = zext i1 %cmp35.us100.i to i32
  %0 = lshr i32 %conv36.us101.i, 1
  %1 = trunc i32 %0 to i8
  %2 = or i8 %add67.lcssa7984.us.i, %1
  %indvars.iv.next.i = add i64 %indvars.iv.i, 1
  %cmp.us.i = icmp slt i64 %indvars.iv.i, 1
  br i1 %cmp.us.i, label %for.cond3.preheader.us.i, label %for.cond.for.cond.cleanup_crit_edge.split.us.i

for.cond.for.cond.cleanup_crit_edge.split.us.i:   ; preds = %for.cond3.preheader.us.i
  store i8 %2, ptr null, align 1
  ret i32 0
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.smax.i32(i32, i32) #1

attributes #0 = { "target-features"="+64bit,+v" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Backtrace:

opt: /scratch/tc-testing/tc-apr-2/llvm/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1061: static void llvm::VPlanTransforms::truncateToMinimalBitwidths(llvm::VPlan&, const llvm::MapVector<llvm::Instruction*, long unsigned int>&, llvm
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt --passes=loop-vectorize reduced.ll
 #0 0x0000596ae80c5b60 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x2d84b60)
 #1 0x0000596ae80c2f6f llvm::sys::RunSignalHandlers() (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x2d81f6f)
 #2 0x0000596ae80c30c5 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007a0a45642520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007a0a456969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007a0a456969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007a0a456969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007a0a45642476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007a0a456287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007a0a4562871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007a0a45639e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x0000596ae7260af2 llvm::VPlanTransforms::truncateToMinimalBitwidths(llvm::VPlan&, llvm::MapVector<llvm::Instruction*, unsigned long, llvm::DenseMap<llvm::Instruction*, unsigned int, llvm::DenseMapInfo<llvm::Instruction*, void>, llvm::detail::DenseMapPair<llvm::Instruction*, unsigned int>>, llvm::SmallVector<std::pair<llvm::Instruction*, unsigned long>, 0u>> const&, llvm::LLVMContext&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1f1faf2)
#12 0x0000596ae7135f3e llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1df4f3e)
#13 0x0000596ae713c319 llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1dfb319)
#14 0x0000596ae713f8a9 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1dfe8a9)
#15 0x0000596ae714273e llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1e0173e)
#16 0x0000596ae714388d llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x1e0288d)
#17 0x0000596ae604a276 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0xd09276)
#18 0x0000596ae7eec141 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x2bab141)
#19 0x0000596ae6041bd6 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0xd00bd6)
#20 0x0000596ae7eeae3b llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x2ba9e3b)
#21 0x0000596ae6049dc6 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0xd08dc6)
#22 0x0000596ae7ee8cb1 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x2ba7cb1)
#23 0x0000596ae58c3ce5 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x582ce5)
#24 0x0000596ae58b6316 optMain (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x575316)
#25 0x00007a0a45629d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#26 0x00007a0a45629e40 call_init ./csu/../csu/libc-start.c:128:20
#27 0x00007a0a45629e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#28 0x0000596ae58abf95 _start (/scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt+0x56af95)
zsh: IOT instruction (core dumped)  /scratch/tc-testing/tc-apr-2/build-rv64gcv/build-llvm-linux/bin/opt

Godbolt: https://godbolt.org/z/6TxWGMz6e

Assert: https://github.com/llvm/llvm-project/blob/c403a478076a16172d9b50e16c288b0d360f42ce/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp#L1061-L1063

Found via fuzzer.

patrick-rivos commented 4 months ago

Partially cleaned up testcase:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

define i32 @main() #0 {
entry:
  %sext.0 = sext i16 0 to i32
  br label %loop.preheader

loop.preheader:                         ; preds = %loop.preheader, %entry
  %iv.i = phi i64 [ %iv.next.i, %loop.preheader ], [ 0, %entry ]
  %phi.0 = phi i8 [ %or.0, %loop.preheader ], [ 0, %entry ]
  %max.0 = tail call i32 @llvm.smax.i32(i32 0, i32 %sext.0)
  %cmp = icmp eq i32 %max.0, 0
  %zext.true = zext i1 %cmp to i32
  %0 = lshr i32 %zext.true, 1
  %1 = trunc i32 %0 to i8
  %or.0 = or i8 %phi.0, %1
  %iv.next.i = add i64 %iv.i, 1
  %cmp.us.i = icmp slt i64 %iv.i, 1
  br i1 %cmp.us.i, label %loop.preheader, label %loop.exit

loop.exit:   ; preds = %loop.preheader
  store i8 %or.0, ptr null, align 1
  ret i32 0
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.smax.i32(i32, i32) #1

attributes #0 = { "target-features"="+64bit,+v" }

Basic analysis:

The assertion is triggered by this code: https://github.com/llvm/llvm-project/blob/39bfdb7f33f7f53ab662c5cea25129c45a9b4c11/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp#L1074-L1076. It fails when %sext.0 = sext i16 0 to i32 is the only MinBWs entry left unaccounted for.

Op->getLiveInIRValue() returns the Constant i32 0, so the dynamic cast to Instruction yields a null pointer, which has no entry in MinBWs. The entry for %sext.0 is therefore never counted as processed.

I think the Constant is a folded form of sext i16 0 to i32, so next I'll try to find where the constant comes from.
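
The lookup failure described above can be modelled without LLVM. The sketch below is a toy stand-in, not the code in VPlanTransforms.cpp: ToyValue, ToyInstruction, ToyConstant, asInstruction and countProcessed are hypothetical names, asInstruction plays the role of the dynamic cast to Instruction, and a std::map stands in for MinBWs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-ins for llvm::Value, llvm::Instruction and
// llvm::Constant. They exist only for this illustration.
struct ToyValue {
  virtual ~ToyValue() = default;
};
struct ToyInstruction : ToyValue {};
struct ToyConstant : ToyValue {};

// Plays the role of the dynamic cast to Instruction: yields nullptr when
// the value is not an instruction (for example, a folded constant).
inline const ToyInstruction *asInstruction(const ToyValue *V) {
  return dynamic_cast<const ToyInstruction *>(V);
}

// Returns how many MinBWs entries can be reached through the given live-in.
// A constant live-in casts to nullptr, which is never a key in the map, so
// the entry recorded for the original instruction stays unprocessed.
inline std::size_t countProcessed(
    const std::map<const ToyInstruction *, unsigned> &MinBWs,
    const ToyValue *LiveIn) {
  assert(LiveIn && "live-in must not be null");
  const ToyInstruction *I = asInstruction(LiveIn);
  return MinBWs.count(I); // 0 when I is nullptr or has no entry
}
```

With a single entry recorded for the instruction, passing the instruction itself returns 1, while passing the constant that replaced it returns 0. That mismatch between the map size and the processed count is what the assertion reports.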

patrick-rivos commented 4 months ago

Alright, I think I understand everything that's going on here.

Using this testcase:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"

define i32 @main() #1 {
entry:
  %zext.0 = zext i8 0 to i64
  br label %loop

loop:                             ; preds = %loop, %entry
  %phi.0 = phi i64 [ %incrementor, %loop ], [ 0, %entry ]
  %incrementor = add i64 %phi.0, 1
  %max.0 = tail call i64 @llvm.umax.i64(i64 %zext.0, i64 0)
  %cmp.0 = icmp ne i64 %max.0, 0
  %zext.1 = zext i1 %cmp.0 to i64
  %trunc.0 = trunc i64 %zext.1 to i32
  %shl.0 = shl i32 %trunc.0, 8 ; Truncate and shift to make all bits dead
  %trunc.1 = trunc i32 %shl.0 to i8
  %exitcond6 = icmp ne i64 %phi.0, 16
  br i1 %exitcond6, label %loop, label %loop.exit

loop.exit:                           ; preds = %loop
  store i8 %trunc.1, ptr null, align 1
  ret i32 0
}

attributes #1 = { "target-features"="+v" }
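
The comment on %shl.0 in the testcase above can be checked with plain integer arithmetic. The sketch below mirrors the zext/trunc/shl/trunc chain in C++; the function names are made up for illustration. Shifting left by 8 moves every bit of the i32 value out of the low byte, so the stored i8 is 0 whatever the compare produced.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors %zext.1 -> %trunc.0 -> %shl.0 -> %trunc.1 from the testcase.
// The name tailOfChain is hypothetical.
inline std::uint8_t tailOfChain(bool Cmp) {
  std::uint64_t Zext1 = Cmp ? 1u : 0u;                      // zext i1 to i64
  std::uint32_t Trunc0 = static_cast<std::uint32_t>(Zext1); // trunc i64 to i32
  std::uint32_t Shl0 = Trunc0 << 8;                         // shl i32, 8
  return static_cast<std::uint8_t>(Shl0);                   // trunc i32 to i8
}

// Checks both possible compare results; aborts if either one reaches the
// low byte that ends up being stored.
inline void checkTailIsAlwaysZero() {
  assert(tailOfChain(false) == 0);
  assert(tailOfChain(true) == 0);
}
```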

computeMinimumValueSizes operates on a region, but it still treats a chain as valid when the chain terminates in a single instruction outside the bounds of that region. Combined with the way WIDEN-CALL is handled, this causes problems in truncateToMinimalBitwidths, where the following range is considered:

<x1> vector loop: {
  vector.body:
    EMIT vp<%2> = CANONICAL-INDUCTION ir<0>, vp<%3>
    WIDEN-INDUCTION %phi.0 = phi %incrementor, 0, ir<1>
    CLONE ir<%incrementor> = add ir<%phi.0>, ir<1>
    WIDEN-CALL ir<%max.0> = call @llvm.umax.i64(ir<%zext.0>, ir<0>) (using vector intrinsic)
    WIDEN ir<%cmp.0> = icmp ne ir<%max.0>, ir<0>
    WIDEN-CAST ir<%zext.1> = zext  ir<%cmp.0> to i64
    WIDEN-CAST ir<%trunc.0> = trunc  ir<%zext.1> to i32
    WIDEN ir<%shl.0> = shl ir<%trunc.0>, ir<8>
    WIDEN-CAST ir<%trunc.1> = trunc  ir<%shl.0> to i8
    CLONE ir<%exitcond6> = icmp ne ir<%phi.0>, ir<16>
  Successor(s):

  :
  Successor(s): vector.latch

  vector.latch:
    EMIT vp<%3> = add nuw vp<%2>, vp<%0>
    EMIT branch-on-count vp<%3>, vp<%1>
  No successors
}
Successor(s): middle.block

The MinBWs are:

%cmp.0 = icmp ne i64 %max.0, 0 Size: 1
%zext.0 = zext i8 0 to i64 Size: 1

%cmp.0 is found easily, but %zext.0 isn't in the region. Normally, chains that reach outside the region are handled by this, which pulls in arguments.

We don't consider WIDEN-CALL since it doesn't have an entry in MinBWs (and isn't allowed). It has no entry because the function call uses a ptr operand, which cannot be truncated, so truncating that operand fails.

DemandedBits understands a subset of intrinsics, while VPlanAnalysis doesn't handle any of them (for the pointer reason).

Fix

This can be fixed by ignoring the call's function pointer argument for all intrinsic cases that DemandedBits handles.

VPWidenCallRecipe also needs a way to specify a new return type when executed, so I added getResultType and setResultType methods.

I'll submit a PR soon.
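
The bookkeeping described in this comment can be sketched as a toy model. This is a loose approximation under stated assumptions, not the algorithm in truncateToMinimalBitwidths: ToyRecipe, ToyMinBWs, countProcessed, exampleRegion and checkAllProcessed are hypothetical names, recipes are reduced to the names of the IR values they define and use, and constant operands are left out. It also assumes that, once the function pointer argument is ignored, the call gets its own MinBWs entry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// A toy recipe: the IR value it defines plus the IR values it uses.
struct ToyRecipe {
  std::string Def;
  std::vector<std::string> Operands;
};

using ToyMinBWs = std::map<std::string, unsigned>;

// Walks the region and counts the MinBWs entries it can account for.
// A recipe is visited only when its own value has an entry; its operands
// defined outside the region (live-ins) are then pulled in as well.
inline std::size_t countProcessed(const std::vector<ToyRecipe> &Region,
                                  const ToyMinBWs &MinBWs) {
  std::set<std::string> InRegion;
  for (const ToyRecipe &R : Region)
    InRegion.insert(R.Def);

  std::set<std::string> Processed;
  for (const ToyRecipe &R : Region) {
    if (!MinBWs.count(R.Def))
      continue; // no entry (the call recipe here): operands are not pulled in
    Processed.insert(R.Def);
    for (const std::string &Op : R.Operands)
      if (!InRegion.count(Op) && MinBWs.count(Op))
        Processed.insert(Op); // live-in that has an entry
  }
  return Processed.size();
}

// Part of the loop body from the VPlan dump above.
inline std::vector<ToyRecipe> exampleRegion() {
  return {{"%max.0", {"%zext.0"}},
          {"%cmp.0", {"%max.0"}},
          {"%zext.1", {"%cmp.0"}}};
}

// Mirrors the shape of the failing assertion from the issue title.
inline void checkAllProcessed(const std::vector<ToyRecipe> &Region,
                              const ToyMinBWs &MinBWs) {
  assert(MinBWs.size() == countProcessed(Region, MinBWs) &&
         "some entries in MinBWs haven't been processed");
}
```

With the two entries listed above (%cmp.0 and %zext.0), countProcessed returns 1, because %zext.0 is only reachable through the call, which has no entry. Adding an entry for %max.0 (size 1 is an assumed value) makes the walk visit the call, pull in %zext.0, and return 3, which matches the map size.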

llvm / llvm-project

[LoopVectorize][VPlan] Assertion `MinBWs.size() == NumProcessedRecipes && "some entries in MinBWs haven't been processed"' failed. #87407