nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

matmul-transpose-b bf16 model failed with llvm G_SHUFFLE_VECTOR error #387

Closed by yzhang93 1 month ago

yzhang93 commented 1 month ago

The input IR:

#executable_target_amdaie_xclbin_fb = #hal.executable.target<"amd-aie", "amdaie-xclbin-fb", {target_arch = "chip-tbd", ukernels = "none"}>
#device_target_amd_aie = #hal.device.target<"amd-aie", [#executable_target_amdaie_xclbin_fb]>
module attributes {hal.device.targets = [#device_target_amd_aie]} {
  func.func @matmul_128x512_256xbf16_(%arg0: tensor<128x512xbf16>, %arg1: tensor<256x512xbf16>) -> tensor<128x256xf32> {
    %0 = tensor.empty() : tensor<128x256xf32>
    %cst = arith.constant 0.000000e+00 : f32
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<128x256xf32>) -> tensor<128x256xf32>
    %2 = linalg.matmul_transpose_b ins(%arg0, %arg1 : tensor<128x512xbf16>, tensor<256x512xbf16>) outs(%1 : tensor<128x256xf32>) -> tensor<128x256xf32>
    return %2 : tensor<128x256xf32>
  }
}
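For readers unfamiliar with the op, here is a minimal reference sketch of what linalg.matmul_transpose_b computes: out[m][n] = acc[m][n] + sum over k of a[m][k] * b[n][k], i.e. the second operand is stored as N x K rather than K x N. The function name, the tiny shapes, and the use of plain Python floats (instead of bf16 inputs with f32 accumulation) are assumptions made for illustration only; this is not how the AIE backend implements it.

```python
def matmul_transpose_b(a, b, acc):
    """Reference semantics: out[m][n] = acc[m][n] + sum_k a[m][k] * b[n][k].

    a is M x K, b is N x K (already "transposed"), acc is M x N.
    """
    m_dim, k_dim, n_dim = len(a), len(a[0]), len(b)
    # Both operands must share the reduction dimension K.
    assert all(len(row) == k_dim for row in b)
    out = [row[:] for row in acc]  # copy so the accumulator is not mutated
    for m in range(m_dim):
        for n in range(n_dim):
            for k in range(k_dim):
                out[m][n] += a[m][k] * b[n][k]
    return out


# Hypothetical small shapes standing in for 128x512 and 256x512.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]            # M=2, K=3
b = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 1.0],
     [2.0, 0.0, 0.0]]            # N=4, K=3
zero = [[0.0] * 4 for _ in range(2)]  # plays the role of the linalg.fill result
result = matmul_transpose_b(a, b, zero)
```

Because both operands are read along K, a vectorized lowering typically has to rearrange one of them in registers, which is one plausible source of the shuffle seen in the error.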

The error:

LLVM ERROR: unable to legalize instruction: %353:_(<32 x s16>) = G_SHUFFLE_VECTOR %352:_(<32 x s16>), %223:_, shufflemask(0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27, 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31) (in function: core_0_2)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc /proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/test_results/module_matmul_128x512_256xbf16__dispatch_0_amdaie_xclbin_fb/input.opt.ll -O2 --march=aie2 --function-sections --filetype=obj -o /proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/test_results/module_matmul_128x512_256xbf16__dispatch_0_amdaie_xclbin_fb/input.o
1.  Running pass 'Function Pass Manager' on module '/proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/test_results/module_matmul_128x512_256xbf16__dispatch_0_amdaie_xclbin_fb/input.opt.ll'.
2.  Running pass 'Legalizer' on function '@core_0_2'
 #0 0x000055cf93a50ebf llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x000055cf93a4efc4 llvm::sys::RunSignalHandlers() /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Signals.cpp:104:20
 #2 0x000055cf93a4f146 SignalHandler(int) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Unix/Signals.inc:412:1
 #3 0x00007fa190442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fa1904969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fa1904969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007fa1904969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007fa190442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007fa1904287f3 abort ./stdlib/abort.c:81:7
 #9 0x000055cf939c58d3 (/proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc+0x2cd98d3)
#10 0x000055cf93ea7532 reportGISelDiagnostic(llvm::DiagnosticSeverity, llvm::MachineFunction&, llvm::TargetPassConfig const&, llvm::MachineOptimizationRemarkEmitter&, llvm::MachineOptimizationRemarkMissed&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Utils.cpp:257:23
#11 0x000055cf93ea8f5b llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/IR/DiagnosticInfo.h:413:7
#12 0x000055cf93ea8f5b llvm::DiagnosticInfoMIROptimization::~DiagnosticInfoMIROptimization() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h:30:7
#13 0x000055cf93ea8f5b llvm::MachineOptimizationRemarkMissed::~MachineOptimizationRemarkMissed() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h:84:7
#14 0x000055cf93ea8f5b llvm::reportGISelFailure(llvm::MachineFunction&, llvm::TargetPassConfig const&, llvm::MachineOptimizationRemarkEmitter&, char const*, llvm::StringRef, llvm::MachineInstr const&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Utils.cpp:286:1
#15 0x000055cf93e3fb82 llvm::Legalizer::runOnMachineFunction(llvm::MachineFunction&) (.part.0) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Legalizer.cpp:348:12
#16 0x000055cf92b7bb3b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:33
#17 0x000055cf930b0aec llvm::FPPassManager::runOnFunction(llvm::Function&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1440:7
#18 0x000055cf930b0d19 llvm::ilist_node_base<true>::getNext() const /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_node_base.h:43:45
#19 0x000055cf930b0d19 llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Function, true, false, void>>::getNext() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_node.h:67:66
#20 0x000055cf930b0d19 llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Function, true, false, void>, false, false>::operator++() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_iterator.h:157:25
#21 0x000055cf930b0d19 llvm::FPPassManager::runOnModule(llvm::Module&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1475:22
#22 0x000055cf930b159e runOnModule /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1552:7
#23 0x000055cf930b159e llvm::legacy::PassManagerImpl::run(llvm::Module&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:535:55
#24 0x000055cf921c801e compileModule(char**, llvm::LLVMContext&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/tools/llc/llc.cpp:736:66
#25 0x000055cf921c8f86 main /proj/rdi/staff/vivizhan/llvm-aie/llvm/tools/llc/llc.cpp:420:35
#26 0x00007fa190429d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#27 0x00007fa190429e40 call_init ./csu/../csu/libc-start.c:128:20
#28 0x00007fa190429e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#29 0x000055cf921bc2e5 _start (/proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc+0x14d02e5)
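
The shufflemask in the failing G_SHUFFLE_VECTOR has a regular structure: reading the 32 lanes as a 4 x 8 row-major tile of 16-bit elements, the mask produces the 8 x 4 transpose of that tile. All indices are below 32, so only the first source vector is actually selected from. The sketch below checks this; the helper names are hypothetical, and the shuffle function only models the generic "select from the concatenation of two vectors" semantics, not the AIE legalizer.

```python
# Mask copied from the error message.
MASK = [0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
        4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31]


def transpose_mask(rows, cols):
    """Shuffle mask that transposes a row-major rows x cols tile."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def shuffle(v1, v2, mask):
    """Generic two-input shuffle: index into the concatenation v1 + v2."""
    both = v1 + v2
    return [both[i] for i in mask]


lanes = list(range(32))            # stand-in for the <32 x s16> source
unused = [-1] * 32                 # second source; never selected by MASK
shuffled = shuffle(lanes, unused, MASK)

is_4x8_transpose = (MASK == transpose_mask(4, 8))
uses_second_source = any(i >= 32 for i in MASK)
```

Here is_4x8_transpose evaluates to True and uses_second_source to False, which is consistent with the shuffle coming from a vectorized transpose of the B operand.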

Attaching some generated files for debugging: AIE_dump.txt, input.ll.zip

Note: matmul_transpose_b i32 tests without vectorization run on hardware without problems.

yzhang93 commented 1 month ago

I've seen a similar issue reported in https://github.com/nod-ai/iree-amd-aie/issues/372. @jsetoain I saw you have some relevant PRs about aievec shuffle in MLIR-AIE. Could you take a look and check whether the generated IR is reasonable?

jsetoain commented 1 month ago

That IR is the kind I've been targeting with my latest PRs. The only missing bit is PR1527, which is ready to land (waiting for a last review). If you want to go ahead and give it a go, it should close the last gap.

yzhang93 commented 1 month ago

Cool, thanks! I'll run the example again once it's landed.

jsetoain commented 1 month ago

@yzhang93 It just landed, so please go ahead and let us know how it goes 🙂

yzhang93 commented 1 month ago

Thanks @jsetoain! The error is gone with your latest PR.