nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0

matmul-elementwise bf16 model failed compilation #372

Open yzhang93 opened 1 month ago

yzhang93 commented 1 month ago

Input IR

!lhs = tensor<1024x512xbf16>
!rhs = tensor<512x1024xbf16>
!ele = tensor<1024x1024xf32>
!res = tensor<1024x1024xbf16>

func.func @matmul_elementwise_bf16(%lhs : !lhs, %rhs : !rhs, %ele : !ele) -> !res {
  %cst = arith.constant 0.0 : f32
  %0 = tensor.empty() : !ele
  %1 = tensor.empty() : !res
  %fill = linalg.fill ins(%cst : f32) outs(%0 : !ele) -> !ele
  %2 = linalg.matmul ins(%lhs, %rhs : !lhs, !rhs) outs(%fill : !ele) -> !ele
  %res = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2, %ele : !ele, !ele) outs(%1 : !res) {
  ^bb0(%in: f32, %in_0: f32, %out: bf16):
    %11 = arith.addf %in, %in_0 : f32
    %12 = arith.truncf %11 : f32 to bf16
    linalg.yield %12 : bf16
  } -> !res
  return %res : !res
}

Error:

LLVM ERROR: unable to legalize instruction: %1730:_(<1024 x s16>) = G_SHUFFLE_VECTOR %1729:_(<1024 x s16>), %1475:_, shufflemaskin function: core_0_2)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc /proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/cpu_comparison/test_result_bf16/module_matmul_elementwise_bf16_dispatch_0_amdaie_xclbin_fb/input.opt.ll -O2 --march=aie2 --function-sections --filetype=obj -o /proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/cpu_comparison/test_result_bf16/module_matmul_elementwise_bf16_dispatch_0_amdaie_xclbin_fb/input.o
1.  Running pass 'Function Pass Manager' on module '/proj/xsjhdstaff4/vivizhan/iree-amd-aie/build_tools/ci/cpu_comparison/test_result_bf16/module_matmul_elementwise_bf16_dispatch_0_amdaie_xclbin_fb/input.opt.ll'.
2.  Running pass 'Legalizer' on function '@core_0_2'
 #0 0x000055ae9b6ceebf llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x000055ae9b6ccfc4 llvm::sys::RunSignalHandlers() /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Signals.cpp:104:20
 #2 0x000055ae9b6cd146 SignalHandler(int) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/Support/Unix/Signals.inc:412:1
 #3 0x00007fa6da842520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fa6da8969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fa6da8969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007fa6da8969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007fa6da842476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007fa6da8287f3 abort ./stdlib/abort.c:81:7
 #9 0x000055ae9b6438d3 (/proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc+0x2cd98d3)
#10 0x000055ae9bb25532 reportGISelDiagnostic(llvm::DiagnosticSeverity, llvm::MachineFunction&, llvm::TargetPassConfig const&, llvm::MachineOptimizationRemarkEmitter&, llvm::MachineOptimizationRemarkMissed&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Utils.cpp:257:23
#11 0x000055ae9bb26f5b llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/IR/DiagnosticInfo.h:413:7
#12 0x000055ae9bb26f5b llvm::DiagnosticInfoMIROptimization::~DiagnosticInfoMIROptimization() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h:30:7
#13 0x000055ae9bb26f5b llvm::MachineOptimizationRemarkMissed::~MachineOptimizationRemarkMissed() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/CodeGen/MachineOptimizationRemarkEmitter.h:84:7
#14 0x000055ae9bb26f5b llvm::reportGISelFailure(llvm::MachineFunction&, llvm::TargetPassConfig const&, llvm::MachineOptimizationRemarkEmitter&, char const*, llvm::StringRef, llvm::MachineInstr const&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Utils.cpp:286:1
#15 0x000055ae9babdb82 llvm::Legalizer::runOnMachineFunction(llvm::MachineFunction&) (.part.0) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/GlobalISel/Legalizer.cpp:348:12
#16 0x000055ae9a7f9b3b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:33
#17 0x000055ae9ad2eaec llvm::FPPassManager::runOnFunction(llvm::Function&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1440:7
#18 0x000055ae9ad2ed19 llvm::ilist_node_base<true>::getNext() const /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_node_base.h:43:45
#19 0x000055ae9ad2ed19 llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Function, true, false, void>>::getNext() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_node.h:67:66
#20 0x000055ae9ad2ed19 llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Function, true, false, void>, false, false>::operator++() /proj/rdi/staff/vivizhan/llvm-aie/llvm/include/llvm/ADT/ilist_iterator.h:157:25
#21 0x000055ae9ad2ed19 llvm::FPPassManager::runOnModule(llvm::Module&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1475:22
#22 0x000055ae9ad2f59e runOnModule /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:1552:7
#23 0x000055ae9ad2f59e llvm::legacy::PassManagerImpl::run(llvm::Module&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/lib/IR/LegacyPassManager.cpp:535:55
#24 0x000055ae99e4601e compileModule(char**, llvm::LLVMContext&) /proj/rdi/staff/vivizhan/llvm-aie/llvm/tools/llc/llc.cpp:736:66
#25 0x000055ae99e46f86 main /proj/rdi/staff/vivizhan/llvm-aie/llvm/tools/llc/llc.cpp:420:35
#26 0x00007fa6da829d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#27 0x00007fa6da829e40 call_init ./csu/../csu/libc-start.c:128:20
#28 0x00007fa6da829e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#29 0x000055ae99e3a2e5 _start (/proj/xsjhdstaff4/vivizhan/llvm-aie/install/bin/llc+0x14d02e5)

yzhang93 commented 1 month ago

In contrast, the bf16-f32 model below (without arith.truncf %11 : f32 to bf16) does not hit this error.

!lhs = tensor<1024x512xbf16>
!rhs = tensor<512x1024xbf16>
!ele = tensor<1024x1024xf32>
!res = tensor<1024x1024xf32>

func.func @matmul_elementwise_bf16(%lhs : !lhs, %rhs : !rhs, %ele : !ele) -> !res {
  %cst = arith.constant 0.0 : f32
  %0 = tensor.empty() : !ele
  %fill = linalg.fill ins(%cst : f32) outs(%0 : !ele) -> !ele
  %2 = linalg.matmul ins(%lhs, %rhs : !lhs, !rhs) outs(%fill : !ele) -> !ele
  %res = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2, %ele : !ele, !ele) outs(%0 : !ele) {
  ^bb0(%in: f32, %in_0: f32, %out: f32):
    %11 = arith.addf %in, %in_0 : f32
    linalg.yield %11 : f32
  } -> !res
  return %res : !res
}
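
For reference, the key difference between the two variants is the arith.truncf from f32 to bf16 in the elementwise body. Below is a minimal Python sketch of what such a truncation computes per element, assuming round-to-nearest-even; the rounding mode actually used by the AIE lowering is not stated in this thread, and the helper names f32_to_bf16_bits and bf16_bits_to_f32 are hypothetical.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (assumes round-to-nearest-even)."""
    # Reinterpret the float as its 32-bit IEEE-754 pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the upper half and force a mantissa bit so it stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b: int) -> float:
    """Widen a bf16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    acc = 3.140625  # stands in for one f32 result of arith.addf
    packed = f32_to_bf16_bits(acc)
    print(hex(packed), bf16_bits_to_f32(packed))
```

The result is a plain 16-bit value per element, which is consistent with the failing instruction being typed as s16 rather than as a floating-point type.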

@MaheshRavishankar @stephenneuendorffer @newling @erwei-xilinx Any insight about the issue?

MaheshRavishankar commented 1 month ago

I don't know if Peano handles bf16 natively.

stephenneuendorffer commented 1 month ago

I believe there's work going on to implement shuffle_vector. Currently the assumption is that vector ops always go through intrinsics. FYI, for Peano issues, you're better off capturing the .ll code and creating an issue in the Peano repo.

gbossu commented 1 month ago

Peano does support bf16 types, and there is indeed work to support more and more cases of generic shuffle_vector. However, I think the problem here is rather that %1730:_(<1024 x s16>) is a huge vector, and we do not have the capability yet to properly legalize those. As Stephen said, it would be very useful if you could get us a small .ll reproducer, then we can investigate what's really happening here :)
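
To put the "huge vector" remark in numbers, here is a small Python sketch of the size of <1024 x s16>. The 512-bit native vector width is an assumption made only for illustration and is not stated in this thread; the names vector_bits and ASSUMED_NATIVE_VECTOR_BITS are hypothetical.

```python
def vector_bits(num_elements: int, element_bits: int) -> int:
    """Total width in bits of a fixed-length vector type."""
    return num_elements * element_bits


# Assumption for illustration only: a 512-bit native vector register.
ASSUMED_NATIVE_VECTOR_BITS = 512

# <1024 x s16> from the failing G_SHUFFLE_VECTOR instruction.
total = vector_bits(1024, 16)
pieces = total // ASSUMED_NATIVE_VECTOR_BITS

if __name__ == "__main__":
    print(f"<1024 x s16> is {total} bits, i.e. {pieces} pieces of "
          f"{ASSUMED_NATIVE_VECTOR_BITS} bits under the assumed width")
```

Under that assumption, the legalizer would have to split a single shuffle into many register-sized pieces, which is the capability described above as not yet available.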

ValentijnvdBeek commented 1 month ago

Support for G_SHUFFLE_VECTOR in Peano will be up for review soon, so it should land soonish. The failing instruction operates on 16-bit elements (s16), so bf16 support is not the problem in any case. There are two problems with the code as is:

yzhang93 commented 1 month ago

Thanks @stephenneuendorffer @gbossu @ValentijnvdBeek for looking into the issue! Here are the .ll files generated from the above example. Please let me know if you need me to provide other sources. input_ll.zip