iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Segmentation fault when trying to compile GPT-3 training step #13498

Closed phoenix-meadowlark closed 1 year ago

phoenix-meadowlark commented 1 year ago

What happened?

Attempting to compile the training step dumped by PAX for a 1.3B GPT-3-like model causes iree-compile to produce the following segmentation fault:

 #0 0x00007f6116d9caa7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:602:13
 #1 0x00007f6116d9ae70 llvm::sys::RunSignalHandlers() iree/third_party/llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x00007f6116d9d12a SignalHandler(int) iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f610f85af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007f61148e08b4 mlir::AbstractType::getTypeID() const iree/third_party/llvm-project/mlir/include/mlir/IR/TypeSupport.h:101:37
 #5 0x00007f61148e08b4 mlir::Type::getTypeID() iree/third_party/llvm-project/mlir/include/mlir/IR/Types.h:112:55
 #6 0x00007f61148e08b4 bool mlir::detail::StorageUserBase<mlir::OpaqueType, mlir::Type, mlir::detail::OpaqueTypeStorage, mlir::detail::TypeUniquer>::classof<mlir::Type>(mlir::Type) iree/third_party/llvm-project/mlir/include/mlir/IR/StorageUniquerSupport.h:113:16
 #7 0x00007f61148e08b4 llvm::CastInfo<mlir::OpaqueType, mlir::Type const, void>::isPossible(mlir::Type) iree/third_party/llvm-project/mlir/include/mlir/IR/Types.h:394:14
 #8 0x00007f61148e08b4 llvm::DefaultDoCastIfPossible<mlir::OpaqueType, mlir::Type const, llvm::CastInfo<mlir::OpaqueType, mlir::Type const, void> >::doCastIfPossible(mlir::Type) iree/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:311:10
 #9 0x00007f61148e08b4 decltype(auto) llvm::dyn_cast<mlir::OpaqueType, mlir::Type>(mlir::Type const&) iree/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:651:10
#10 0x00007f61148e08b4 mlir::OpaqueType mlir::Type::dyn_cast<mlir::OpaqueType>() const iree/third_party/llvm-project/mlir/include/mlir/IR/Types.h:312:10
#11 0x00007f61148e08b4 decltype(auto) llvm::detail::TypeSwitchBase<llvm::TypeSwitch<mlir::Type, void>, mlir::Type>::castValue<mlir::OpaqueType, mlir::Type const&>(mlir::Type const&, std::enable_if<is_detected<llvm::detail::TypeSwitchBase<llvm::TypeSwitch<mlir::Type, void>, mlir::Type>::has_dyn_cast_t, mlir::Type const&, mlir::OpaqueType>::value, void>::type*) iree/third_party/llvm-project/llvm/include/llvm/ADT/TypeSwitch.h:77:27
#12 0x00007f61148e08b4 llvm::TypeSwitch<mlir::Type, void>& llvm::TypeSwitch<mlir::Type, void>::Case<mlir::OpaqueType, mlir::AsmPrinter::Impl::printTypeImpl(mlir::Type)::$_18>(mlir::AsmPrinter::Impl::printTypeImpl(mlir::Type)::$_18&&) iree/third_party/llvm-project/llvm/include/llvm/ADT/TypeSwitch.h:168:26
#13 0x00007f61148e08b4 mlir::AsmPrinter::Impl::printTypeImpl(mlir::Type) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:2422:8
#14 0x00007f61148dc1e1 llvm::indexed_accessor_iterator<llvm::detail::indexed_accessor_range_base<mlir::OperandRange, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::operator==(llvm::indexed_accessor_iterator<llvm::detail::indexed_accessor_range_base<mlir::OperandRange, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value> const&) const iree/third_party/llvm-project/llvm/include/llvm/ADT/STLExtras.h:1289:29
#15 0x00007f61148dc1e1 llvm::iterator_facade_base<llvm::detail::indexed_accessor_range_base<mlir::OperandRange, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::iterator, std::random_access_iterator_tag, mlir::Value, long, mlir::Value, mlir::Value>::operator!=(llvm::detail::indexed_accessor_range_base<mlir::OperandRange, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::iterator const&) const iree/third_party/llvm-project/llvm/include/llvm/ADT/iterator.h:181:51
#16 0x00007f61148dc1e1 void llvm::interleave<llvm::detail::indexed_accessor_range_base<mlir::OperandRange, mlir::OpOperand*, mlir::Value, mlir::Value, mlir::Value>::iterator, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, void llvm::interleave<mlir::OperandRange, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::raw_ostream, mlir::Value>(mlir::OperandRange const&, llvm::raw_ostream&, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::StringRef const&)::'lambda'(), void>(mlir::OperandRange, mlir::OperandRange, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::raw_ostream) iree/third_party/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2181:16
#17 0x00007f61148dc1e1 void llvm::interleave<mlir::OperandRange, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::raw_ostream, mlir::Value>(mlir::OperandRange const&, llvm::raw_ostream&, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::StringRef const&) iree/third_party/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2201:3
#18 0x00007f61148dc1e1 void llvm::interleaveComma<mlir::OperandRange, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0, llvm::raw_ostream, mlir::Value>(mlir::OperandRange const&, llvm::raw_ostream&, mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*)::$_0) iree/third_party/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2215:3
#19 0x00007f61148dc1e1 mlir::OpAsmPrinter::printFunctionalType(mlir::Operation*) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:84:3
#20 0x00007f61148e808b (anonymous namespace)::OperationPrinter::printGenericOp(mlir::Operation*, bool) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:3374:1
#21 0x00007f61148e7a7e (anonymous namespace)::OperationPrinter::printCustomOrGenericOp(mlir::Operation*) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:3341:1
#22 0x00007f61148e53eb mlir::OpPrintingFlags::shouldPrintDebugInfo() const iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:267:10
#23 0x00007f61148e53eb mlir::AsmPrinter::Impl::printTrailingLocation(mlir::Location, bool) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:1846:21
#24 0x00007f61148e53eb (anonymous namespace)::OperationPrinter::printFullOpWithIndentAndLoc(mlir::Operation*) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:3203:3
#25 0x00007f61148e3bc1 (anonymous namespace)::OperationPrinter::~OperationPrinter() iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:2901:7
#26 0x00007f61148e3bc1 mlir::Operation::print(llvm::raw_ostream&, mlir::AsmState&) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:3694:1
#27 0x00007f61148e3916 mlir::Operation::print(llvm::raw_ostream&, mlir::OpPrintingFlags const&) iree/third_party/llvm-project/mlir/lib/IR/AsmPrinter.cpp:3685:1
#28 0x00007f611493ab86 mlir::Diagnostic::appendOp(mlir::Operation&, mlir::OpPrintingFlags const&) iree/third_party/llvm-project/mlir/lib/IR/Diagnostics.cpp:0:6
#29 0x00007f611498394a mlir::Operation::emitError(llvm::Twine const&) iree/third_party/llvm-project/mlir/lib/IR/Operation.cpp:242:1
#30 0x00007f61149850cc mlir::OpState::emitError(llvm::Twine const&) iree/third_party/llvm-project/mlir/lib/IR/Operation.cpp:642:3
#31 0x00007f6114502b59 mlir::InFlightDiagnostic::isInFlight() const iree/third_party/llvm-project/mlir/include/mlir/IR/Diagnostics.h:388:36
#32 0x00007f6114502b59 mlir::InFlightDiagnostic& mlir::InFlightDiagnostic::append<char const (&) [24]>(char const (&) [24]) & iree/third_party/llvm-project/mlir/include/mlir/IR/Diagnostics.h:336:9
#33 0x00007f6114502b59 mlir::InFlightDiagnostic&& mlir::InFlightDiagnostic::operator<<<char const (&) [24]>(char const (&) [24]) && iree/third_party/llvm-project/mlir/include/mlir/IR/Diagnostics.h:329:22
#34 0x00007f6114502b59 mlir::func::ReturnOp::verify() iree/third_party/llvm-project/mlir/lib/Dialect/Func/IR/FuncOps.cpp:358:26
#35 0x00007f611450caa0 mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#36 0x00007f611450caa0 mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#37 0x00007f611450caa0 mlir::Op<mlir::func::ReturnOp, mlir::OpTrait::ZeroRegions, mlir::OpTrait::ZeroResults, mlir::OpTrait::ZeroSuccessors, mlir::OpTrait::VariadicOperands, mlir::OpTrait::HasParent<mlir::func::FuncOp>::Impl, mlir::OpTrait::OpInvariants, mlir::ConditionallySpeculatable::Trait, mlir::OpTrait::AlwaysSpeculatableImplTrait, mlir::MemoryEffectOpInterface::Trait, mlir::OpTrait::MemRefsNormalizable, mlir::OpTrait::ReturnLike, mlir::OpTrait::IsTerminator>::verifyInvariants(mlir::Operation*) iree/third_party/llvm-project/mlir/include/mlir/IR/OpDefinition.h:1872:9
#38 0x00007f611450c66f llvm::unique_function<mlir::LogicalResult (mlir::Operation*) const>::operator()(mlir::Operation*) const iree/third_party/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:408:12
#39 0x00007f611450c66f mlir::RegisteredOperationName::Model<mlir::func::ReturnOp>::verifyInvariants(mlir::Operation*) iree/third_party/llvm-project/mlir/include/mlir/IR/OperationSupport.h:411:14
#40 0x00007f611499b489 mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#41 0x00007f611499b489 mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#42 0x00007f611499b489 (anonymous namespace)::OperationVerifier::verifyOperation(mlir::Operation&) iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:188:25
#43 0x00007f611499b70b mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#44 0x00007f611499b70b mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#45 0x00007f611499b70b (anonymous namespace)::OperationVerifier::verifyBlock(mlir::Block&, llvm::SmallVectorImpl<mlir::Operation*>&) iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:143:16
#46 0x00007f611499b70b (anonymous namespace)::OperationVerifier::verifyOperation(mlir::Operation&) iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:227:22
#47 0x00007f611499b35c mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#48 0x00007f611499b35c mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#49 0x00007f611499b35c (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&) iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:78:7
#50 0x00007f611499b35c mlir::verify(mlir::Operation*, bool) iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:376:19
#51 0x00007f6114831c4c mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#52 0x00007f6114831c4c mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#53 0x00007f6114831c4c mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:511:20
#54 0x00007f6114831ff8 mlir::LogicalResult::failed() const iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#55 0x00007f6114831ff8 mlir::failed(mlir::LogicalResult) iree/third_party/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#56 0x00007f6114831ff8 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:548:9
#57 0x00007f611483644e mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const iree/third_party/llvm-project/mlir/lib/Pass/Pass.cpp:773:5
#58 0x00007f611483644e mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()::operator()() const iree/third_party/llvm-project/mlir/include/mlir/IR/Threading.h:62:18
#59 0x00007f611483644e __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > > std::__invoke_impl<void, mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()&>(std::__invoke_other, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#60 0x00007f611483644e std::enable_if<is_invocable_r_v<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > > >::type std::__invoke_r<void, mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()&>(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:111:2
#61 0x00007f611483644e std::_Function_handler<void (), mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_15&)::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#62 0x00007f611443c5b6 std::__shared_ptr<std::promise<void>, (__gnu_cxx::_Lock_policy)2>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1666:16
#63 0x00007f611443c5b6 std::__shared_ptr_access<std::promise<void>, (__gnu_cxx::_Lock_policy)2, false, false>::_M_get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1363:66
#64 0x00007f611443c5b6 std::__shared_ptr_access<std::promise<void>, (__gnu_cxx::_Lock_policy)2, false, false>::operator->() const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:1357:9
#65 0x00007f611443c5b6 llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()::operator()() const iree/third_party/llvm-project/llvm/include/llvm/Support/ThreadPool.h:136:15
#66 0x00007f611443c5b6 void std::__invoke_impl<void, llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()&>(std::__invoke_other, llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
#67 0x00007f611443c5b6 std::enable_if<is_invocable_r_v<void, llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()&>, void>::type std::__invoke_r<void, llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()&>(llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:111:2
#68 0x00007f611443c5b6 std::_Function_handler<void (), llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:290:9
#69 0x00007f6116d504f7 __gthread_mutex_lock(pthread_mutex_t*) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12/bits/gthr-default.h:749:12
#70 0x00007f6116d504f7 std::mutex::lock() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:100:17
#71 0x00007f6116d504f7 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_mutex.h:229:19
#72 0x00007f6116d504f7 llvm::ThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) iree/third_party/llvm-project/llvm/lib/Support/ThreadPool.cpp:115:35
#73 0x00007f6116d50f6c std::default_delete<std::tuple<llvm::ThreadPool::grow(int)::$_0> >::operator()(std::tuple<llvm::ThreadPool::grow(int)::$_0>*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unique_ptr.h:95:2
#74 0x00007f6116d50f6c std::unique_ptr<std::tuple<llvm::ThreadPool::grow(int)::$_0>, std::default_delete<std::tuple<llvm::ThreadPool::grow(int)::$_0> > >::~unique_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unique_ptr.h:396:4
#75 0x00007f6116d50f6c void llvm::thread::GenericThreadProxy<std::tuple<llvm::ThreadPool::grow(int)::$_0> >(void*) iree/third_party/llvm-project/llvm/include/llvm/Support/thread.h:46:3
#76 0x00007f6116d50f6c void* llvm::thread::ThreadProxy<std::tuple<llvm::ThreadPool::grow(int)::$_0> >(void*) iree/third_party/llvm-project/llvm/include/llvm/Support/thread.h:55:5
#77 0x00007f610f8a7fd4 start_thread ./nptl/./nptl/pthread_create.c:442:8
#78 0x00007f610f92866c clone3 ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:0
[1]    1134470 segmentation fault  iree-compile --iree-hal-target-backends=llvmcpu --iree-input-type=mhlo  -o 
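
Frames #28 through #34 of the trace show mlir::func::ReturnOp::verify() emitting an error, and the crash happens while the diagnostic prints the offending op. One possible way to surface the underlying verifier message is to rerun with op printing on diagnostics turned off and threading disabled. This is only a sketch, not a confirmed workaround: it assumes this build of iree-compile registers the standard MLIR options --mlir-print-op-on-diagnostic and --mlir-disable-threading, and it reuses the file paths from the reproduction steps.

```shell
# Assumption: the standard MLIR context options are available in this build.
# --mlir-disable-threading keeps the failure on a single thread.
# --mlir-print-op-on-diagnostic=false skips printing the op attached to the
# error, which is the step that crashes in the trace above.
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-input-type=mhlo \
    --mlir-disable-threading \
    --mlir-print-op-on-diagnostic=false \
    /tmp/jax_ir41_pmap__wrapped_step_fn.mlir \
    -o /tmp/jax_ir41_pmap__wrapped_step_fn.vmfb
```

If the printer is the only thing crashing, this run may end with an ordinary verification error instead of a segmentation fault.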

Steps to reproduce your issue

Download jax_ir41_pmap__wrapped_step_fn.mlir and run iree-compile (tested to fail on llvmcpu and cuda):

iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-input-type=mhlo \
    /tmp/jax_ir41_pmap__wrapped_step_fn.mlir \
    -o /tmp/jax_ir41_pmap__wrapped_step_fn.vmfb

(The step is pmap'd to a single device, so the pmap shouldn't be relevant.)

What component(s) does this issue relate to?

Compiler

Version information

7a0208a3a8385543a37e939c09db45bab151d83b

Additional context

No response

phoenix-meadowlark commented 1 year ago

It may be worth noting that this segfault happens before iree-compile can report that llvmcpu isn't a registered target backend (llvm-cpu is). If that validation runs before compilation starts, that might help narrow things down.

rsuderman commented 1 year ago

This failure appears to occur before iree-stream-schedule-execution.
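
One way to check where in the pipeline a failure like this occurs is to dump the IR after every pass and look at the last pass that printed before the crash. The sketch below assumes the standard MLIR options --mlir-print-ir-after-all, --mlir-disable-threading, and --mlir-elide-elementsattrs-if-larger are available in iree-compile; the log path /tmp/ir_after_all.log is a hypothetical name chosen for illustration.

```shell
# Assumption: standard MLIR pass-manager and printer options are registered.
# The IR dump goes to stderr, so redirect it to a (hypothetical) log file.
# Eliding large constants keeps the log manageable for a 1.3B-parameter model.
iree-compile \
    --iree-hal-target-backends=llvm-cpu \
    --iree-input-type=mhlo \
    --mlir-disable-threading \
    --mlir-print-ir-after-all \
    --mlir-elide-elementsattrs-if-larger=8 \
    /tmp/jax_ir41_pmap__wrapped_step_fn.mlir \
    -o /tmp/jax_ir41_pmap__wrapped_step_fn.vmfb \
    2> /tmp/ir_after_all.log

# The last "IR Dump After" header names the final pass that finished printing.
grep "IR Dump After" /tmp/ir_after_all.log | tail -n 1
```

The pass that follows the one reported here in the pipeline would be the first suspect.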

We are seeing the same behavior in https://github.com/openxla/iree/issues/13459, so this is likely a dupe. We should swing back around once that issue is closed.

allieculp commented 1 year ago

@rsuderman #13459 is closed - was this a duplicate? @silvasean for visibility

rsuderman commented 1 year ago

Verified that this compiles successfully now.