nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

Assertion `succeeded( ConcreteT::verifyInvariants(getDefaultDiagnosticEmitFn(ctx), args...))' failed #883

Closed: pdhirajkumarprasad closed this issue 2 weeks ago

pdhirajkumarprasad commented 2 weeks ago

For the attached IR, we are seeing an assertion failure with the following stack trace:

<unknown>:0: error: invalid tensor dimension size
iree-compile: /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/include/mlir/IR/StorageUniquerSupport.h:180: static ConcreteT mlir::detail::StorageUserBase<mlir::RankedTensorType, mlir::TensorType, mlir::detail::RankedTensorTypeStorage, mlir::detail::TypeUniquer, mlir::ShapedType::Trait, mlir::ValueSemantics>::get(mlir::MLIRContext *, Args &&...) [ConcreteT = mlir::RankedTensorType, BaseT = mlir::TensorType, StorageT = mlir::detail::RankedTensorTypeStorage, UniquerT = mlir::detail::TypeUniquer, Traits = <mlir::ShapedType::Trait, mlir::ValueSemantics>, Args = <llvm::ArrayRef<long> &, mlir::Type &, mlir::Attribute &>]: Assertion `succeeded( ConcreteT::verifyInvariants(getDefaultDiagnosticEmitFn(ctx), args...))' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: iree-compile --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=host model.torch_onnx.mlir -o abc.vmfb
 #0 0x00007ff6b16a81f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007ff6b16a6430 llvm::sys::RunSignalHandlers() /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007ff6b16a88ba SignalHandler(int) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007ff6ab8b2520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007ff6ab9069fc __pthread_kill_implementation ./nptl/./nptl/pthread_kill.c:44:76
 #5 0x00007ff6ab9069fc __pthread_kill_internal ./nptl/./nptl/pthread_kill.c:78:10
 #6 0x00007ff6ab9069fc pthread_kill ./nptl/./nptl/pthread_kill.c:89:10
 #7 0x00007ff6ab8b2476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007ff6ab8987f3 abort ./stdlib/./stdlib/abort.c:81:7
 #9 0x00007ff6ab89871b _nl_load_domain ./intl/./intl/loadmsgcat.c:1177:9
#10 0x00007ff6ab8a9e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x00007ff6b174fc64 (/proj/xhdhdstaff6/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x5a8bc64)
#12 0x00007ff6b174fb88 mlir::RankedTensorType::get(llvm::ArrayRef<long>, mlir::Type, mlir::Attribute) /proj/xhdhdstaff6/dhirajp/localBuild/iree-build/llvm-project/tools/mlir/include/mlir/IR/BuiltinTypes.cpp.inc:312:3
#13 0x00007ff6b5d86456 mlir::tensor::EmptyOp::build(mlir::OpBuilder&, mlir::OperationState&, llvm::ArrayRef<long>, mlir::Type, mlir::ValueRange, mlir::Attribute) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:0:21
#14 0x00007ff6b5d86456 mlir::tensor::EmptyOp::build(mlir::OpBuilder&, mlir::OperationState&, llvm::ArrayRef<mlir::OpFoldResult>, mlir::Type, mlir::Attribute) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:894:3
#15 0x00007ff6b1dee50f mlir::tensor::EmptyOp mlir::OpBuilder::create<mlir::tensor::EmptyOp, llvm::SmallVector<mlir::OpFoldResult, 6u>&, mlir::Type>(mlir::Location, llvm::SmallVector<mlir::OpFoldResult, 6u>&, mlir::Type&&) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/include/mlir/IR/Builders.h:518:16
#16 0x00007ff6b3eb4092 mlir::iree_compiler::lowerOpWithEncoding(mlir::RewriterBase&, mlir::tensor::EmptyOp, mlir::ValueRange, mlir::iree_compiler::MaterializeEncodingTypeConverter const&, std::function<llvm::FailureOr<mlir::iree_compiler::MaterializeEncodingValueInfo> (mlir::RankedTensorType, mlir::OpBuilder&, mlir::Location)>) /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp:402:36
#17 0x00007ff6b3eb4092 mlir::iree_compiler::(anonymous namespace)::MaterializeOperation<mlir::tensor::EmptyOp>::matchAndRewrite(mlir::tensor::EmptyOp, mlir::tensor::EmptyOpAdaptor, mlir::ConversionPatternRewriter&) const /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp:841:9
#18 0x00007ff6b1d51f5e mlir::OpConversionPattern<mlir::tensor::EmptyOp>::matchAndRewrite(mlir::Operation*, llvm::ArrayRef<mlir::Value>, mlir::ConversionPatternRewriter&) const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/include/mlir/Transforms/DialectConversion.h:615:3
#19 0x00007ff6b5bd30c2 mlir::ConversionPattern::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:1682:10
#20 0x00007ff6b5c134ee mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<llvm::LogicalResult (mlir::Pattern const&)>)::$_2::operator()() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Rewrite/PatternApplicator.cpp:212:13
#21 0x00007ff6b5c134ee void llvm::function_ref<void ()>::callback_fn<mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<llvm::LogicalResult (mlir::Pattern const&)>)::$_2>(long) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
#22 0x00007ff6b5c1052f mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<llvm::LogicalResult (mlir::Pattern const&)>) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Rewrite/PatternApplicator.cpp:233:9
#23 0x00007ff6b5bd3fa9 (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:0:0
#24 0x00007ff6b5bd3127 mlir::OperationConverter::convert(mlir::ConversionPatternRewriter&, mlir::Operation*) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:0:0
#25 0x00007ff6b5bd41af llvm::LogicalResult::failed() const /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:43:43
#26 0x00007ff6b5bd41af llvm::failed(llvm::LogicalResult) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/llvm/include/llvm/Support/LogicalResult.h:71:58
#27 0x00007ff6b5bd41af mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2495:9
#28 0x00007ff6b5bda5eb mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:3258:22
#29 0x00007ff6b5bda5eb mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) /proj/xhdhdstaff6/dhirajp/localBuild/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:3264:10
#30 0x00007ff6b3e2ce08 mlir::iree_compiler::materializeFuncOpEncodings(mlir::FunctionOpInterface, mlir::iree_compiler::IREE::HAL::ExecutableTargetAttr) /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Codegen/Common/CPU/CPUMaterializeEncodings.cpp:491:14
#31 0x00007ff6b3e2f651 mlir::iree_compiler::CPUMaterializeHostEncodingPass::runOnOperation() /proj/xhdhdstaff6/dhirajp/localBuild/iree/compiler/src/iree/compiler/Codegen/Common/CPU/CPUMaterializeEncodings.cpp:0:15

command:

iree-compile --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=host model.torch_onnx.mlir -o abc.vmfb
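To re-run the same command over several models from a script, a minimal Python sketch could wrap it with `subprocess` and sort failures using markers that appear in the log in this issue. It assumes `iree-compile` is on `PATH`. The names `build_compile_cmd`, `classify_failure`, and `run_compile`, and the bucket labels, are hypothetical and exist only for illustration.

```python
import subprocess


def build_compile_cmd(input_mlir, output_vmfb="abc.vmfb"):
    # Flags copied verbatim from the command reported in this issue.
    return [
        "iree-compile",
        "--iree-hal-target-backends=llvm-cpu",
        "--iree-llvmcpu-target-cpu=host",
        input_mlir,
        "-o",
        output_vmfb,
    ]


def classify_failure(returncode, stderr):
    """Hypothetical triage buckets keyed on strings seen in the log."""
    if returncode == 0:
        return "ok"
    if "invalid tensor dimension size" in stderr:
        return "invalid-tensor-dim"
    if "Assertion" in stderr:
        return "assertion"
    return "other"


def run_compile(input_mlir):
    # Assumes iree-compile is installed and on PATH.
    proc = subprocess.run(build_compile_cmd(input_mlir),
                          capture_output=True, text=True)
    return classify_failure(proc.returncode, proc.stderr)
```

For example, `run_compile("model.torch_onnx.mlir")` would be expected to land in the `invalid-tensor-dim` bucket on a build that still has this crash.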

Attached IR: model.torch_onnx.mlir.txt
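The `<unknown>:0: error: invalid tensor dimension size` line comes from the type verifier that `mlir::RankedTensorType::get` runs (frame #12). In this trace it is triggered while a `tensor::EmptyOp` is being built inside `lowerOpWithEncoding` (frames #13 to #16). As we understand upstream MLIR, each ranked tensor dimension must be either non-negative or the dynamic sentinel `ShapedType::kDynamic`, which is currently the minimum `int64_t`. The Python sketch below mirrors that rule. `K_DYNAMIC`, `is_valid_dim`, `verify_shape`, and `packed_outer_shape` are hypothetical names. The tiling helper only illustrates one way a bad size can appear (doing arithmetic on the sentinel as if it were a real size); it is not a confirmed root cause for this issue.

```python
# Assumption: mirrors MLIR's convention that a dynamic dimension is stored
# as the minimum int64 value (ShapedType::kDynamic in C++).
K_DYNAMIC = -(2**63)


def is_valid_dim(d):
    """A size is accepted if it is >= 0 or is the dynamic sentinel."""
    return d >= 0 or d == K_DYNAMIC


def verify_shape(shape):
    """Raise on the first size the RankedTensorType verifier would reject."""
    for d in shape:
        if not is_valid_dim(d):
            raise ValueError(f"invalid tensor dimension size: {d}")


def ceil_div(a, b):
    return -(-a // b)


def packed_outer_shape(shape, inner_tiles, propagate_dynamic=True):
    """Hypothetical helper: outer dims of a pack-style layout, ceil(dim / tile).

    With propagate_dynamic=False it (incorrectly) divides the sentinel itself.
    """
    out = []
    for d, t in zip(shape, inner_tiles):
        if d == K_DYNAMIC and propagate_dynamic:
            out.append(K_DYNAMIC)
        else:
            out.append(ceil_div(d, t))
    return out


verify_shape(packed_outer_shape([K_DYNAMIC, 64], [16, 16]))  # passes
try:
    verify_shape(packed_outer_shape([K_DYNAMIC, 64], [16, 16],
                                    propagate_dynamic=False))
except ValueError as e:
    print(e)  # invalid tensor dimension size: -576460752303423488
```

With the sentinel propagated, the outer shape is `[K_DYNAMIC, 4]` and passes the check. Without it, the first outer dimension becomes a large negative number that is not the sentinel, which is exactly the kind of value the verifier rejects.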

pdhirajkumarprasad commented 2 weeks ago

After adding the failing models to the basic_opt list, all but one pass compilation and inference. The remaining model, maxvit_xlarge_tf_512.in21k_ft_in1k, fails with:

[libprotobuf ERROR /build/Release/_deps/protobuf-src/src/google/protobuf/message_lite.cc:402] onnx.ModelProto exceeded maximum protobuf size of 2GB: 2169605934
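The protobuf error is a plain size check: the serialized `onnx.ModelProto` is 2,169,605,934 bytes, just over the 2GB ceiling. The arithmetic below assumes that "2GB" means 2 GiB (2**31 bytes); protobuf's effective bound may be one byte lower depending on the version. `over_protobuf_limit` is a hypothetical helper.

```python
# Assumption: "2GB" in the log means 2 GiB = 2**31 bytes.
PROTOBUF_LIMIT_BYTES = 2**31

# Serialized onnx.ModelProto size reported in the libprotobuf error.
MODEL_BYTES = 2169605934


def over_protobuf_limit(size_bytes, limit=PROTOBUF_LIMIT_BYTES):
    """Return (exceeds, excess_bytes) for a serialized message size."""
    excess = size_bytes - limit
    return excess > 0, excess


exceeds, excess = over_protobuf_limit(MODEL_BYTES)
print(exceeds, excess)                 # True 22122286
print(f"{MODEL_BYTES / 2**30:.2f} GiB")  # 2.02 GiB
```

So the model is roughly 21 MiB over the limit. This failure is unrelated to the compiler assertion above.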

pdhirajkumarprasad commented 2 weeks ago

After adding the model to opt_list, this crash no longer occurs, so I am closing this issue.