iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[compile][#812]: crash during iree-auto-input-conversion #18385

Open pdhirajkumarprasad opened 3 weeks ago

pdhirajkumarprasad commented 3 weeks ago

What happened?

For the given IR:

module {
  func.func @torch_jit(%arg0: !torch.vtensor<[8],si64>, %arg1: !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 13 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %356:8 = torch.operator "onnx.Split"(%arg1, %arg0) {torch.onnx.axis = 1 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[8],si64>) -> (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) 
    %357 = torch.operator "onnx.Concat"(%356#0, %356#1, %356#2, %356#3, %356#4, %356#5, %356#6, %356#7) {torch.onnx.axis = 3 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> 
    return %357: !torch.vtensor<[],f32>
  }
}

I'm seeing this crash:

Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=rocm test.mlir --mlir-print-ir-after-all --mlir-print-ir-before-all --mlir-disable-threading --mlir-elide-elementsattrs-if-larger=4
 #0 0x00007fa2ac11bc07 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x1637c07)
 #1 0x00007fa2ac1199de llvm::sys::RunSignalHandlers() (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x16359de)
 #2 0x00007fa2ac11c2da SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fa2aa6d2520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fa2acdefafe mlir::torch::onnx_c::populateDefaultDomainQtoZ(mlir::torch::onnx_c::OnnxCustomOpConversionPattern&)::$_35::__invoke(mlir::torch::onnx_c::OpBinder, mlir::ConversionPatternRewriter&) DefaultDomainQtoZ.cpp:0:0
 #5 0x00007fa2ace13ad1 mlir::torch::onnx_c::OnnxCustomOpConversionPattern::matchAndRewrite(mlir::torch::Torch::OperatorOp, mlir::torch::Torch::OperatorOpAdaptor, mlir::ConversionPatternRewriter&) const (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x232fad1)
 #6 0x00007fa2acd6cb45 mlir::OpConversionPattern<mlir::torch::Torch::OperatorOp>::matchAndRewrite(mlir::Operation*, llvm::ArrayRef<mlir::Value>, mlir::ConversionPatternRewriter&) const (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x2288b45)
 #7 0x00007fa2affc36a2 mlir::ConversionPattern::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x54df6a2)
 #8 0x00007fa2afffeaad void llvm::function_ref<void ()>::callback_fn<mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<llvm::LogicalResult (mlir::Pattern const&)>)::$_2>(long) PatternApplicator.cpp:0:0
 #9 0x00007fa2afffbd74 mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<llvm::LogicalResult (mlir::Pattern const&)>) (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x5517d74)
#10 0x00007fa2affc411a (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) DialectConversion.cpp:0:0
#11 0x00007fa2affc3712 mlir::OperationConverter::convert(mlir::ConversionPatternRewriter&, mlir::Operation*) (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x54df712)
#12 0x00007fa2affc423f mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x54e023f)
#13 0x00007fa2affccf1b mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) (/proj/rdi/staff/dhirajp/localBuild/iree-build/lib/libIREECompiler.so+0x54e8f1b)

Steps to reproduce your issue

Command to reproduce:

iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=rocm test.mlir

Log with '--mlir-print-ir-after-all --mlir-print-ir-before-all --mlir-disable-threading --mlir-elide-elementsattrs-if-larger=4':

// -----// IR Dump Before AutoInputConversionPipelinePass (iree-auto-input-conversion) //----- //
module {
  func.func @torch_jit(%arg0: !torch.vtensor<[8],si64>, %arg1: !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 13 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %0:8 = torch.operator "onnx.Split"(%arg1, %arg0) {torch.onnx.axis = 1 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[8],si64>) -> (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) 
    %1 = torch.operator "onnx.Concat"(%0#0, %0#1, %0#2, %0#3, %0#4, %0#5, %0#6, %0#7) {torch.onnx.axis = 3 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> 
    return %1 : !torch.vtensor<[],f32>
  }
}

// -----// IR Dump Before ConvertTorchOnnxToTorch (convert-torch-onnx-to-torch) //----- //
func.func @torch_jit(%arg0: !torch.vtensor<[8],si64>, %arg1: !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 13 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
  %0:8 = torch.operator "onnx.Split"(%arg1, %arg0) {torch.onnx.axis = 1 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[8],si64>) -> (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) 
  %1 = torch.operator "onnx.Concat"(%0#0, %0#1, %0#2, %0#3, %0#4, %0#5, %0#6, %0#7) {torch.onnx.axis = 3 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> 
  return %1 : !torch.vtensor<[],f32>
}

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

ScottTodd commented 3 weeks ago

The included stack trace is missing line and column numbers and some symbols.

I can repro at https://github.com/iree-org/iree/commit/756668022bdc35107ce01d2dfbbfc20f5d0faf73 with iree-compile --iree-hal-target-backends=llvm-cpu issue_18385.mlir -o /tmp/issue_18385.vmfb

Hitting an assert here:

// -----// IR Dump Before ConvertTorchOnnxToTorch (convert-torch-onnx-to-torch) //----- //
func.func @torch_jit(%arg0: !torch.vtensor<[8],si64>, %arg1: !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 13 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
  %0:8 = torch.operator "onnx.Split"(%arg1, %arg0) {torch.onnx.axis = 1 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[8],si64>) -> (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>)
  %1 = torch.operator "onnx.Concat"(%0#0, %0#1, %0#2, %0#3, %0#4, %0#5, %0#6, %0#7) {torch.onnx.axis = 3 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32>
  return %1 : !torch.vtensor<[],f32>
}

Assertion failed: Index < Length && "Invalid index!", file D:\dev\projects\iree\third_party\llvm-project\llvm\include\llvm/ADT/ArrayRef.h, line 257
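
The assertion above fires in `llvm::ArrayRef::operator[]`, meaning an index ran past the end of an array. One plausible reading, which nothing in this thread confirms, is that the `onnx.Split` lowering indexes the input's size list by the `axis` attribute: `%arg1` is rank 0, so its size list is empty and `axis = 1` is out of range. Below is a minimal Python sketch of the kind of guard that would turn such a crash into a clean match failure. The helper name `checked_split_axis` is hypothetical and is not torch-mlir API. It only encodes the ONNX rule that `axis` must lie in `[-rank, rank - 1]`.

```python
from typing import Optional, Sequence


def checked_split_axis(shape: Sequence[int], axis: int) -> Optional[int]:
    """Return the normalized axis, or None if it is out of range for `shape`.

    Hypothetical helper: ONNX requires axis in [-rank, rank - 1], so a
    rank-0 tensor (empty shape) has no valid axis at all.
    """
    rank = len(shape)
    if axis < -rank or axis >= rank:
        # The caller should report a match failure here instead of indexing.
        return None
    return axis + rank if axis < 0 else axis


# The reproducer: a rank-0 input (empty shape) split along axis 1.
print(checked_split_axis([], 1))          # None
print(checked_split_axis([2, 8, 4], 1))   # 1
print(checked_split_axis([2, 8, 4], -1))  # 2
```

With a check like this, the invalid reproducer would be rejected before any size lookup, while in-range axes (including negative ones) pass through unchanged.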

Full callstack:

iree-compile.exe!HandleAbort(int Sig) Line 429 (d:\dev\projects\iree\third_party\llvm-project\llvm\lib\Support\Windows\Signals.inc:429)
ucrtbase.dll!00007ffce2bd1881() (Unknown Source:0)
ucrtbase.dll!00007ffce2bd2851() (Unknown Source:0)
ucrtbase.dll!00007ffce2bd426e() (Unknown Source:0)
ucrtbase.dll!00007ffce2bd4165() (Unknown Source:0)
ucrtbase.dll!00007ffce2bd44f1() (Unknown Source:0)
[Inline Frame] iree-compile.exe!llvm::ArrayRef<__int64>::operator[](unsigned __int64) Line 257 (d:\dev\projects\iree\third_party\llvm-project\llvm\include\llvm\ADT\ArrayRef.h:257)
iree-compile.exe!mlir::torch::onnx_c::populateDefaultDomainQtoZ::__l2::<lambda>(mlir::torch::onnx_c::OpBinder binder, mlir::ConversionPatternRewriter & rewriter) Line 1831 (d:\dev\projects\iree\third_party\torch-mlir\lib\Conversion\TorchOnnxToTorch\DefaultDomainQtoZ.cpp:1831)
iree-compile.exe!llvm::LogicalResult <lambda>(mlir::torch::onnx_c::OpBinder, mlir::ConversionPatternRewriter &)::<lambda_invoker_cdecl>(mlir::torch::onnx_c::OpBinder binder, mlir::ConversionPatternRewriter & rewriter) Line 1860 (d:\dev\projects\iree\third_party\torch-mlir\lib\Conversion\TorchOnnxToTorch\DefaultDomainQtoZ.cpp:1860)
iree-compile.exe!mlir::torch::onnx_c::OnnxCustomOpConversionPattern::matchAndRewrite(mlir::torch::Torch::OperatorOp op, mlir::torch::Torch::OperatorOpAdaptor adaptor, mlir::ConversionPatternRewriter & rewriter) Line 35 (d:\dev\projects\iree\third_party\torch-mlir\lib\Conversion\TorchOnnxToTorch\Patterns.cpp:35)
iree-compile.exe!mlir::OpConversionPattern<mlir::torch::Torch::OperatorOp>::matchAndRewrite(mlir::Operation * op, llvm::ArrayRef<mlir::Value> operands, mlir::ConversionPatternRewriter & rewriter) Line 544 (d:\dev\projects\iree\third_party\llvm-project\mlir\include\mlir\Transforms\DialectConversion.h:544)
iree-compile.exe!mlir::ConversionPattern::matchAndRewrite(mlir::Operation * op, mlir::PatternRewriter & rewriter) Line 1651 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Transforms\Utils\DialectConversion.cpp:1651)
iree-compile.exe!mlir::PatternApplicator::matchAndRewrite::__l8::<lambda>() Line 212 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Rewrite\PatternApplicator.cpp:212)
[Inline Frame] iree-compile.exe!llvm::function_ref<void __cdecl(void)>::operator()() Line 68 (d:\dev\projects\iree\third_party\llvm-project\llvm\include\llvm\ADT\STLFunctionalExtras.h:68)
[Inline Frame] iree-compile.exe!mlir::MLIRContext::executeAction(llvm::function_ref<void __cdecl(void)>) Line 275 (d:\dev\projects\iree\third_party\llvm-project\mlir\include\mlir\IR\MLIRContext.h:275)
iree-compile.exe!mlir::PatternApplicator::matchAndRewrite(mlir::Operation * op, mlir::PatternRewriter & rewriter, llvm::function_ref<bool __cdecl(mlir::Pattern const &)> canApply, llvm::function_ref<void __cdecl(mlir::Pattern const &)> onFailure, llvm::function_ref<llvm::LogicalResult __cdecl(mlir::Pattern const &)> onSuccess) Line 233 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Rewrite\PatternApplicator.cpp:233)
iree-compile.exe!`anonymous namespace'::OperationLegalizer::legalizeWithPattern(mlir::Operation * op, mlir::ConversionPatternRewriter & rewriter) Line 1960 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Transforms\Utils\DialectConversion.cpp:1960)
iree-compile.exe!`anonymous namespace'::OperationLegalizer::legalize(mlir::Operation * op, mlir::ConversionPatternRewriter & rewriter) Line 1850 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Transforms\Utils\DialectConversion.cpp:1850)
iree-compile.exe!mlir::OperationConverter::convert(mlir::ConversionPatternRewriter & rewriter, mlir::Operation * op) Line 2382 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Transforms\Utils\DialectConversion.cpp:2382)
iree-compile.exe!mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation *> ops) Line 2434 (d:\dev\projects\iree\third_party\llvm-project\mlir\lib\Transforms\Utils\DialectConversion.cpp:2434)

benvanik commented 3 weeks ago

%1 = torch.operator "onnx.Concat"(%0#0, %0#1, %0#2, %0#3, %0#4, %0#5, %0#6, %0#7) {torch.onnx.axis = 3 : si64} : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32>

Axis 3 of a bunch of scalar tensors feels wrong? Concatenating all those should not result in a !torch.vtensor<[],f32>, so this is probably some shape inference thing or bad input.
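
To make that concrete, here is a small Python sketch of concat shape inference under the ONNX rules (inputs share a rank, `axis` lies in `[-rank, rank - 1]`, and all other dimensions match). The function `concat_result_shape` is a hypothetical illustration, not the importer's or torch-mlir's actual code. It shows that rank-0 inputs admit no valid axis, so a `[]` result type for this op cannot come from a well-formed concat.

```python
from typing import List, Sequence


def concat_result_shape(shapes: Sequence[Sequence[int]], axis: int) -> List[int]:
    """Compute the shape of concatenating tensors of `shapes` along `axis`."""
    if not shapes:
        raise ValueError("concat needs at least one input")
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all inputs must have the same rank")
    # For rank 0 this check always fails, so the modulo below is safe.
    if axis < -rank or axis >= rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    axis %= rank
    result = list(shapes[0])
    for s in shapes[1:]:
        # Every dimension except the concat axis must agree.
        if any(a != b for i, (a, b) in enumerate(zip(result, s)) if i != axis):
            raise ValueError("non-concat dimensions must match")
        result[axis] += s[axis]
    return result


print(concat_result_shape([[1, 2, 3, 4]] * 8, axis=3))  # [1, 2, 3, 32]
try:
    concat_result_shape([[]] * 8, axis=3)  # the shapes from this issue
except ValueError as err:
    print(err)  # axis 3 is out of range for rank 0
```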

ScottTodd commented 3 weeks ago

Most of the split unit tests are passing 🤔. These two are failing at runtime, while this issue is a compiler crash:

    "onnx/node/generated/test_split_zero_size_splits_opset13",
    "onnx/node/generated/test_split_zero_size_splits_opset18",

I presume the input IR here is valid?

ScottTodd commented 3 weeks ago

Do you have the original .onnx file this input was generated from? Or was this reduced from a larger program? Shape inferencing code runs during import from .onnx to .mlir, if we suspect an issue in there.

pdhirajkumarprasad commented 3 weeks ago

This IR is reduced from a larger program. The actual ONNX models were generated by running the e2e shark tests for the following models:

onnx/models/bat_resnext26ts.ch_in1k 
onnx/models/levit_conv_128.fb_dist_in1k 
onnx/models/levit_conv_128s.fb_dist_in1k 
onnx/models/levit_conv_192.fb_dist_in1k 
onnx/models/levit_conv_256.fb_dist_in1k 
onnx/models/levit_conv_384.fb_dist_in1k