onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

Error in loading ConstantOp #1975

Open chentong319 opened 1 year ago

chentong319 commented 1 year ago

Here is the error I mentioned at the meeting. I ran `onnx-mlir --EmitONNXBasic my_model.onnx` and then `onnx-mlir --EmitONNXIR my_model.onnx.mlir`. I got the error message

error: unknown attribute `dense_disposable` in dialect `ONNX`

This error occurs only on one model, which cannot be shared publicly. The constant in `my_model.onnx.mlir` looks like this:

%0 = "onnx.Constant"() {value = #onnx.dense_disposable<#1:"0x000000000000000000000000000000000000000000000000"> : tensor<3xi64>} : () -> tensor<3xi64>

The constant is not printed in dense format, while the constants of other models (output of `--EmitONNXBasic`) are in dense format and can be loaded back into onnx-mlir. Can you figure out the reason without the model?

By the way, I'd like to explain the question I asked at the meeting. We can import a model from a .onnx file, and the constants are represented with the disposable attribute for performance. The disposable attribute is converted to a dense attribute before the ONNX-to-KRNL conversion. However, we may print out the IR and then run onnx-mlir again. To my understanding, the constant will be printed as dense, not as disposable. Right? Can we print out the disposable attribute? When onnx-mlir loads the printed dense attribute again, do we need to add a pass to convert the dense attribute back to a disposable one, or is that done automatically?

sorenlassen commented 1 year ago

thanks for the error report!

I'll first try to figure out why it was printed out as #onnx.dense_disposable<..> and not dense<..>, and then I'll try to answer the other questions.

sorenlassen commented 1 year ago

I can't recreate the error you described, and I'm puzzled how you got it to print "onnx.Constant" with double quotes: it must have bypassed ONNXConstantOp::print() from OnnxOps.cpp, and I don't see how that can happen.

Studying the implementation, I found another corner case where #onnx.dense_disposable might get printed, namely if the DisposableElementsAttr type doesn't match the ONNXConstantOp result type, which would probably be impossible to parse. I'll try to verify and fix that.

chentong319 commented 1 year ago

I looked into the model dump. The constant is defined as follows:

  initializer {
    dims: 1
    data_type: 7
    name: "const_axes__1974"
    raw_data: "\377\377\377\377\377\377\377\377"
  } 

The resulting ONNX op is

    %17 = "onnx.Constant"() {value = #onnx.dense_disposable<#18:"0xFFFFFFFFFFFFFFFF"> : tensor<1xi64>} : () -> tensor<1xi64>

This constant should be -1. I checked another model with the same initializer definition (dims, type and raw data).

  initializer {
    dims: 1 
    data_type: 7
    name: "_Const_0"
    raw_data: "\377\377\377\377\377\377\377\377"
  }

That model got

%53 = onnx.Constant dense<-1> : tensor<1xi64>

I will trace with gdb to see what happened.
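As a quick sanity check on the -1 claim: the raw bytes `\377\377\377\377\377\377\377\377` (hex FFFFFFFFFFFFFFFF) do decode to -1 when read as a two's-complement little-endian int64, which is how ONNX raw_data stores fixed-width tensors. A minimal Python sketch (not part of onnx-mlir):

```python
import struct

# ONNX raw_data stores int64 tensors as little-endian fixed-width bytes.
raw = b"\xff" * 8  # the initializer's raw_data "\377\377\377\377\377\377\377\377"
(value,) = struct.unpack("<q", raw)  # "<q": little-endian signed 64-bit
print(value)  # -1
```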

chentong319 commented 1 year ago

The error did not come from ONNXConstantOp::print(OpAsmPrinter &odsPrinter), which is not invoked for the failing model. The attribute is printed by OperationPrinter::printGenericOp, so the DisposableElementsAttr is printed as a plain Attribute directly from mlir::AsmPrinter::Impl::printNamedAttribute. I do not know why. What's the op? Is the model correct, given that it passed onnx_check? A possible improvement is to define the print method on DisposableElementsAttr itself, instead of intercepting the printing in ONNXConstantOp.

chentong319 commented 1 year ago

That model has a problem: it cannot pass the verifier, so MLIR dumps the IR with printGenericOp. No error hint is generated by `onnx-mlir --EmitONNXBasic`.

sorenlassen commented 1 year ago

@chentong319 thanks for troubleshooting this!

A possible improvement is to define the print method for DisposableElementsAttr, instead of intercepting the print at the print of ONNXConstantOp.

With a print method on DisposableElementsAttr I couldn't get rid of the #onnx. prefix, which is added by AsmPrinter; that's why I intercepted the printing in ONNXConstantOp, where I could make it print the same as DenseElementsAttr.

negiyas commented 1 year ago

Let me add a similar error case with the onnx.ConstantOfShape op in bidaf-9.onnx ( https://github.com/onnx/models/blob/main/text/machine_comprehension/bidirectional_attention_flow/model/bidaf-9.onnx ) from the model zoo, using the latest onnx-mlir ( 66b0e2be9bb39a084cc080a96852b3a9b99a8bd9 ), as follows.

$ onnx-mlir  --EmitONNXBasic  bidaf-9.onnx
$ mv bidaf-9.tmp  bidaf-9-onnxbasic.mlir
$ onnx-mlir-opt --shape-inference bidaf-9-onnxbasic.mlir
bidaf-9-onnxbasic.mlir:114:132: error: unknown attribute `dense_disposable` in dialect `ONNX`
    %108 = "onnx.ConstantOfShape"(%107) {onnx_node_name = "UnpackSequenceOp10564_constant_of_shape", value = #onnx.dense_disposable<#56:"0x3F800000"> : tensor<1xf32>} : (tensor<*xi64>) -> tensor<*xf32>
                                                                                                                                   ^
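For reference, the payload in this case corresponds to 1.0f: 0x3F800000 is the IEEE-754 single-precision encoding of 1.0. A quick Python decode sketch (assuming the hex string is printed most-significant byte first, as it appears here):

```python
import struct

hex_payload = "3F800000"  # from #onnx.dense_disposable<#56:"0x3F800000">
# ">f": interpret the 4 bytes as a big-endian IEEE-754 float32.
(value,) = struct.unpack(">f", bytes.fromhex(hex_payload))
print(value)  # 1.0
```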
sorenlassen commented 1 year ago

thanks for sharing this example, I'm able to reproduce and I'm working on a fix

sorenlassen commented 1 year ago

this issue should be fixed with PR #2020

please reopen if you see more problems

tungld commented 1 year ago

Even when using the fix PR #2020, I still see this issue with bertsquad-12-int8:

$ ./Debug/bin/onnx-mlir --EmitONNXBasic bertsquad-12-int8.onnx
$ ./Debug/bin/onnx-mlir-opt --print-op-stats bertsquad-12-int8.tmp  >/dev/null
bertsquad-12-int8.tmp:4:59: error: unknown attribute `dense_disposable` in dialect `ONNX`
   %0 = "onnx.Constant"() {value = #onnx.dense_disposable<#1:"0x00000100"> : tensor<i32>} : () -> tensor<i32>
sorenlassen commented 1 year ago

I still see this issue with bertsquad-12-int8

in both the .tmp and .onnx.mlir outputs, all the constants are printed with the "generic" syntax with double quotes:

"builtin.module"() ({
  "func.func"() ({
  ^bb0(%arg0: tensor<?xi64>, %arg1: tensor<?x256xi64>, %arg2: tensor<?x256xi64>, %arg3: tensor<?x256xi64>):
    %0 = "onnx.Constant"() {value = #onnx.dense_disposable<#1:"0x00010000"> : tensor<i32>} : () -> tensor<i32>
    %1 = "onnx.Constant"() {value = #onnx.dense_disposable<#2:"0x0000000000000000"> : tensor<1xi64>} : () -> tensor<1xi64>
    %2 = "onnx.Constant"() {value = #onnx.dense_disposable<#3:"0x0100000000000000"> : tensor<1xi64>} : () -> tensor<1xi64>

so, somehow, the custom assembly printing in ONNXConstantOp::print() is bypassed, the same as with ConstantOfShape:

%719 = "onnx.ConstantOfShape"(%718) {onnx_node_name = "bert/encoder/ones", value = #onnx.dense_disposable<#634:"0x0000803F"> : tensor<1xf32>} : (tensor<*xi64>) -> tensor<*xf32>

any ideas why this could be happening?

it doesn't happen with bidaf-9.onnx, where we output:

module attributes {llvm.data_layout = "e-m:o-i64:64-i128:128-n32:64-S128", llvm.target_triple = "arm64-apple-darwin21.6.0"} {
  func.func @main_graph(%arg0: tensor<?x1x!onnx.String>, %arg1: tensor<?x1x1x16x!onnx.String>, %arg2: tensor<?x1x!onnx.String>, %arg3: tensor<?x1x1x16x!onnx.String>) -> (tensor<1xi32>, tensor<1xi32>) attributes {input_names = ["context_word", "context_char", "query_word", "query_char"], output_names = ["start_pos", "end_pos"]} {
    %0 = onnx.Constant dense<"0x10CC463EFD..."> : tensor<1000x1xf32>
sorenlassen commented 1 year ago

as discussed in today's meeting, I will add parsing logic for DisposableElementsAttr, which seems like the most direct way to get things working again even if Constant and ConstantOfShape are printed "generically". I will try to complete it this week.

I welcome any ideas why it's printed generically and bypassing our custom assembly printer

sorenlassen commented 1 year ago

I added parsing logic for DisposableElementsAttr in PR #2031, please review when you can find time

with this fix, when I ran @tungld's example `onnx-mlir-opt --print-op-stats bertsquad-12-int8.tmp`, I got the error output:

bertsquad-12-int8.tmp:639:14: error: 'onnx.DynamicQuantizeLinear' op result #0 must be tensor of 8-bit unsigned integer values, but got 'tensor<*xi8>'
    %635:3 = "onnx.DynamicQuantizeLinear"(%634) {onnx_node_name = "bert/embeddings/one_hot:0_QuantizeLinear"} : (tensor<*xf32>) -> (tensor<*xi8>, tensor<*xf32>, tensor<*xi8>)
             ^
bertsquad-12-int8.tmp:639:14: note: see current operation: %635:3 = "onnx.DynamicQuantizeLinear"(%634) {onnx_node_name = "bert/embeddings/one_hot:0_QuantizeLinear"} : (tensor<*xf32>) -> (tensor<*xi8>, tensor<*xf32>, tensor<*xi8>)

which is similar to the situation with @chentong319's original "my_model.onnx" example

maybe that's why these examples print out generically: they experience a verifier error and trigger an exceptional code path that prints the MLIR generically

are we failing to handle an error?

tungld commented 1 year ago

It looks like the frontend wrongly imported onnx.DynamicQuantizeLinear by using i8 instead of ui8. Looking at the definition of onnx.DynamicQuantizeLinear in onnx-mlir, the first and third outputs must use ui8, and its type map is:

static std::vector<int> getTypeMap() {
  return {1, 7, 1};
}

In FrontendDialectTransformer.cpp, the frontend will use the direct type for the output:

// Mapping gives a direct type.
Type elementType = buildTypeFromIndex(outputMap[j]);
auto outType = UnrankedTensorType::get(elementType);

But buildTypeFromIndex does not produce any unsigned integer, and case 1 returns i8:

  // itblgen_types = ('I1', 'I8', 'I16', 'I32', 'I64', 'BF16', 'F16', 'F32',
  // 'F64', 'Complex<F32>', 'Complex<F64>' )
  Type buildTypeFromIndex(int index) {
    switch (index) {
    case 0:
      return builder_.getI1Type();
    case 1:
      return builder_.getIntegerType(8);
    case 2:
      return builder_.getIntegerType(16);
    case 3:
      return builder_.getIntegerType(32);
    case 4:
      return builder_.getIntegerType(64);
    case 5:
      return builder_.getBF16Type();
    case 6:
      return builder_.getF16Type();
    case 7:
      return builder_.getF32Type();
    case 8:
      return builder_.getF64Type();
    case 9: {
      std::vector<Type> typeTuple(2);
      typeTuple.push_back(builder_.getF32Type());
      typeTuple.push_back(builder_.getF32Type());
      return builder_.getTupleType(llvm::ArrayRef<Type>(typeTuple));
    }
    case 10: {
      std::vector<Type> typeTuple(2);
      typeTuple.push_back(builder_.getF64Type());
      typeTuple.push_back(builder_.getF64Type());
      return builder_.getTupleType(llvm::ArrayRef<Type>(typeTuple));
    }
    case 11:
      return mlir::ONNXStringType::get(builder_.getContext());
    default:
      assert(false && "Unsupported type index encountered.");
      return nullptr;
    }
  }
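The mismatch can be illustrated with a small sketch (a hypothetical table mirroring the itblgen_types comment above, not the actual onnx-mlir code): index 1 yields the signless 8-bit type i8, so applying DynamicQuantizeLinear's type map {1, 7, 1} imports the outputs as i8/f32/i8, while the op definition requires ui8 for outputs 0 and 2.

```python
# Hypothetical mirror of buildTypeFromIndex for indices 0-8 (type names only).
ITBLGEN_TYPES = ["i1", "i8", "i16", "i32", "i64", "bf16", "f16", "f32", "f64"]

def build_type_from_index(index: int) -> str:
    return ITBLGEN_TYPES[index]

# DynamicQuantizeLinear's getTypeMap() returns {1, 7, 1}:
imported = [build_type_from_index(i) for i in [1, 7, 1]]
print(imported)  # ['i8', 'f32', 'i8']

# The op definition requires unsigned 8-bit for outputs 0 and 2,
# so the imported result types fail verification:
required = ["ui8", "f32", "ui8"]
print(imported == required)  # False
```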
tungld commented 1 year ago

PR #2067 fixes the ui8 issue in bertsquad-12-int8. With #2067, the model is imported correctly, and there is no #onnx.dense_disposable<> but the normal dense<> for ElementsAttrs.