onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Apache License 2.0

"Error: not ranked" on int8 model #2948

Open jinchenglee opened 2 months ago

jinchenglee commented 2 months ago

I know quantized models aren't supported yet. I would like to confirm: is this symptom due to the Dequantize/Quantize operators?

The model I'm testing is ShuffleNet-v2-int8 from here.

Command line: onnx-mlir --mlir-pass-statistics --mlir-print-ir-after-all --EmitLLVMIR ~/shufflenet-v2-12-int8.onnx

Error:

...
<f32>, tensor<ui8>) -> tensor<1x1024xui8>
    %454 = "onnx.QLinearMatMul"(%453, %257, %15, %258, %259, %260, %262, %261) {onnx_node_name = "Gemm_260_MatMul_quant"} : (tensor<1x1024xui8>, tensor<f32>, tensor<ui8>, tensor<1024x1000xi8>, tensor<f32>, tensor<i8>, tensor<f32>, tensor<ui8>) -> tensor<1x1000xui8>
    %455 = "onnx.Custom"(%454, %262, %261, %265, %266, %267, %264, %263) {domain_name = "com.microsoft", function_name = "QLinearAdd", onnx_node_name = "Gemm_260_Add_quant"} : (tensor<1x1000xui8>, tensor<f32>, tensor<ui8>, tensor<1000xui8>, tensor<f32>, tensor<ui8>, tensor<f32>, tensor<ui8>) -> tensor<*xui8>
    %456 = "onnx.DequantizeLinear"(%455, %264, %263) {axis = 1 : si64, onnx_node_name = "output_DequantizeLinear"} : (tensor<*xui8>, tensor<f32>, tensor<ui8>) -> tensor<1x1000xf32>
    return %456 : tensor<1x1000xf32>
  }
  "onnx.EntryPoint"() {func = @main_graph} : () -> ()
}

loc("output_DequantizeLinear"): error: not ranked
// -----// IR Dump After {anonymous}::ONNXPreKrnlVerifyPass Failed (onnx-pre-krnl-verify) //----- //
func.func @main_graph(%arg0: tensor<1x3x224x224xf32> {onnx.name = "input"}) -> (tensor<1x1000xf32> {onnx.name = "output"}) {
  %0 = onnx.Constant dense<232> : tensor<2xi64>
...
===-------------------------------------------------------------------------===
                         ... Pass statistics report ...
===-------------------------------------------------------------------------===
'func.func' Pipeline
  {anonymous}::DecomposeONNXToONNXPass
  {anonymous}::RecomposeONNXToONNXPass
  {anonymous}::ONNXHybridTransformPass
  {anonymous}::ConvOptONNXToONNXPass
  {anonymous}::ONNXHybridTransformPass
{anonymous}::SimplifyShapeRelatedOpsPass
'func.func' Pipeline
  {anonymous}::ONNXHybridTransformPass
  onnx_mlir::{anonymous}::StandardFuncReturnPass
SymbolDCE
  (S) 0 num-dce'd - Number of symbols DCE'd
onnx_mlir::{anonymous}::ScrubDisposablePass
{anonymous}::SetONNXNodeNamePass
'func.func' Pipeline
  onnx_mlir::InstrumentPass
CSE
  (S) 298 num-cse'd - Number of operations CSE'd
  (S)   0 num-dce'd - Number of operations DCE'd
'func.func' Pipeline
  {anonymous}::ONNXPreKrnlVerifyPass                       <= Error'ed in this pass. 
onnx_mlir::FrontendToKrnlLoweringPass
Canonicalizer
'func.func' Pipeline
  onnx_mlir::krnl::ConvertKrnlToAffinePass
CSE
  (S) 0 num-cse'd - Number of operations CSE'd
  (S) 0 num-dce'd - Number of operations DCE'd
'func.func' Pipeline
  ConvertVectorToSCF
ConvertAffineToStandard

AlexandreEichenberger commented 2 months ago

I think the likely source of the error is the use of a custom operation whose output is unranked (its shape is unknown):

    %455 = "onnx.Custom"(%454, %262, %261, %265, %266, %267, %264, %263) {domain_name = "com.microsoft", function_name = "QLinearAdd", onnx_node_name = "Gemm_260_Add_quant"} : (tensor<1x1000xui8>, tensor<f32>, tensor<ui8>, tensor<1000xui8>, tensor<f32>, tensor<ui8>, tensor<f32>, tensor<ui8>) -> tensor<*xui8>

onnx-mlir needs to know the output shape. In this specific case, since everything is static, I suspect that filling in the output shape would let compilation proceed further down. We don't have support for QLinearAdd, as it is not an official op but a Microsoft extension, so the compiler would still choke later when attempting to lower that operation.
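
For illustration, a minimal sketch of what that line could look like with the static result shape filled in (assuming the result really is 1x1000, as the downstream DequantizeLinear suggests); the only change is the result type tensor<*xui8> becoming tensor<1x1000xui8>:

    %455 = "onnx.Custom"(%454, %262, %261, %265, %266, %267, %264, %263) {domain_name = "com.microsoft", function_name = "QLinearAdd", onnx_node_name = "Gemm_260_Add_quant"} : (tensor<1x1000xui8>, tensor<f32>, tensor<ui8>, tensor<1000xui8>, tensor<f32>, tensor<ui8>, tensor<f32>, tensor<ui8>) -> tensor<1x1000xui8>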

The best path would be to propose that ONNX add that operation as a standard op; then we could implement it as part of the standard.

jinchenglee commented 2 months ago

Thank you for the explanation. Since the custom operator is officially supported in the ONNX spec, wouldn't it be good practice to add some level of support for it in onnx-mlir? Of course, it won't be able to generate any runnable code, since the actual function of the customized op is unknown. But if the output shape is already there (static model), shouldn't at least some of the passes still work?
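
For example, once that result is ranked, the tail of main_graph would be fully shaped (a sketch assuming the 1x1000 result type from above), so the rank check in ONNXPreKrnlVerifyPass would have nothing left to reject; only the later lowering of the QLinearAdd custom op in FrontendToKrnlLoweringPass would still fail:

    %455 = "onnx.Custom"(%454, %262, %261, %265, %266, %267, %264, %263) {domain_name = "com.microsoft", function_name = "QLinearAdd", onnx_node_name = "Gemm_260_Add_quant"} : (tensor<1x1000xui8>, tensor<f32>, tensor<ui8>, tensor<1000xui8>, tensor<f32>, tensor<ui8>, tensor<f32>, tensor<ui8>) -> tensor<1x1000xui8>
    %456 = "onnx.DequantizeLinear"(%455, %264, %263) {axis = 1 : si64, onnx_node_name = "output_DequantizeLinear"} : (tensor<1x1000xui8>, tensor<f32>, tensor<ui8>) -> tensor<1x1000xf32>
    return %456 : tensor<1x1000xf32>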