It seems that the imported onnx.Scan op is not correct.
The netron.app outputs ( https://netron.app/ ) for an onnx.Scan op ("Scan_20") are as follows.
I generated an MLIR file with `./build/Debug/bin/onnx-mlir bidaf-9.onnx --EmitONNXBasic`.
The imported onnx.Scan op in the generated bidaf-9.onnx.mlir file is as follows.
There are some differences between the netron.app visualization and the op in the generated MLIR.
```mlir
%302:5 = "onnx.Scan"(%299, %300, %301, %298) ({
^bb0(%arg4: tensor<1x1xf32>, %arg5: tensor<1x1xf32>, %arg6: tensor<1xf32>, %arg7: tensor<1x1xf32>):
%316 = "onnx.Greater"(%arg7, %arg4) {onnx_node_name = "start_max_Greater_12"} : (tensor<1x1xf32>, tensor<1x1xf32>) -> tensor<*xi1>
%317 = "onnx.Where"(%316, %arg7, %arg4) {onnx_node_name = "start_max_Where_13"} : (tensor<*xi1>, tensor<1x1xf32>, tensor<1x1xf32>) -> tensor<*xf32>
%318 = "onnx.Where"(%316, %arg6, %arg5) {onnx_node_name = "start_max_Where_14"} : (tensor<*xi1>, tensor<1xf32>, tensor<1x1xf32>) -> tensor<*xf32>
%319 = "onnx.Constant"() {value = dense<1.000000e+00> : tensor<1xf32>} : () -> tensor<1xf32>
%320 = "onnx.Add"(%arg6, %319) {onnx_node_name = "start_max_Add_15"} : (tensor<1xf32>, tensor<1xf32>) -> tensor<1xf32>
%321 = "onnx.Identity"(%317) {onnx_node_name = "start_max_Identity_16"} : (tensor<*xf32>) -> tensor<1x1xf32>
%322 = "onnx.Identity"(%318) {onnx_node_name = "start_max_Identity_17"} : (tensor<*xf32>) -> tensor<1x1xf32>
%323 = "onnx.Identity"(%317) {onnx_node_name = "start_max_Identity_18"} : (tensor<*xf32>) -> tensor<1x1xf32>
%324 = "onnx.Identity"(%318) {onnx_node_name = "start_max_Identity_19"} : (tensor<*xf32>) -> tensor<1x1xf32>
onnx.Return %321, %322, %320, %323, %324 : tensor<1x1xf32>, tensor<1x1xf32>, tensor<1xf32>, tensor<1x1xf32>, tensor<1x1xf32>
}) {input_names = ["start_max__v_subgraph", "start_max__i_subgraph", "start_max__counter_subgraph", "Log11393_Output_0_subgraph"], num_scan_inputs = 1 : si64, onnx_node_name = "Scan_20", output_names = ["start_max__v", "start_max__i", "start_max__counter", "start_max_value", "start_max_index"], scan_input_directions = [0], scan_output_directions = [0, 0]} : (tensor<1x1xf32>, tensor<1x1xf32>, tensor<1xf32>, tensor<*xf32>) -> (tensor<1x1xf32>, tensor<1x1xf32>, tensor<1xf32>, tensor<1x1xf32>, tensor<1x1xf32>)
```
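One way to confirm which Scan opset the model actually resolves to is to inspect its opset imports and the Scan node directly with the onnx Python package. This is a small inspection sketch; the local path is illustrative, and the node name "Scan_20" is taken from the dump above.

```python
import onnx

# Assumes bidaf-9.onnx from the model zoo has been downloaded
# into the current directory (path is illustrative).
model = onnx.load("bidaf-9.onnx")

# The default-domain opset import determines which version of Scan
# the model's nodes resolve to.
for imp in model.opset_import:
    print("domain:", imp.domain or "<default>", "version:", imp.version)

# Dump the attribute names of the Scan node shown above ("Scan_20").
for node in model.graph.node:
    if node.op_type == "Scan" and node.name == "Scan_20":
        print([attr.name for attr in node.attribute])
```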
Results of other tests that may be related to the bidaf-9 compilation issue: see the attached chart for details.
Investigation of the Bidaf-9 model, the opsets of the onnx.Scan op, and onnx-mlir turned up the following facts:
- Bidaf-9 uses opset-8 of the onnx.Scan op with a dynamic shape (dynamic batch size).
- Onnx-mlir supports opset-9 and opset-11 of onnx.Scan, with static shapes only.
- The specifications of opset-8 and opset-9 are not compatible: opset-8 Scan takes an optional sequence_lens input and keeps a leading batch dimension on its inputs and outputs, both of which were removed in opset-9. => Onnx-mlir cannot support the Bidaf-9 model as things stand.
- The bidaf-9 model can be supported by enabling opset-8 of onnx.Scan in onnx-mlir, because all of the Bidaf-9 issues probably come from the opset-8/opset-9 difference (see the sketch after this list).
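For reference, here is a minimal sketch of the interface difference, built with the onnx Python helper API. All names and shapes below are made up for illustration; it constructs a tiny opset-9 Scan with one state variable and one scan input, and the comments note what the equivalent opset-8 node would look like.

```python
import onnx
from onnx import helper, TensorProto

# Scan body: one state variable and one scan input per iteration.
# Under opset-9 the body sees per-iteration slices WITHOUT a batch
# dimension; under opset-8 every body input/output keeps a leading
# batch dimension.
state_in = helper.make_tensor_value_info("state_in", TensorProto.FLOAT, [1])
scan_in = helper.make_tensor_value_info("scan_in", TensorProto.FLOAT, [1])
state_out = helper.make_tensor_value_info("state_out", TensorProto.FLOAT, [1])
scan_out = helper.make_tensor_value_info("scan_out", TensorProto.FLOAT, [1])

body = helper.make_graph(
    [
        helper.make_node("Add", ["state_in", "scan_in"], ["state_out"]),
        helper.make_node("Identity", ["state_out"], ["scan_out"]),
    ],
    "scan_body",
    [state_in, scan_in],
    [state_out, scan_out],
)

# Opset-9 Scan: inputs are (initial_state..., scan_inputs...).
# The opset-8 signature instead begins with an optional sequence_lens
# input (passed as "" to omit) and uses a "directions" attribute rather
# than scan_input_directions/scan_output_directions.
scan = helper.make_node(
    "Scan", ["init", "xs"], ["final", "ys"],
    num_scan_inputs=1, body=body,
)

# Opset-9 shapes: xs is [sequence, ...]; under opset-8 every tensor
# would carry an extra leading batch dimension, e.g. xs = [batch, 3, 1].
init = helper.make_tensor_value_info("init", TensorProto.FLOAT, [1])
xs = helper.make_tensor_value_info("xs", TensorProto.FLOAT, [3, 1])
final = helper.make_tensor_value_info("final", TensorProto.FLOAT, [1])
ys = helper.make_tensor_value_info("ys", TensorProto.FLOAT, [3, 1])

graph = helper.make_graph([scan], "scan_demo", [init, xs], [final, ys])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 9)])
onnx.checker.check_model(model)
```

The extra input slot and the extra batch dimension are exactly why an importer that only understands the opset-9/11 form can end up with off-by-one ranks on an opset-8 model like Bidaf-9.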
Closing this issue because onnx.Scan can be supported by onnx-mlir.
The latest main branch of onnx-mlir cannot compile bidaf-9.onnx from the model zoo. The following errors occur during compilation.
The results visualized with https://netron.app/ are as follows.
The error comes from an onnx.Squeeze op: according to the shape-inference results, the input rank is 2, so onnx.Squeeze with axes = [2] is invalid (the axis must be less than 2).
It seems that the mismatch comes from the preceding onnx.Scan op in the graph. The output dimension of the onnx.Scan op should be "cx1x1" in the graph, but the rank given by shape inference is 2, which is inconsistent. (The input dimension of the onnx.Scan op is "cx1x1" in the graph, and the rank given by shape inference is 3, which is consistent.)
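One way to cross-check these ranks outside of onnx-mlir is to run the onnx package's own shape inference, assuming it handles the opset-8 Scan. A reproduction sketch; the local path is illustrative, and the tensor names come from the output_names attribute in the MLIR dump above.

```python
import onnx
from onnx import shape_inference

# Assumes bidaf-9.onnx is available locally (path is illustrative).
model = onnx.load("bidaf-9.onnx")
inferred = shape_inference.infer_shapes(model)

# Compare the inferred ranks of the Scan outputs against the "cx1x1"
# shapes shown by netron.
wanted = {"start_max_value", "start_max_index"}
for vi in list(inferred.graph.value_info) + list(inferred.graph.output):
    if vi.name in wanted:
        dims = vi.type.tensor_type.shape.dim
        print(vi.name, "inferred rank:", len(dims))
```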