Checking ONNX Zoo accuracy

gyulaz-htec commented 8 months ago

Check the following:

[x] Baseline accuracy with ORT
[x] Accuracy with ORT using MIGraphX provider
[x] MIGraphX ref accuracy
[x] MIGraphX gpu accuracy

gyulaz-htec commented 8 months ago

GPU accuracy

Accuracy tables are available:

https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32.md
https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-int8.md
https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-qdq.md
https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp16.md
https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_FP32_FP16.md

MIGraphX version: 352dcea2c6a03c495a6ba8667e19811bc5d1399b

gyulaz-htec commented 8 months ago

Baseline accuracy with ORT

Only 1 model fails:

[FAIL] bertsquad-8.tar.gz: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'bert/embeddings/one_hot' Status Message: indices element out of data bounds, idx=7 must be within the inclusive range [-2,1]
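
For context on the error message above: ONNX Gather accepts negative indices, so for an axis of size n the valid inclusive range is [-n, n-1]. A reported range of [-2,1] therefore implies the gathered axis has size 2, and idx=7 falls outside it. A minimal sketch of that bounds check in plain Python (the function names are illustrative, not ORT's):

```python
def gather_index_range(axis_size):
    """Inclusive (low, high) range of valid Gather indices for one axis."""
    return (-axis_size, axis_size - 1)


def out_of_bounds(indices, axis_size):
    """Return the indices that fall outside the valid Gather range."""
    low, high = gather_index_range(axis_size)
    return [i for i in indices if not low <= i <= high]


# An axis of size 2 yields the range quoted in the error message.
print(gather_index_range(2))
# Only the index 7 is rejected; 0 and 1 are valid.
print(out_of_bounds([0, 1, 7], 2))
```

This only explains the message. It does not say why the test input contains the value 7 for this node.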

gyulaz-htec commented 8 months ago

REF accuracy

Accuracy tables are available:

https://github.com/gyulaz-htec/models/blob/ref_testing/MIGRAPHX_ref_fp32.md
https://github.com/gyulaz-htec/models/blob/ref_testing/MIGRAPHX_ref_fp32-int8.md
https://github.com/gyulaz-htec/models/blob/ref_testing/MIGRAPHX_ref_fp32-qdq.md
https://github.com/gyulaz-htec/models/blob/ref_testing/MIGRAPHX_ref_fp16.md
https://github.com/gyulaz-htec/models/blob/ref_testing/MIGRAPHX_ref_FP32_FP16.md

MIGraphX version: 352dcea2c6a03c495a6ba8667e19811bc5d1399b

gyulaz-htec commented 8 months ago

Accuracy with ORT using MIGraphX provider

105 passing models
66 skipped models
13 failing models

Full log available here

Skipped

9 crashing models:

fcn-resnet101-11.onnx
fcn-resnet50-12.onnx
fcn-resnet50-12-int8.onnx
fcn-resnet50-11.onnx
fcn-resnet50-12-qdq.onnx
ssd_mobilenet_v1_10.onnx
ssd_mobilenet_v1_12-int8.onnx
ssd_mobilenet_v1_12.onnx
ssd_mobilenet_v1_13-qdq.onnx

57 models are not supported by ORT (old opset, qdq models)

Fails

3 invalid I/O:

[FAIL] roberta-sequence-classification-9.tar.gz: Error -3 while decompressing data: invalid code lengths set
[FAIL] bertsquad-12.tar.gz: Error -3 while decompressing data: invalid code lengths set
[FAIL] ssd-12.tar.gz: operands could not be broadcast together with shapes (1,97,4) (1,200,4)
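
The ssd-12 message is the error NumPy raises when two array shapes cannot be broadcast. The 97 versus 200 in the second dimension suggests the model returned a different number of detections than the reference output holds; that reading is an assumption, the log does not confirm it. A small sketch of the broadcasting rule in plain Python (the helper name is made up for illustration):

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions right to left; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"cannot broadcast {tuple(a)} with {tuple(b)}")
    return tuple(reversed(result))


# A size-1 dimension stretches, so this pair is compatible.
print(broadcast_shape((1, 97, 4), (1, 1, 4)))

# 97 and 200 differ and neither is 1, so this pair is rejected.
try:
    broadcast_shape((1, 97, 4), (1, 200, 4))
except ValueError as err:
    print(err)
```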

2 accuracy issues:

[FAIL] bertsquad-8.tar.gz: FAILED due to output mismatch.
[FAIL] bertsquad-10.tar.gz: FAILED due to output mismatch.

8 run issues:

[FAIL] bidaf-9.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_CNTKGraph_13637492904328163374_4 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_CNTKGraph_13637492904328163374_4_4' Status Message: Failed to call function
[FAIL] resnet-preproc-v1-18.tar.gz: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("", Loop, "", -1) : ("_inlfunc_SequenceMap_SequenceMap_input_sequence_seqlen": tensor(int64),"_inlfunc_SequenceMap_SequenceMap_input_sequence_cond": tensor(bool),"_inlfunc_SequenceMap_SequenceMap_out_sequence_0_seqempty": seq(tensor(float)),) -> ("_inlfunc_preprocess_tmp_seq": seq(tensor(float)),) , Error Nodes in a graph must be topologically sorted, however input '_inlfunc_CenterCropPad_padded_input' of node:
[FAIL] mnist-12-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function
[FAIL] emotion-ferplus-12-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function
[FAIL] arcfaceresnet100-11-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function
[FAIL] yolov3-10.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_7239056435708746026_6 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_7239056435708746026_6_6' Status Message: Failed to call function
[FAIL] yolov3-12.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_17013477727282915303_6 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_17013477727282915303_6_6' Status Message: Failed to call function
[FAIL] tiny-yolov3-11.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_5126557510845220766_4 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_5126557510845220766_4_4' Status Message: Failed to call function
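
For anyone re-running this, the grouping above can be reproduced from the log with a few lines of Python. This is only a sketch: it assumes every failure is a single line of the form `[FAIL] <model>: <message>`, and the keyword rules are my own guess at how to split the three buckets, not something the test script provides.

```python
import re
from collections import defaultdict

# One failure per line: "[FAIL] <model>: <message>"
FAIL_RE = re.compile(r"^\[FAIL\]\s+(?P<model>\S+?):\s+(?P<msg>.*)$")

# Hypothetical keyword rules, checked in order; anything else counts as "run".
RULES = [
    ("invalid I/O", ("decompressing data", "could not be broadcast")),
    ("accuracy", ("output mismatch",)),
]


def categorize(lines):
    """Group failing model names by failure category."""
    groups = defaultdict(list)
    for line in lines:
        match = FAIL_RE.match(line.strip())
        if match is None:
            continue  # not a failure line
        category = "run"
        for name, needles in RULES:
            if any(needle in match["msg"] for needle in needles):
                category = name
                break
        groups[category].append(match["model"])
    return dict(groups)


sample = [
    "[FAIL] bertsquad-12.tar.gz: Error -3 while decompressing data: invalid code lengths set",
    "[FAIL] bertsquad-10.tar.gz: FAILED due to output mismatch.",
    "[FAIL] mnist-12-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : "
    "Exception during initialization: Failed to call function",
]
for category, models in categorize(sample).items():
    print(category, len(models), models)
```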

attila-dusnoki-htec commented 8 months ago

Accuracy with ORT using MIGraphX provider

@gyulaz-htec I'm not sure about your results. I ran the check myself and got a very different result, which matched the original MIGraphX results. Are you sure you used the same MIGraphX version? I suspect you tested with a release version, not the same develop version.

Full log: ort_mgx.log

Summary:

[FAIL] roberta-sequence-classification-9.tar.gz: 'list' object has no attribute 'dtype'

MIGraphX Error: /code/AMDMIGraphX/src/common.cpp:83: operator(): COMPUTE_BROADCASTED_DYN_DIMS: dynamic shapes {[ 46, 46, {} ], [ 1, 1, {} ], [ 10, 10, {} ]} and {[ 0, 18446744073709551615, {} ], [ 0, 18446744073709551615, {} ], [ 10, 10, {} ]} mismatch!
[FAIL] bidaf-9.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_CNTKGraph_10970869655559385371_4 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_CNTKGraph_10970869655559385371_4_4' Status Message: Failed to call function

[FAIL] bertsquad-8.tar.gz: Required inputs (['unique_ids_raw_output___9:0', 'segment_ids:0', 'input_mask:0', 'input_ids:0']) are missing from input feed (['input1', 'input2', 'input4', 'input3']).

MIGraphX Error: /code/AMDMIGraphX/src/onnx/parse_qlinearmatmul.cpp:143: check_inputs: QLINEARMATMUL: unsupported row/column quantization
[FAIL] mnist-12-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function

[FAIL] resnet-preproc-v1-18.tar.gz: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("", Loop, "", -1) : ("_inlfunc_SequenceMap_SequenceMap_input_sequence_seqlen": tensor(int64),"_inlfunc_SequenceMap_SequenceMap_input_sequence_cond": tensor(bool),"_inlfunc_SequenceMap_SequenceMap_out_sequence_0_seqempty": seq(tensor(float)),) -> ("_inlfunc_preprocess_tmp_seq": seq(tensor(float)),) , Error Nodes in a graph must be topologically sorted, however input '_inlfunc_CenterCropPad_padded_input' of node: 

[FAIL] ssd-12.tar.gz: operands could not be broadcast together with shapes (1,97) (1,200) 

MIGraphX Error: /code/AMDMIGraphX/src/onnx/onnx_parser.cpp:419: parse_graph: Unknown operator: MGXKernel_subgraph_tf2onnx_2626310776439466313_4
[FAIL] yolov3-10.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_2626310776439466313_6 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_2626310776439466313_6_6' Status Message: Failed to call function

MIGraphX Error: /code/AMDMIGraphX/src/onnx/onnx_parser.cpp:419: parse_graph: Unknown operator: MGXKernel_subgraph_tf2onnx_7626062778491419067_4
[FAIL] yolov3-12.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_7626062778491419067_6 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_7626062778491419067_6_6' Status Message: Failed to call function

MIGraphX Error: /code/AMDMIGraphX/src/onnx/onnx_parser.cpp:419: parse_graph: Unknown operator: MGXKernel_subgraph_tf2onnx_8794048869124499261_3
[FAIL] tiny-yolov3-11.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MGXKernel_graph_model_1_8794048869124499261_4 node. Name:'MIGraphXExecutionProvider_MGXKernel_graph_model_1_8794048869124499261_4_4' Status Message: Failed to call function

MIGraphX Error: /code/AMDMIGraphX/src/onnx/parse_qlinearmatmul.cpp:143: check_inputs: QLINEARMATMUL: unsupported row/column quantization
[FAIL] arcfaceresnet100-11-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function

MIGraphX Error: /code/AMDMIGraphX/src/onnx/parse_qlinearmatmul.cpp:143: check_inputs: QLINEARMATMUL: unsupported row/column quantization
[FAIL] emotion-ferplus-12-int8.tar.gz: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: Failed to call function

[FAIL] bertsquad-12.tar.gz: FAILED due to output mismatch.
[FAIL] bertsquad-10.tar.gz: FAILED due to output mismatch.
[FAIL] shufflenet-9.tar.gz: FAILED due to output mismatch.
[FAIL] shufflenet-8.tar.gz: FAILED due to output mismatch.
[FAIL] shufflenet-v2-12.tar.gz: FAILED due to output mismatch.
[FAIL] shufflenet-v2-10.tar.gz: FAILED due to output mismatch.
[FAIL] shufflenet-7.tar.gz: FAILED due to output mismatch.
[FAIL] ssd-10.tar.gz: FAILED due to output mismatch.
[FAIL] yolov4.tar.gz: FAILED due to output mismatch.
[FAIL] ResNet101-DUC-7.tar.gz: FAILED due to output mismatch.
[FAIL] ResNet101-DUC-12.tar.gz: FAILED due to output mismatch.
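
The log does not say which tolerance produces "output mismatch". As a rough illustration only, here is a NumPy-style closeness check (`|actual - expected| <= atol + rtol * |expected|`) in plain Python. The `rtol` and `atol` defaults and the sample values are made up, not the ones the test script uses.

```python
def mismatched_indices(actual, expected, rtol=1e-3, atol=1e-5):
    """Return positions where actual is not within tolerance of expected."""
    if len(actual) != len(expected):
        raise ValueError("outputs differ in length")
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if abs(a - e) > atol + rtol * abs(e)
    ]


# Made-up reference values and two made-up candidate outputs.
expected = [0.10, 2.00, -3.50]
close_out = [0.10002, 2.0001, -3.5002]
drifted_out = [0.1001, 2.01, -3.49]

print(mismatched_indices(close_out, expected))    # no positions reported
print(mismatched_indices(drifted_out, expected))  # positions 1 and 2 reported
```

A model near the threshold can flip between pass and fail when the tolerance or the MIGraphX build changes, which may be part of why the two runs in this thread disagree.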

attila-dusnoki-htec commented 8 months ago

Computer_Vision model verify results:

fp32_accuracy_check.log

fp16_accuracy_check.log

attila-dusnoki-htec commented 7 months ago

Here is a script that runs the same accuracy test, but with MIGraphX's test_runner.py:

migraphx-benchmark / AMDMIGraphX