nod-ai / SHARK-TestSuite

Temporary home of a test suite we are evaluating
Apache License 2.0

MiGraphx CPU/GPU Status Tracking #325

Open zjgarvey opened 3 months ago

zjgarvey commented 3 months ago

This issue will be used to track compilation failures for migraphx models on CPU and GPU. Compile failures for each model should have a link to an issue with a smaller reproducer in the notes column.

Notes:

  1. migraphx_ORT__bert_base_cased_1 fails on CPU but passes on GPU. Other adjacent models fail for similar reasons on both. Very odd.
  2. Not including the tests migraphx_sdxl__unet__model and migraphx_ORT__bert_large_uncased_1, because they cause a crash (likely OOM).
  3. Not including any of the TF models yet.

CPU Status Table

The following report was generated with IREE compiler version iree-org/iree@caacf6c8015b4344b2d9b4a82c2fddc015693831 and torch-mlir version llvm/torch-mlir@2665ed343b19713ba5c1c555b2366a93de8b9d2b.

Passing Summary

TOTAL TESTS = 30

| Stage | # Passing | % of Total | % of Attempted |
| --- | --- | --- | --- |
| Setup | 30 | 100.0% | 100.0% |
| IREE Compilation | 24 | 80.0% | 80.0% |
| Gold Inference | 22 | 73.3% | 91.7% |
| IREE Inference Invocation | 19 | 63.3% | 86.4% |
| Inference Comparison (PASS) | 15 | 50.0% | 78.9% |

Fail Summary

TOTAL TESTS = 30

| Stage | # Failed at Stage | % of Total |
| --- | --- | --- |
| Setup | 0 | 0.0% |
| IREE Compilation | 6 | 20.0% |
| Gold Inference | 2 | 6.7% |
| IREE Inference Invocation | 3 | 10.0% |
| Inference Comparison | 4 | 13.3% |
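The two summaries above are two views of the same cascade: a test that fails at one stage is not attempted at later stages, so each "# Failed at Stage" is the difference between consecutive "# Passing" counts, and "% of Attempted" divides by the previous stage's passing count. A minimal sketch (not part of the test suite) using the numbers from the tables:

```python
# Passing counts copied from the Passing Summary table above.
passing = {
    "Setup": 30,
    "IREE Compilation": 24,
    "Gold Inference": 22,
    "IREE Inference Invocation": 19,
    "Inference Comparison (PASS)": 15,
}
total = 30

stages = list(passing)
for prev, stage in zip(stages, stages[1:]):
    attempted = passing[prev]          # only tests that passed the prior stage
    failed = attempted - passing[stage]
    pct_total = 100.0 * passing[stage] / total
    pct_attempted = 100.0 * passing[stage] / attempted
    print(f"{stage}: failed {failed}, "
          f"{pct_total:.1f}% of total, {pct_attempted:.1f}% of attempted")
```

For example, Inference Comparison is attempted by the 19 tests that passed invocation, so 15 passing gives 15/19 = 78.9% of attempted but only 15/30 = 50.0% of the total.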

Test Run Detail

Test was run with the following arguments: Namespace(device='local-task', backend='llvm-cpu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=True, stages=None, skip_stages=None, benchmark=False, load_inputs=False, groups='all', test_filter='migraphx', testsfile=None, tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='mi_10_10.md')
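The `Namespace(...)` dumps in this issue are argparse namespaces printed by the test runner. A minimal sketch of how such a dump arises; the flag names below are hypothetical, inferred only from the Namespace keys, and are not necessarily the runner's real CLI:

```python
import argparse

# Hypothetical flags mirroring a few of the Namespace keys shown above.
parser = argparse.ArgumentParser()
parser.add_argument("--device", default="local-task")
parser.add_argument("--backend", default="llvm-cpu")
parser.add_argument("--test-filter", dest="test_filter", default=None)
parser.add_argument("--report-file", dest="report_file", default="report.md")

args = parser.parse_args(["--test-filter", "migraphx"])
print(args)  # repr(args) is the Namespace(...) form seen in this report
```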

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
| --- | --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | None | iree-18268 iree-18412 torch-mlir-3651 |
| migraphx_bert__bert-large-uncased | preprocessing | None | |
| migraphx_bert__bertsquad-12 | Numerics | None | |
| migraphx_cadene__dpn92i1 | PASS | None | |
| migraphx_cadene__inceptionv4i16 | PASS | None | |
| migraphx_cadene__resnext101_64x4di1 | PASS | None | |
| migraphx_cadene__resnext101_64x4di16 | PASS | None | |
| migraphx_huggingface-transformers__bert_mrpc8 | native_inference | None | |
| migraphx_mlperf__bert_large_mlperf | Numerics | None | |
| migraphx_mlperf__resnet50_v1 | PASS | None | |
| migraphx_models__whisper-tiny-decoder | compiled_inference | None | |
| migraphx_models__whisper-tiny-encoder | native_inference | None | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | None | |
| migraphx_onnx-model-zoo__gpt2-10 | preprocessing | None | |
| migraphx_ORT__bert_base_cased_1 | PASS | None | |
| migraphx_ORT__bert_base_uncased_1 | PASS | None | |
| migraphx_ORT__bert_large_uncased_1 | PASS | None | |
| migraphx_ORT__distilgpt2_1 | compiled_inference | None | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | None | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | None | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | compiled_inference | None | |
| migraphx_pytorch-examples__wlang_gru | PASS | None | |
| migraphx_pytorch-examples__wlang_lstm | PASS | None | |
| migraphx_sdunetmodel | import_model | None | |
| migraphx_sdxlunetmodel | import_model | None | |
| migraphx_torchvision__densenet121i32 | PASS | None | |
| migraphx_torchvision__inceptioni1 | PASS | None | |
| migraphx_torchvision__inceptioni32 | PASS | None | |
| migraphx_torchvision__resnet50i1 | PASS | None | |
| migraphx_torchvision__resnet50i64 | PASS | None | |
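The exit statuses in the detail table can be tallied to cross-check the summary tables. A sketch (statuses copied from the CPU table above; the mapping of statuses onto summary stages — preprocessing/import_model/compilation under "IREE Compilation", native_inference under "Gold Inference", compiled_inference under "IREE Inference Invocation", Numerics under "Inference Comparison" — is my reading of the numbers, not documented behavior):

```python
from collections import Counter

# Exit statuses copied row-by-row from the CPU detail table.
statuses = [
    "compilation", "preprocessing", "Numerics", "PASS", "PASS", "PASS",
    "PASS", "native_inference", "Numerics", "PASS", "compiled_inference",
    "native_inference", "import_model", "preprocessing", "PASS", "PASS",
    "PASS", "compiled_inference", "Numerics", "Numerics",
    "compiled_inference", "PASS", "PASS", "import_model", "import_model",
    "PASS", "PASS", "PASS", "PASS", "PASS",
]

counts = Counter(statuses)
print(counts["PASS"])  # 15, matching "Inference Comparison (PASS)" above
```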

OLD STATUS (Will update and migrate issues to current table)

| Test | Exit Status | Notes |
| --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | |
| migraphx_bert__bert-large-uncased | compilation | iree-18269 (two IRs reported under this, depicting different behavior) |
| migraphx_bert__bertsquad-12 | compilation | iree-18267 torch-mlir-3647 |
| migraphx_cadene__dpn92i1 | PASS | |
| migraphx_cadene__inceptionv4i16 | PASS | |
| migraphx_cadene__resnext101_64x4di1 | PASS | |
| migraphx_cadene__resnext101_64x4di16 | PASS | |
| migraphx_huggingface-transformers__bert_mrpc8 | compilation | iree-18413 |
| migraphx_mlperf__bert_large_mlperf | compilation | iree-18297 |
| migraphx_mlperf__resnet50_v1 | PASS | |
| migraphx_models__whisper-tiny-decoder | compilation | torch-mlir-3647 |
| migraphx_models__whisper-tiny-encoder | compilation | torch-mlir-3647 |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | construct_inputs | ORT issue with resize with f16 inputs? |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | shark-turbine-465 torch-mlir-615 torch-mlir-3293 |
| migraphx_ORT__bert_base_cased_1 | Numerics | Passes when `--iree-input-demote-i64-to-i32` is not present; iree-18273 |
| migraphx_ORT__bert_base_uncased_1 | Numerics | Passes when `--iree-input-demote-i64-to-i32` is not present |
| migraphx_ORT__bert_large_uncased_1 | compilation | Crashes; "MatMul" fails to legalize `stream.cmd.dispatch`; https://github.com/iree-org/iree/issues/18229 https://github.com/llvm/torch-mlir/issues/3647 ?? |
| migraphx_ORT__distilgpt2_1 | Numerics | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | Numerics | |
| migraphx_pytorch-examples__wlang_gru | Numerics | iree-18441 |
| migraphx_pytorch-examples__wlang_lstm | Numerics | iree-18441 |
| migraphx_sdunetmodel | import_model | Killed during MLIR import. Too big? |
| migraphx_sdxlunetmodel | import_model | Killed during MLIR import. Too big? |
| migraphx_torchvision__densenet121i32 | PASS | |
| migraphx_torchvision__inceptioni1 | PASS | |
| migraphx_torchvision__inceptioni32 | PASS | |
| migraphx_torchvision__resnet50i1 | PASS | |
| migraphx_torchvision__resnet50i64 | PASS | |

GPU Status Table

Last generated with pip-installed IREE tools at version:

iree-compiler      20240903.1005
iree-runtime       20240903.1005

Summary

| Stage | Count |
| --- | --- |
| Total | 21 (non-crashing; see table below) |
| PASS | 12 |
| Numerics | 2 |
| results-summary | 0 |
| postprocessing | 0 |
| compiled_inference | up to 5 (not included in total; crash during this stage) |
| compilation | 4 |
| preprocessing | 0 |
| import_model | 1 |
| native_inference | 2 |
| construct_inputs | 0 |
| setup | 0 |

Test Run Detail

Test was run with the following arguments: Namespace(device='hip://1', backend='rocm', iree_compile_args=['iree-hip-target=gfx942'], mode='onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, load_inputs=False, groups='all', test_filter='migraphx', tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, report=True, report_file='9_3_migraphx.md')

| Test | Exit Status | Notes |
| --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | Related: https://github.com/llvm/torch-mlir/pull/3630 |
| migraphx_bert__bert-large-uncased | compilation | Operand return type issue (see CPU table) |
| migraphx_bert__bertsquad-12 | compilation (without shape inference) / compiled_inference | 1. Failing to use the shape-inference torch-mlir passes in the torch-to-iree pipeline gives an all-dynamic squeeze-dim op. 2. Using torch-lower-to-backend-contract to get the shape information crashes during inference with an OOB memory access. |
| migraphx_cadene__dpn92i1 | PASS | |
| migraphx_cadene__inceptionv4i16 | PASS | |
| migraphx_cadene__resnext101_64x4di1 | PASS | |
| migraphx_cadene__resnext101_64x4di16 | PASS | |
| migraphx_huggingface-transformers__bert_mrpc8 | native_inference | |
| migraphx_mlperf__bert_large_mlperf | native_inference | |
| migraphx_mlperf__resnet50_v1 | PASS | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | https://github.com/nod-ai/SHARK-Turbine/issues/465 https://github.com/llvm/torch-mlir/issues/615 https://github.com/llvm/torch-mlir/issues/3293 |
| migraphx_ORT__bert_base_cased_1 | PASS | |
| migraphx_ORT__bert_base_uncased_1 | PASS | |
| migraphx_ORT__distilgpt2_1 | likely compiled_inference | Crashes with "Memory access fault by GPU node-3 (Agent handle: 0x5595fe450840) on address 0x7f1811a56000. Reason: Unknown." |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | compiled_inference | Causes a hard crash by trying to access memory out of bounds (MI300X) |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | compiled_inference | Same crash as above |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | likely compiled_inference | Crashes with "Memory access fault by GPU node-3 (Agent handle: 0x5595fe450840) on address 0x7f1811a56000. Reason: Unknown." |
| migraphx_pytorch-examples__wlang_gru | Numerics | |
| migraphx_pytorch-examples__wlang_lstm | Numerics | |
| migraphx_torchvision__densenet121i32 | PASS | |
| migraphx_torchvision__inceptioni1 | PASS | |
| migraphx_torchvision__inceptioni32 | PASS | |
| migraphx_torchvision__resnet50i1 | PASS | |
| migraphx_torchvision__resnet50i64 | PASS | |

Note: the GPU table is missing the SD model tests (they run out of memory and kill the test run). This probably happens during native inference, so it may need some looking into.

Performance data with iree-benchmark-module on GPU

Summary

| Stage | Count |
| --- | --- |
| Total | 30 |
| PASS | 13 |
| Numerics | 3 |
| results-summary | 0 |
| postprocessing | 0 |
| benchmark | 0 |
| compiled_inference | 2 |
| native_inference | 1 |
| construct_inputs | 0 |
| compilation | 8 |
| preprocessing | 0 |
| import_model | 3 |
| setup | 0 |

Test Run Detail

Test was run with the following arguments: Namespace(device='local-task', backend='llvm-cpu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, benchmark=True, load_inputs=False, groups='all', test_filter='migraphx', testsfile=None, tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='report.md')

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
| --- | --- | --- | --- |
| migraphx_agentmodel__AgentModel | compilation | None | |
| migraphx_bert__bert-large-uncased | compilation | None | |
| migraphx_bert__bertsquad-12 | compilation | None | |
| migraphx_cadene__dpn92i1 | PASS | 457.4397828740378 | |
| migraphx_cadene__inceptionv4i16 | PASS | 26072.668661984306 | |
| migraphx_cadene__resnext101_64x4di1 | PASS | 995.6825857516378 | |
| migraphx_cadene__resnext101_64x4di16 | PASS | 6324.309662605326 | |
| migraphx_huggingface-transformers__bert_mrpc8 | compilation | None | |
| migraphx_mlperf__bert_large_mlperf | PASS | 8195.630943014596 | |
| migraphx_mlperf__resnet50_v1 | PASS | 219.81522629761858 | |
| migraphx_models__whisper-tiny-decoder | compiled_inference | None | |
| migraphx_models__whisper-tiny-encoder | native_inference | None | |
| migraphx_onnx-misc__taau_low_res_downsample_d2s_for_infer_time_fp16_opset11 | import_model | None | |
| migraphx_onnx-model-zoo__gpt2-10 | compilation | None | |
| migraphx_ORT__bert_base_cased_1 | PASS | 817.4834945239127 | |
| migraphx_ORT__bert_base_uncased_1 | compilation | None | |
| migraphx_ORT__bert_large_uncased_1 | PASS | 2728.984761983156 | |
| migraphx_ORT__distilgpt2_1 | compiled_inference | None | |
| migraphx_ORT__onnx_models__bert_base_cased_1_fp16_gpu | Numerics | 2141.3577783387154 | |
| migraphx_ORT__onnx_models__bert_large_uncased_1_fp16_gpu | Numerics | 6767.566671983029 | |
| migraphx_ORT__onnx_models__distilgpt2_1_fp16_gpu | Numerics | 101.96079453453422 | |
| migraphx_pytorch-examples__wlang_gru | compilation | None | |
| migraphx_pytorch-examples__wlang_lstm | compilation | None | |
| migraphx_sdunetmodel | import_model | None | |
| migraphx_sdxlunetmodel | import_model | None | |
| migraphx_torchvision__densenet121i32 | PASS | 2639.900082334255 | |
| migraphx_torchvision__inceptioni1 | PASS | 627.4162046611309 | |
| migraphx_torchvision__inceptioni32 | PASS | 22124.727455200627 | |
| migraphx_torchvision__resnet50i1 | PASS | 284.1490000589854 | |
| migraphx_torchvision__resnet50i64 | PASS | 11100.900294492021 | |
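For a quick at-a-glance ranking of the PASS rows above by mean benchmark time, a small sketch (times copied from the table, rounded to two decimals):

```python
# Mean benchmark times (ms) for the 13 PASS rows, rounded from the table.
times_ms = {
    "migraphx_mlperf__resnet50_v1": 219.82,
    "migraphx_torchvision__resnet50i1": 284.15,
    "migraphx_cadene__dpn92i1": 457.44,
    "migraphx_torchvision__inceptioni1": 627.42,
    "migraphx_ORT__bert_base_cased_1": 817.48,
    "migraphx_cadene__resnext101_64x4di1": 995.68,
    "migraphx_torchvision__densenet121i32": 2639.90,
    "migraphx_ORT__bert_large_uncased_1": 2728.98,
    "migraphx_cadene__resnext101_64x4di16": 6324.31,
    "migraphx_mlperf__bert_large_mlperf": 8195.63,
    "migraphx_torchvision__resnet50i64": 11100.90,
    "migraphx_torchvision__inceptioni32": 22124.73,
    "migraphx_cadene__inceptionv4i16": 26072.67,
}

ranked = sorted(times_ms.items(), key=lambda kv: kv[1])
for name, t in ranked:
    print(f"{t:10.2f} ms  {name}")
```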
nirvedhmeshram commented 3 months ago

@zjgarvey added https://github.com/llvm/torch-mlir/issues/3647 to some of the models, as we need that along with https://github.com/iree-org/iree/issues/18229.

MaheshRavishankar commented 3 months ago

cc @lialan as well. Can you coordinate with Zach to track CPU codegen issues?

nirvedhmeshram commented 3 months ago

Also adding https://github.com/llvm/torch-mlir/issues/3651, which needs to be done to support a broad range of models.

zjgarvey commented 3 weeks ago

Updated benchmarks for static-dim BERT tests on MI300:

Passing Summary

TOTAL TESTS = 18

| Stage | # Passing | % of Total | % of Attempted |
| --- | --- | --- | --- |
| Setup | 18 | 100.0% | 100.0% |
| IREE Compilation | 18 | 100.0% | 100.0% |
| Gold Inference | 18 | 100.0% | 100.0% |
| IREE Inference Invocation | 18 | 100.0% | 100.0% |
| Inference Comparison (PASS) | 16 | 88.9% | 88.9% |

Fail Summary

TOTAL TESTS = 18

| Stage | # Failed at Stage | % of Total |
| --- | --- | --- |
| Setup | 0 | 0.0% |
| IREE Compilation | 0 | 0.0% |
| Gold Inference | 0 | 0.0% |
| IREE Inference Invocation | 0 | 0.0% |
| Inference Comparison | 2 | 11.1% |

Test Run Detail

Test was run with the following arguments: Namespace(device='hip://1', backend='rocm', iree_compile_args=['iree-hip-target=gfx942'], mode='cl-onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, benchmark=True, load_inputs=False, groups='all', testfilter='migx', testsfile=None, tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='bert-bench-11-5.md', get_metadata=False)

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
| --- | --- | --- | --- |
| migx_bench_bert-large-uncased_16_128 | PASS | 31.207363539631814 | |
| migx_bench_bert-large-uncased_16_256 | PASS | 55.50303652834816 | |
| migx_bench_bert-large-uncased_16_384 | Numerics | 73.14148765678208 | |
| migx_bench_bert-large-uncased_1_128 | PASS | 13.602430612827915 | |
| migx_bench_bert-large-uncased_1_256 | PASS | 14.240951777125396 | |
| migx_bench_bert-large-uncased_1_384 | PASS | 19.958815195908148 | |
| migx_bench_bert-large-uncased_2_128 | PASS | 13.128591842236526 | |
| migx_bench_bert-large-uncased_2_256 | PASS | 13.671312931608528 | |
| migx_bench_bert-large-uncased_2_384 | PASS | 21.517712740472167 | |
| migx_bench_bert-large-uncased_32_128 | PASS | 62.9078254498767 | |
| migx_bench_bert-large-uncased_32_256 | PASS | 101.5021381234484 | |
| migx_bench_bert-large-uncased_32_384 | Numerics | 143.94597491870323 | |
| migx_bench_bert-large-uncased_4_128 | PASS | 14.44128212411286 | |
| migx_bench_bert-large-uncased_4_256 | PASS | 17.125056890238607 | |
| migx_bench_bert-large-uncased_4_384 | PASS | 26.636395024326745 | |
| migx_bench_bert-large-uncased_8_128 | PASS | 18.925565496288442 | |
| migx_bench_bert-large-uncased_8_256 | PASS | 27.419584516722423 | |
| migx_bench_bert-large-uncased_8_384 | PASS | 41.23994989284113 | |
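The test names above appear to encode batch size and sequence length as the last two underscore-separated fields (e.g. migx_bench_bert-large-uncased_16_128 would be batch 16, sequence length 128 — my reading of the naming pattern, not documented). A sketch deriving per-sample latency for the seq-128 rows, which shows how throughput improves with batching:

```python
# Mean times (ms) for the seq-len-128 rows, rounded from the table above.
times_ms = {
    "migx_bench_bert-large-uncased_1_128": 13.60,
    "migx_bench_bert-large-uncased_8_128": 18.93,
    "migx_bench_bert-large-uncased_16_128": 31.21,
    "migx_bench_bert-large-uncased_32_128": 62.91,
}

for name, t in times_ms.items():
    # Parse "<prefix>_<batch>_<seqlen>" from the right.
    batch, seq = (int(x) for x in name.rsplit("_", 2)[1:])
    print(f"batch={batch:2d} seq={seq}: {t:6.2f} ms total, "
          f"{t / batch:5.2f} ms/sample")
```

Batch 32 at seq 128 takes roughly 4.6x the batch-1 latency for 32x the work, i.e. under 2 ms per sample versus 13.6 ms unbatched.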