nod-ai / SHARK-TestSuite

Temporary home of a test suite we are evaluating
Apache License 2.0

Model Level Tests Failed #226

Open alexsifivetw opened 2 months ago

alexsifivetw commented 2 months ago

Issue Description

I tried to use the following command to run all model-level tests:

python run.py -j 16 --report --cachedir cached -v --testsfile models.txt \
  --torchmlirbuild <path>/torch-mlir/build \
  --ireebuild <path>/iree/build-host

However, some tests failed at different phases.

models.txt

pytorch/models/beit-base-patch16-224-pt22k-ft22k
pytorch/models/bert-large-uncased
pytorch/models/bge-base-en-v1.5
pytorch/models/deit-small-distilled-patch16-224
pytorch/models/dlrm
pytorch/models/gemma-7b
pytorch/models/gpt2
pytorch/models/gpt2-xl
pytorch/models/llama2-7b-GPTQ
pytorch/models/llama2-7b-hf
pytorch/models/miniLM-L12-H384-uncased
pytorch/models/mit-b0
pytorch/models/mobilebert-uncased
pytorch/models/opt-125M
pytorch/models/opt-125m-gptq
pytorch/models/opt-1.3b
pytorch/models/opt-350m
pytorch/models/phi-1_5
pytorch/models/phi-2
pytorch/models/resnet50
pytorch/models/stablelm-3b-4e1t
pytorch/models/t5-base
pytorch/models/t5-large
pytorch/models/vicuna-13b-v1.3
pytorch/models/vit-base-patch16-224
pytorch/models/whisper-base
pytorch/models/whisper-medium
pytorch/models/whisper-small

statusreport.md

Status report for run: test-run using mode:onnx todtype:default backend:llvm-cpu

| tests                                            | model-run   | onnx-import   | torch-mlir   | iree-compile   | inference   |
|:-------------------------------------------------|:------------|:--------------|:-------------|:---------------|:------------|
| pytorch/models/beit-base-patch16-224-pt22k-ft22k | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/bert-large-uncased                | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/bge-base-en-v1.5                  | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/deit-small-distilled-patch16-224  | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/dlrm                              | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/gemma-7b                          | failed      | notrun        | notrun       | notrun         | notrun      |
| pytorch/models/gpt2                              | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/gpt2-xl                           | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/llama2-7b-GPTQ                    | failed      | notrun        | notrun       | notrun         | notrun      |
| pytorch/models/llama2-7b-hf                      | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/miniLM-L12-H384-uncased           | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/mit-b0                            | passed      | passed        | passed       | passed         | mismatch    |
| pytorch/models/mobilebert-uncased                | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/opt-125M                          | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/opt-125m-gptq                     | failed      | notrun        | notrun       | notrun         | notrun      |
| pytorch/models/opt-1.3b                          | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/opt-350m                          | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/phi-1_5                           | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/phi-2                             | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/resnet50                          | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/stablelm-3b-4e1t                  | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/t5-base                           | passed      | passed        | passed       | passed         | passed      |
| pytorch/models/t5-large                          | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/vicuna-13b-v1.3                   | passed      | failed        | notrun       | notrun         | notrun      |
| pytorch/models/vit-base-patch16-224              | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/whisper-base                      | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/whisper-medium                    | passed      | passed        | passed       | failed         | notrun      |
| pytorch/models/whisper-small                     | passed      | passed        | passed       | failed         | notrun      |
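
To see at a glance where each test stops, the statusreport.md table can be parsed and the tests grouped by the first phase that failed. A minimal sketch, assuming the report keeps the pipe-delimited markdown layout and phase columns shown above (the failing.txt output name is just an example; that file could presumably be passed back to run.py via --testsfile to retry only the failing tests):

```python
# A rough sketch: group tests in statusreport.md by the first phase that
# failed, and collect the failing test names into failing.txt (a
# hypothetical file name) for a possible retry via --testsfile.
from collections import defaultdict

PHASES = ["model-run", "onnx-import", "torch-mlir", "iree-compile", "inference"]

failures = defaultdict(list)
with open("statusreport.md") as report:
    for line in report:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(PHASES) + 1:
            continue  # not a table row
        test, statuses = cells[0], cells[1:]
        if test == "tests" or set(test) <= set(":-"):
            continue  # header or separator row
        for phase, status in zip(PHASES, statuses):
            if status in ("failed", "mismatch"):
                failures[phase].append(test)  # record only the first failing phase
                break

for phase in PHASES:
    print(f"{phase}: {len(failures[phase])} failing")
    for test in failures[phase]:
        print(f"  {test}")

with open("failing.txt", "w") as out:
    for tests in failures.values():
        out.writelines(t + "\n" for t in tests)
```

Against the table above, this groups 3 model-run failures, 9 onnx-import failures, 8 iree-compile failures, and 1 inference mismatch, i.e. 21 of the 28 tests do not pass end to end.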

timereport.md

Time (in seconds) report for run: test-run using mode:onnx todtype:default backend:llvm-cpu

| tests                                            |   model-run |   onnx-import |   torch-mlir |   iree-compile |   inference |
|:-------------------------------------------------|------------:|--------------:|-------------:|---------------:|------------:|
| pytorch/models/beit-base-patch16-224-pt22k-ft22k |      20.085 |         5.604 |        2.602 |          1.906 |       0     |
| pytorch/models/bert-large-uncased                |      32.347 |        28.219 |       14.083 |         28.793 |       3.373 |
| pytorch/models/bge-base-en-v1.5                  |      24.303 |         4.25  |        3.225 |         15.794 |       0.334 |
| pytorch/models/deit-small-distilled-patch16-224  |       6.924 |         2.298 |        0.988 |          1.15  |       0     |
| pytorch/models/dlrm                              |      22.613 |         4.257 |        0     |          0     |       0     |
| pytorch/models/gemma-7b                          |       1.613 |         0     |        0     |          0     |       0     |
| pytorch/models/gpt2                              |      22.688 |         5.388 |        6.158 |         13.244 |       0.362 |
| pytorch/models/gpt2-xl                           |     131.144 |        27.393 |        0     |          0     |       0     |
| pytorch/models/llama2-7b-GPTQ                    |    1709.58  |         0     |        0     |          0     |       0     |
| pytorch/models/llama2-7b-hf                      |     210.509 |        77.156 |        0     |          0     |       0     |
| pytorch/models/miniLM-L12-H384-uncased           |      60.598 |         3.903 |        1.805 |         11.064 |       0.212 |
| pytorch/models/mit-b0                            |       5.435 |         0.461 |        0.122 |          6.529 |       0.389 |
| pytorch/models/mobilebert-uncased                |      14.15  |         1.863 |        0.718 |         18.884 |       0.231 |
| pytorch/models/opt-125M                          |     105.866 |        12.384 |        6.188 |          3.396 |       0     |
| pytorch/models/opt-125m-gptq                     |      16.984 |         0     |        0     |          0     |       0     |
| pytorch/models/opt-1.3b                          |     148.094 |        25.934 |        0     |          0     |       0     |
| pytorch/models/opt-350m                          |      75.516 |        29.6   |       14.024 |          7.107 |       0     |
| pytorch/models/phi-1_5                           |      68.944 |        25.611 |        0     |          0     |       0     |
| pytorch/models/phi-2                             |     131.54  |        77.58  |        0     |          0     |       0     |
| pytorch/models/resnet50                          |      52.362 |         4.101 |        1.7   |         15.239 |       1.508 |
| pytorch/models/stablelm-3b-4e1t                  |     378.293 |        66.011 |        0     |          0     |       0     |
| pytorch/models/t5-base                           |      48.349 |        30.027 |       18.562 |         46.368 |       8.262 |
| pytorch/models/t5-large                          |     270.786 |        12.808 |        0     |          0     |       0     |
| pytorch/models/vicuna-13b-v1.3                   |    1066.81  |       180.001 |        0     |          0     |       0     |
| pytorch/models/vit-base-patch16-224              |      95.638 |         7.033 |        3.691 |          2.666 |       0     |
| pytorch/models/whisper-base                      |       9.993 |         6.661 |        3.036 |          1.528 |       0     |
| pytorch/models/whisper-medium                    |      45.323 |        37.692 |       19.659 |         11.532 |       0     |
| pytorch/models/whisper-small                     |     275.743 |        17.317 |        5.912 |          3.028 |       0     |
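
To see which phase dominates the wall-clock time, the per-phase columns of timereport.md can be summed in the same way. Another small sketch under the same table-layout assumption:

```python
# A rough sketch: sum the per-phase times in timereport.md to see which
# stage of the pipeline consumes the most wall-clock time.
PHASES = ["model-run", "onnx-import", "torch-mlir", "iree-compile", "inference"]

totals = dict.fromkeys(PHASES, 0.0)
with open("timereport.md") as report:
    for line in report:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != len(PHASES) + 1:
            continue  # not a table row
        if cells[0] == "tests" or set(cells[0]) <= set(":-"):
            continue  # header or separator row
        for phase, value in zip(PHASES, cells[1:]):
            totals[phase] += float(value)

for phase, total in totals.items():
    print(f"{phase:>12}: {total:10.1f} s")
```

For the numbers above, the model-run phase accounts for roughly 5050 s of the run, far more than onnx-import (about 690 s) or any later phase.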

summaryreport.md

Summary (time in seconds) for run: test-run using mode:onnx todtype:default backend:llvm-cpu

| items        |    tests |   model-run |   onnx-import |   torch-mlir |   iree-compile |   inference |
|:-------------|---------:|------------:|--------------:|-------------:|---------------:|------------:|
| total-count  |  28      |      25     |        16     |       16     |          8     |       7     |
| average-time | 216.113  |     180.437 |        24.77  |        3.66  |          6.722 |       0.524 |
| median-time  |  68.7585 |      56.48  |         9.709 |        0.853 |          1.717 |       0     |

Expected Behavior

All tests pass.(?)

Environment

Other Information

ScottTodd commented 1 month ago

> Expected Behavior
>
> All tests pass.(?)

This repo has the current results observed on CI: https://github.com/nod-ai/e2eshark-reports