iree-org / iree-turbine

IREE's PyTorch Frontend, based on Torch Dynamo.
Apache License 2.0

PyTorch Model Export Issue: MLIR Verification Failed #24

Open saienduri opened 5 months ago

saienduri commented 5 months ago

With the latest torch (2.4) and iree-turbine, we are seeing the MLIR verification failure below for many of our models during the export stage (aot.export).

Instructions to reproduce this error:

Follow the setup instructions here, including the "Turbine Mode" instructions: https://github.com/nod-ai/SHARK-TestSuite/blob/main/e2eshark/README.md.

Then run the following command from the SHARK-TestSuite/e2eshark directory (this example runs only the bert model; change the --tests flag to select a different model):

HF_TOKEN=<your_hf_token> python3.11 ./run.py \
          -r ./test-turbine \
          --report \
          --cachedir ~/huggingface_cache \
          --mode turbine \
          -g models \
          --postprocess \
          -v \
          --tests pytorch/models/bert-large-uncased

You can find the debug artifacts in SHARK-TestSuite/e2eshark/test-turbine/pytorch/models/<model_name>. There you can find, for example, the model-run.log file, which describes the error in more detail. You can also find the MLIR generated for the model that failed verification in /tmp/turbine_module_builder_error.mlir.

Models:

pytorch/models/vicuna-13b-v1.3
pytorch/models/llama2-7b-GPTQ
pytorch/models/mobilebert-uncased
pytorch/models/miniLM-L12-H384-uncased
pytorch/models/bert-large-uncased
pytorch/models/gpt2-xl
pytorch/models/phi-2
pytorch/models/phi-1_5
pytorch/models/bge-base-en-v1.5
pytorch/models/llama2-7b-hf
pytorch/models/gpt2

Traceback (most recent call last):
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/support/ir_utils.py", line 215, in finalize_construct
    self.module_op.verify()
iree.compiler._mlir_libs._site_initialize.<locals>.MLIRError: Verification failed:
error: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py":1183:0: 'torch.aten.slice.Tensor' op operand #0 must be Any Torch tensor type, but got '!torch.none'
 note: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py":1183:0: see current operation: %168 = "torch.aten.slice.Tensor"(%163, %164, %165, %166, %167) : (!torch.none, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.vtensor<[8,128],f32>

Traceback (most recent call last):
  File "/home/nod/sai/SHARK-TestSuite/e2eshark/test-turbine/pytorch/models/vicuna-13b-v1.3/runmodel.py", line 131, in <module>
    module = aot.export(model, E2ESHARK_CHECK["input"])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/exporter.py", line 304, in export
    cm = TransformedModule(context=context, import_to="import")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/compiled_module.py", line 654, in __new__
    module_builder.finalize_construct()
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/support/ir_utils.py", line 215, in finalize_construct
    self.module_op.verify()
iree.compiler._mlir_libs._site_initialize.<locals>.MLIRError: Verification failed:
error: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py":1183:0: 'torch.aten.slice.Tensor' op operand #0 must be Any Torch tensor type, but got '!torch.none'
 note: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py":1183:0: see current operation: %168 = "torch.aten.slice.Tensor"(%163, %164, %165, %166, %167) : (!torch.none, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.vtensor<[8,128],f32>

Models: pytorch/models/beit-base-patch16-224-pt22k-ft22k

Traceback (most recent call last):
  File "/home/nod/sai/SHARK-TestSuite/e2eshark/test-turbine/pytorch/models/beit-base-patch16-224-pt22k-ft22k/runmodel.py", line 110, in <module>
    module = aot.export(model, E2ESHARK_CHECK["input"])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/exporter.py", line 304, in export
    cm = TransformedModule(context=context, import_to="import")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/compiled_module.py", line 654, in __new__
    module_builder.finalize_construct()
  File "/home/nod/sai/iree-turbine/shark_turbine/aot/support/ir_utils.py", line 215, in finalize_construct
    self.module_op.verify()
iree.compiler._mlir_libs._site_initialize.<locals>.MLIRError: Verification failed:
error: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/beit/modeling_beit.py":875:0: 'torch.aten.view' op operand #0 must be Any Torch tensor type, but got '!torch.none'
 note: "/home/nod/sai/SHARK-TestSuite/e2eshark/curr_venv/lib/python3.11/site-packages/transformers/models/beit/modeling_beit.py":875:0: see current operation: %189 = "torch.aten.view"(%186, %188) : (!torch.none, !torch.list<int>) -> !torch.vtensor<[38809],si64>
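In both failures, operand #0 of the op is !torch.none, i.e. the importer materialized a Python None where a tensor value was expected. A minimal pure-Python sketch of the operand-type check the MLIR verifier is enforcing here (hypothetical helper names; this is an illustration, not the actual torch-mlir verifier code):

```python
def passes_tensor_operand_check(operand_type: str) -> bool:
    """Sketch of the verifier rule: operand #0 of ops like
    torch.aten.slice.Tensor and torch.aten.view must be a torch
    tensor type ("Any Torch tensor type"), never !torch.none."""
    return operand_type.startswith("!torch.vtensor") or operand_type.startswith(
        "!torch.tensor"
    )

# The operand types seen in the failing IR above:
assert passes_tensor_operand_check("!torch.vtensor<[8,128],f32>")
assert not passes_tensor_operand_check("!torch.none")  # -> verification failure
```

This is why both the slice and view ops fail with the same "operand #0 must be Any Torch tensor type, but got '!torch.none'" message: the type mismatch is introduced at import time, before any compilation happens.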
widiba03304 commented 1 month ago

Is there any update on this? I also need help with this error.

chrsmcgrr commented 1 week ago

I encountered the same issue on other models and had a fix locally, but it has now been fixed upstream in torch-mlir. This was actually an FxImporter issue. Could you check again with the latest turbine?

Here's a link to that fix