pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

ViT model fails to export with Core ML delegate #5157

Open · guangy10 opened this issue 2 weeks ago

guangy10 commented 2 weeks ago

🐛 Describe the bug

ViT is claimed to be supported by the Core ML delegate here: https://github.com/pytorch/executorch/tree/main/examples/apple/coreml#frequently-encountered-errors-and-resolution. However, it fails on export:

python3 -m examples.apple.coreml.scripts.export --model_name vit

Converting PyTorch Frontend ==> MIL Ops:   5%|█████▌                                                                                                          | 43/864 [00:00<00:00, 3643.17 ops/s]
Traceback (most recent call last):
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/guangyang/executorch/examples/apple/coreml/scripts/export.py", line 180, in <module>
    lowered_module, edge_copy = lower_module_to_coreml(
  File "/Users/guangyang/executorch/examples/apple/coreml/scripts/export.py", line 94, in lower_module_to_coreml
    lowered_module = to_backend(
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 113, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/executorch/backends/apple/coreml/compiler/coreml_preprocess.py", line 384, in preprocess
    mlmodel = ct.convert(
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 635, in convert
    mlmodel = mil_convert(
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 288, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 84, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 126, in _perform_torch_convert
    raise e
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 118, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 1184, in convert
    convert_nodes(self.context, self.graph, early_exit=not has_states)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 93, in convert_nodes
    raise e     # re-raise exception
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 88, in convert_nodes
    convert_single_node(context, node)
  File "/Users/guangyang/miniconda3/envs/executorch/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 119, in convert_single_node
    raise RuntimeError(
RuntimeError: PyTorch convert function for op 'any.dim' not implemented.
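
For context, `any.dim` is a logical-OR reduction along one dimension. A minimal pure-Python sketch of its semantics (the `any_dim` helper below is hypothetical, not a torch or coremltools API), illustrating how a backend without a native `any` reduction could in principle lower it to a max-reduce over 0/1 values:

```python
# Illustrative only: 'any_dim' is a made-up helper that mimics
# torch.any(x, dim) for a 2-D list of 0/1 values.
def any_dim(rows, dim, keepdim=False):
    """Logical OR over {0, 1} equals max, so an 'any' reduction
    can be lowered as: cast to int, reduce with max, cast to bool."""
    if dim == 0:
        # Reduce down each column.
        out = [max(col) == 1 for col in zip(*rows)]
        return [out] if keepdim else out
    # Reduce across each row.
    out = [max(row) == 1 for row in rows]
    return [[v] for v in out] if keepdim else out

# Example: which columns / rows contain at least one nonzero element
x = [[0, 1, 0],
     [0, 0, 0]]
print(any_dim(x, dim=0))  # [False, True, False]
print(any_dim(x, dim=1))  # [True, False]
```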

Versions

PyTorch version: 2.5.0.dev20240829
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.29.0
Libc version: N/A

Python version: 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Max

Versions of relevant libraries:
[pip3] executorch==0.4.0a0+52c9f30
[pip3] executorchcoreml==0.0.1
[pip3] flake8==6.0.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==23.6.5
[pip3] flake8-comprehensions==3.12.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.21.3
[pip3] pytorch-labs-segment-anything-fast==0.2
[pip3] torch==2.5.0.dev20240829
[pip3] torchaudio==2.5.0.dev20240829
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240829
[conda] executorch 0.4.0a0+52c9f30 pypi_0 pypi
[conda] executorchcoreml 0.0.1 pypi_0 pypi
[conda] numpy 1.21.3 pypi_0 pypi
[conda] pytorch-labs-segment-anything-fast 0.2 pypi_0 pypi
[conda] torch 2.5.0.dev20240829 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240829 pypi_0 pypi
[conda] torchfix 0.1.1 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240829 pypi_0 pypi

YifanShenSZ commented 3 days ago

Hi @guangy10, once the coremltools 8.0 update PR lands, could you please try again and see if the issue persists?

Also, two more comments:

  1. The break must come from a model definition change. We don't remove existing op support, so 'any.dim' must be a newly introduced op.
  2. The export script has a --use_partitioner arg; could that work for you?
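
If the partitioner path works for ViT, the invocation would presumably be the same export command as in the report with the flag added (the `--use_partitioner` flag name is taken from the comment above; its exact behavior may differ in your checkout):

```shell
# Hypothetical: lower ViT via the Core ML partitioner instead of
# lowering the whole module directly.
python3 -m examples.apple.coreml.scripts.export --model_name vit --use_partitioner
```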