Hi @Typiqally, sorry for the late reply. I think the reason is the default value of compute_precision during conversion.
```python
model = ct.convert(source_model,
                   convert_to="mlprogram",
                   compute_precision=ct.precision.FLOAT32)
```
The default value of compute_precision is ct.precision.FLOAT16 with coremltools 5.0b3 and higher, while in mmdeploy the value is set to ct.precision.FLOAT32.
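For reference, the FP16 default can also be requested explicitly (a minimal sketch; source_model stands in for the traced model, as above):

```python
import coremltools as ct

# FP16 is already the default for ML Programs in coremltools 5.0b3+,
# but it can also be requested explicitly:
model = ct.convert(source_model,
                   convert_to="mlprogram",
                   compute_precision=ct.precision.FLOAT16)
```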
Hi there, that is a logical explanation. However, I'm not quite sure how this relates to ANE compilation errors. For YOLO, the compute precision seems to fix the ANE targeting issue, but for Mask-RCNN I haven't been so lucky. Give me some time to run a few tests.
Hi there, excuse me for taking so long; I've had massive trouble setting up an environment to run MMDeploy on my Mac, though I have now made the setup reproducible with a Conda environment.yml, which I can contribute to this repo if desired. Back to the ANE: I have confirmed that FP16 mode runs the model on the ANE, but only in some cases. For example, the FP16 model runs on the ANE on my MacBook, but not on my iPad (maybe an operator compatibility issue?). Also, prediction is faster on the GPU than on the ANE, which might be why Core ML decides to switch. Since Apple does not provide any information on this, I cannot be certain why the ANE is not selected.
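One way to probe this (a sketch, assuming coremltools 5+; "model.mlpackage" is a placeholder path for the converted model) is to load the same model with different allowed compute units and compare prediction times:

```python
import coremltools as ct

# Load the same converted model with different allowed compute units
# (the compute_units argument requires coremltools 5+); the path is a
# placeholder for the converted model package.
model_all = ct.models.MLModel("model.mlpackage",
                              compute_units=ct.ComputeUnit.ALL)
model_gpu = ct.models.MLModel("model.mlpackage",
                              compute_units=ct.ComputeUnit.CPU_AND_GPU)
# Timing model_all.predict(...) against model_gpu.predict(...) shows
# whether Core ML actually gains anything when the ANE is allowed.
```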
Currently, I have to change the flag manually at: https://github.com/open-mmlab/mmdeploy/blob/c4d428fd7d7b34b70d35e775aaff4c27ee7c317b/mmdeploy/backend/coreml/torchscript2coreml.py#L57
Is there any way I can configure this without having to go into the codebase?
Hi, the current code doesn't pass the fp16_mode argument to the convert function, and there is currently no fp16_mode configuration option for the Core ML config, unlike the TensorRT backend.
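For illustration, a hypothetical Core ML deploy config exposing such a flag could look like the following (this option does not exist yet, and the field name fp16_mode is an assumption mirroring the TensorRT backend):

```python
# Hypothetical sketch: a backend_config for Core ML with an fp16_mode
# flag mirroring the TensorRT backend. This option does not exist in
# MMDeploy yet; adding it is the point of the proposed PR.
backend_config = dict(
    type='coreml',
    convert_to='mlprogram',
    fp16_mode=True,  # would map to compute_precision=ct.precision.FLOAT16
)
```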
If it is convenient for you, could you please make a PR to fix this?
Sure, give me some time
Describe the bug
MMDetection object detection models such as yolov3_d53_mstrain-608_273e_coco and faster_rcnn_regnetx-3.2GF_fpn_1x_coco, as well as the instance segmentation model mask_rcnn_regnetx-12GF_fpn_1x_coco, lose the ability to run on the Apple Neural Engine (ANE) and instead run on the GPU.
For YOLO and Faster R-CNN, I have managed to develop a suboptimal workaround: keeping the TorchScript model in memory after JIT tracing instead of writing it to a file before passing it to the coremltools conversion step. This can be reproduced by temporarily changing the conversion code; a sketch of the idea follows.
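The original diff is not reproduced here; purely as an illustration, a minimal sketch of the idea (function and variable names are hypothetical, not the actual MMDeploy code) might look like:

```python
import coremltools as ct
import torch

# Hypothetical sketch, not the actual MMDeploy patch: trace the model
# and pass the in-memory ScriptModule to coremltools directly, instead
# of saving it to disk and reloading it by path.
def convert_in_memory(model, example_input):
    traced = torch.jit.trace(model, example_input)
    return ct.convert(
        traced,  # in-memory traced module, no intermediate .pt file
        inputs=[ct.TensorType(name="input", shape=example_input.shape)],
        convert_to="mlprogram",
    )
```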
When converting the model for the Core ML backend with this patch, the model runs on the ANE correctly. This is obviously not an ideal solution, and I am not completely sure why it works while the existing code does not. One thing I have noticed is that the model size is reduced by a factor of 2, which might be due to some kind of quantization I am not aware of (a factor-of-two reduction would be consistent with weights being stored in FP16 rather than FP32).
This is the same model, with one version converted using the default Core ML backend, and the other version converted using the Core ML backend with the aforementioned patch.
Notice the processing unit allocation and model size.
Reproduction
```shell
python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_coreml_static-608x608.py \
    mmdetection/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py \
    checkpoints/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
    mmdetection/demo/demo.jpg \
    --work-dir work_dir/yolov3 \
    --device cpu
```
Environment
Error traceback