Hi @Typiqally, sorry for the late reply. I think the reason is the default value of compute_precision during conversion.
```python
model = ct.convert(source_model,
                   convert_to="mlprogram",
                   compute_precision=ct.precision.FLOAT32)
```
The default value of compute_precision is ct.precision.FLOAT16 with coremltools 5.0b3 and higher, while in mmdeploy the value is set to ct.precision.FLOAT32.
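For reference, the FP16 default can also be requested explicitly (a minimal sketch; source_model stands in for the traced model, as above):

```python
import coremltools as ct

# FP16 is already the default for ML Programs in coremltools 5.0b3+,
# but it can also be requested explicitly:
model = ct.convert(source_model,
                   convert_to="mlprogram",
                   compute_precision=ct.precision.FLOAT16)
```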
Hi there, that is a logical explanation. However, I'm not quite sure how this relates to ANE compilation errors. For YOLO, the compute precision seems to fix the ANE targeting issue, but for Mask-RCNN I haven't been so lucky. Give me some time to run a few tests.
Hi there, excuse me for taking so long; I've had massive trouble setting up an environment to run MMDeploy on my Mac, though I have now made the setup reproducible with a Conda environment.yml, which I can contribute to this repo if desired. Back to the ANE: I have confirmed that FP16 mode runs the model on the ANE, but only in some cases. For example, the FP16 model runs on the ANE on my MacBook, but not on my iPad (maybe an operator compatibility issue?). Also, prediction is faster on the GPU than on the ANE, which might be why Core ML decides to switch. Since Apple does not provide any information on this, I cannot be certain why the ANE is not selected.
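One way to probe this (a sketch, assuming coremltools 5+; "model.mlpackage" is a placeholder path for the converted model) is to load the same model with different allowed compute units and compare prediction times:

```python
import coremltools as ct

# Load the same converted model with different allowed compute units
# (the compute_units argument requires coremltools 5+); the path is a
# placeholder for the converted model package.
model_all = ct.models.MLModel("model.mlpackage",
                              compute_units=ct.ComputeUnit.ALL)
model_gpu = ct.models.MLModel("model.mlpackage",
                              compute_units=ct.ComputeUnit.CPU_AND_GPU)
# Timing model_all.predict(...) against model_gpu.predict(...) shows
# whether Core ML actually gains anything when the ANE is allowed.
```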
Currently, I have to change the flag manually at: https://github.com/open-mmlab/mmdeploy/blob/c4d428fd7d7b34b70d35e775aaff4c27ee7c317b/mmdeploy/backend/coreml/torchscript2coreml.py#L57
Is there any way I can configure this without having to go into the codebase?
Hi, the current code doesn't pass the fp16_mode argument to the convert function, and there is currently no fp16_mode configuration option for the Core ML config, unlike the TensorRT backend.
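For illustration, a hypothetical Core ML deploy config exposing such a flag could look like the following (this option does not exist yet, and the field name fp16_mode is an assumption mirroring the TensorRT backend):

```python
# Hypothetical sketch: a backend_config for Core ML with an fp16_mode
# flag mirroring the TensorRT backend. This option does not exist in
# MMDeploy yet; adding it is the point of the proposed PR.
backend_config = dict(
    type='coreml',
    convert_to='mlprogram',
    fp16_mode=True,  # would map to compute_precision=ct.precision.FLOAT16
)
```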
If it is convenient for you, could you please make a PR to fix this?
Sure, give me some time
Describe the bug
MMDetection object detection models such as yolov3_d53_mstrain-608_273e_coco and faster_rcnn_regnetx-3.2GF_fpn_1x_coco, as well as the instance segmentation model mask_rcnn_regnetx-12GF_fpn_1x_coco, lose the ability to run on the Apple Neural Engine (ANE) and instead run on the GPU.
For YOLO and Faster R-CNN, I have managed to develop a suboptimal workaround: keeping the TorchScript model in memory after JIT tracing instead of writing it to a file before passing it to the coremltools conversion step. This can be reproduced by temporarily changing the conversion code; a sketch of the idea follows.
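The original diff is not reproduced here; purely as an illustration, a minimal sketch of the idea (function and variable names are hypothetical, not the actual MMDeploy code) might look like:

```python
import coremltools as ct
import torch

# Hypothetical sketch, not the actual MMDeploy patch: trace the model
# and pass the in-memory ScriptModule to coremltools directly, instead
# of saving it to disk and reloading it by path.
def convert_in_memory(model, example_input):
    traced = torch.jit.trace(model, example_input)
    return ct.convert(
        traced,  # in-memory traced module, no intermediate .pt file
        inputs=[ct.TensorType(name="input", shape=example_input.shape)],
        convert_to="mlprogram",
    )
```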
When converting the model for the Core ML backend with this patch, the model runs on the ANE correctly. This is obviously not an ideal solution, and I am not completely sure why it works while the existing code does not. One thing I have noticed is that the model size is reduced by a factor of 2, which might be due to some kind of quantization I am not aware of (a factor-of-two reduction would be consistent with weights being stored in FP16 rather than FP32).
This is the same model, with one version converted using the default Core ML backend, and the other version converted using the Core ML backend with the aforementioned patch.
Notice the processing unit allocation and model size.
Reproduction
```shell
python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_coreml_static-608x608.py \
    mmdetection/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py \
    checkpoints/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
    mmdetection/demo/demo.jpg \
    --work-dir work_dir/yolov3 \
    --device cpu
```
Environment
Error traceback