tinyvision / DAMO-YOLO

DAMO-YOLO: a fast and accurate object detection method with several new techniques, including NAS backbones, efficient RepGFPN, ZeroHead, AlignedOTA, and distillation enhancement.
Apache License 2.0

Damo Yolo pth to coreml #139

Open adkbbx opened 6 months ago

adkbbx commented 6 months ago

Question

I am trying to convert my custom object detection model, built on DAMO-YOLO with 4 classes, to Core ML so that I can run it in my Swift application. I would like to know what the outputs of the converted model mean.

[image]

The outputs "var_1262" and "var_1298" both have shape [1, 8400, 4]. I would like to know what these outputs mean and how I can add an NMS layer to the current model so that I can run predictions on my iOS device.
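
A note on the 8400 dimension: it is consistent with one prediction per grid cell across three feature-map levels of a 640x640 input. The short check below is only a sketch; the strides 8, 16 and 32 are an assumption typical of YOLO-style detectors and are not stated anywhere in this post, and `num_predictions` is a hypothetical helper name.

```python
def num_predictions(input_size=640, strides=(8, 16, 32)):
    """Count grid cells over all feature-map levels for a square input.

    The strides are an assumption (typical for YOLO-style heads),
    not something confirmed in this thread.
    """
    return sum((input_size // s) ** 2 for s in strides)


# 80*80 + 40*40 + 20*20 grid cells for a 640x640 input
print(num_predictions())
```

If that reading is right, each of the 8400 rows is one candidate detection, and with 4 classes a [1, 8400, 4] tensor could hold either per-class scores or box coordinates, which is why the two outputs are hard to tell apart by shape alone.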

I adapted the repository's ONNX conversion code to convert my .pth model to Core ML using coremltools. The code is as follows.

import torch
import coremltools as ct
from torch import nn
from loguru import logger
from damo.base_models.core.end2end import End2End
from damo.base_models.core.ops import RepConv, SiLU
from damo.config.base import parse_config
from damo.detectors.detector import build_local_model
from damo.utils.model_utils import get_model_info, replace_module

device = torch.device('cpu')
config_file = "./configs/damoyoloT.py"
config = parse_config(config_file)

# build model
model = build_local_model(config, device)
model.eval()

ckpt_file = "./latest_ckpt.pth"
# load model parameters
ckpt = torch.load(ckpt_file, map_location=device)

if 'model' in ckpt:
    ckpt = ckpt['model']
model.load_state_dict(ckpt, strict=True)
logger.info(f'loading checkpoint from {ckpt_file}.')

model = replace_module(model, nn.SiLU, SiLU)

for layer in model.modules():
    if isinstance(layer, RepConv):
        layer.switch_to_deploy()

info = get_model_info(model, (640, 640))
logger.info(info)
model.head.nms = False

# trace the model with a dummy 640x640 input
inputs = torch.randn(1, 3, 640, 640)
traced_model = torch.jit.trace(model, inputs)

# convert to Core ML; the image input is scaled from [0, 255] to [0, 1]
input_image = ct.ImageType(name='inputs', shape=(1, 3, 640, 640),
                           scale=1 / 255, bias=[0, 0, 0])
coreml_model = ct.convert(traced_model, inputs=[input_image])
coreml_model.save('./latest_ckpt_test.mlmodel')
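
Since `model.head.nms = False` leaves the exported model without NMS, the post-processing being asked about would have to run on the raw outputs. The following is a minimal pure-Python sketch of class-wise greedy NMS, not DAMO-YOLO's own implementation. It assumes that one output holds per-class scores and the other holds boxes as (x1, y1, x2, y2), with the batch dimension already removed; that layout is not confirmed in this thread. The names `iou` and `postprocess` and the default thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(scores, boxes, score_thr=0.5, iou_thr=0.7):
    """Class-wise greedy NMS over raw detector outputs.

    scores: N rows, one score per class     (assumed layout)
    boxes:  N rows of (x1, y1, x2, y2)      (assumed layout)
    Thresholds are illustrative defaults, not values from the repository.
    Returns a list of (box, class_id, score), highest score first.
    """
    # Keep each candidate's best class if it clears the score threshold.
    candidates = []
    for row, box in zip(scores, boxes):
        cls = max(range(len(row)), key=row.__getitem__)
        if row[cls] >= score_thr:
            candidates.append((tuple(box), cls, row[cls]))
    candidates.sort(key=lambda c: c[2], reverse=True)

    kept = []
    for box, cls, score in candidates:
        # Suppress a box only if it overlaps a kept box of the same class.
        if all(k[1] != cls or iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, cls, score))
    return kept
```

The same logic would need to be ported to Swift, or replaced by a Core ML NMS stage, to run on the device; the sketch is only meant to pin down what "adding NMS" would compute.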

Thanks in advance for any answers.
