Guide to export custom yolo models

hardikdava commented 1 year ago

Hello,

Is there a guide on how to convert a custom yolo version (other than yolov5, yolov6, yolov7, yolov8)?
Which layers should be modified in order to do on-device decoding?

tersekmatija commented 1 year ago

@hardikdava Hey, it might not be fully possible to do complete on-device decoding. I'll describe below how it works so you get a better understanding. To get from output predictions to actual bounding boxes, you could say you need to perform two main steps. The first step is decoding and the second step is NMS.

Decoding depends on the Yolo version (and sometimes even on the release), although some versions share the same decoding approach. You can see how it is done by looking at the head of each Yolo version (example here). In tools, we prune the head, add a sigmoid activation (for legacy reasons, as YoloV4 and V5 used it as well), and then decode the output based on the version we read from the layer name. In the YoloV5 example, we do exactly the same decoding as the head does, except that it is done in the FW, and we don't apply the sigmoid there since it's already added to the model. Once we have the bounding boxes, we pass them to NMS (which is the same for all Yolos) and get the final list of bounding boxes.
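To make the decoding step concrete, here is a minimal NumPy sketch of the decoding commonly used by the YoloV5 detection head (sigmoid, grid offset, anchor scaling). It illustrates the general technique only and is not the firmware's actual code. The function name `decode_yolov5_head`, the tensor layout, and the anchor and stride values are assumptions made for this example. The sketch applies the sigmoid itself, whereas in the exported model described above the sigmoid is already part of the network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_yolov5_head(raw, anchors, stride):
    """Decode one raw YoloV5-style head output into xywh boxes plus scores.

    Hypothetical helper for illustration; the layout below is an assumption.
    raw:     (num_anchors, grid_h, grid_w, 5 + num_classes) array of logits
    anchors: (num_anchors, 2) anchor widths/heights in pixels
    stride:  downsampling factor of this head (e.g. 8, 16 or 32)
    """
    num_anchors, grid_h, grid_w, _ = raw.shape
    y = sigmoid(raw)

    # Per-cell (x, y) offsets, shaped to broadcast over the anchor axis.
    gy, gx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    grid = np.stack([gx, gy], axis=-1)[None]  # (1, grid_h, grid_w, 2)

    # Box centre and size in input-image pixels.
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
    wh = (y[..., 2:4] * 2.0) ** 2 * anchors[:, None, None, :]

    # Final per-class score is objectness times class probability.
    obj = y[..., 4:5]
    cls = y[..., 5:]

    boxes = np.concatenate([xy, wh], axis=-1).reshape(-1, 4)
    scores = (obj * cls).reshape(boxes.shape[0], -1)
    return boxes, scores
```

The returned boxes are in centre-x, centre-y, width, height form, so they still need converting to corner coordinates before being handed to NMS.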

If you are interested in decoding a specific Yolo version on device, you can check whether its bounding box decoding matches that of any of the supported versions.

If yes, then export the model so that you prune the head and rename the output layers to match the names of the supported version. On-device decoding should work without a problem. If you have some issues, you can share the model and what you've tried and we could likely help.
If no, then feel free to open a request/issue for the Yolo you'd like to have supported. We aim for releases that are somewhat standard, common, and have an advantage over other versions (such as better throughput on edge devices, significantly better detection performance, an open and permissive license, ...).

hardikdava commented 1 year ago

@tersekmatija , thanks for your reply.

I want to run a DAMO-YOLO detection model. The ONNX model has 2 output nodes, i.e. boxes (in xyxy form) and scores (shape = (number of classes, 1)). Since the boxes are already in xyxy form, I think I may not need decoding. But I have to find the best class for each box and then pass the result to NMS. The numpy operations are as follows.

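import numpy as np

# ONNX prediction (self.model is presumably an onnxruntime InferenceSession)
output = self.model.run(None, {self.model.get_inputs()[0].name: net_image})
scores = output[0][0]  # per-box class scores
bboxes = output[1][0]  # boxes in xyxy form

# Keep only the boxes whose best class score exceeds the confidence threshold
confidences = np.max(scores, axis=1)
valid_mask = confidences > conf_thresh
boxes = bboxes[valid_mask]
scores = scores[valid_mask]
class_ids = np.argmax(scores, axis=1)
confidences = confidences[valid_mask]

# NMS on the surviving boxes (class_ids are not passed, so it is class-agnostic)
valid_boxes = non_maximum_suppression(boxes, confidences, iou_thresh)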
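
The snippet above calls `non_maximum_suppression` without showing it. As a rough sketch of what such a helper might look like, here is a plain greedy, class-agnostic NMS in NumPy. The body, and the choice to return kept indices rather than the boxes themselves, are assumptions made for illustration. This is not the poster's implementation or the one used on the device.

```python
import numpy as np


def non_maximum_suppression(boxes, confidences, iou_thresh):
    """Greedy NMS over xyxy boxes; returns the indices of the boxes to keep.

    Hypothetical implementation: the original helper is not shown in the thread.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    confidences = np.asarray(confidences, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-confidences)  # highest confidence first

    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]

        # Intersection of the best box with every remaining box.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0.0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0.0, None)
        inter = iw * ih
        union = areas[best] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)

        # Drop every remaining box that overlaps the best one too much.
        order = rest[iou <= iou_thresh]
    return keep
```

With this variant, the final detections would be obtained by indexing, e.g. `boxes[keep]`, `confidences[keep]` and `class_ids[keep]`.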

tersekmatija commented 1 year ago

Yeah, that's already wrapped by their API. You can look here. I'd have to investigate further to see whether it matches any of the currently supported versions, but I won't have time to do so anytime soon. It also seems the accuracy is not that much better, and the latency is measured on a T4, which can deviate from what you'd get on an OAK-D.

We could perhaps look into exposing the NMS node? CC: @themarpe

hardikdava commented 1 year ago

@tersekmatija actually, exposing the NMS node is a good idea. That would make it easy to extend support to custom models so that they can run fully on the device itself.

luxonis / tools

Guide to export custom yolo models #51