luxonis / tools

Various tools for OAK-D camera
GNU Affero General Public License v3.0

Guide to export custom yolo models #51

Open hardikdava opened 1 year ago

hardikdava commented 1 year ago

Hello,

tersekmatija commented 1 year ago

@hardikdava Hey, complete on-device decoding might not be fully possible. I'll describe how it works below so you get a better understanding. Getting from output predictions to actual bounding boxes takes two main steps: decoding, then NMS (non-maximum suppression).

Decoding depends on the Yolo version (and sometimes even on the release), although some versions share the same decoding approach. You can see how it is done by looking at the head of each Yolo version (example here). In tools, we prune the head, add a sigmoid activation (for legacy reasons, since YoloV4 and V5 used it as well), and then decode the output based on the version we read from the layer name. For YoloV5, for example, we do exactly the same decoding as the head does, except that it runs in the FW and skips the sigmoid, since that is already added to the model. Once we have the bounding boxes, we pass them to NMS (which is the same for all Yolos) and get the final list of bounding boxes.
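As a rough illustration of the decoding step described above, here is a minimal NumPy sketch of YOLOv5-style box decoding for a single output scale. It assumes the head output has already passed through sigmoid and follows the formulas used by the YOLOv5 detection head. The function name `decode_yolov5_scale`, the tensor layout, and the anchor values are illustrative assumptions, not the actual FW implementation.

```python
import numpy as np


def decode_yolov5_scale(pred, anchors, stride):
    """Decode one sigmoid-activated YOLOv5-style output scale (assumed layout).

    pred:    (num_anchors, grid_h, grid_w, 5 + num_classes), values in [0, 1]
    anchors: (num_anchors, 2) anchor width/height in input-image pixels
    stride:  downsampling factor of this scale (e.g. 8, 16 or 32)
    Returns boxes (N, 4) as xyxy in pixels and scores (N, num_classes).
    """
    num_anchors, grid_h, grid_w, _ = pred.shape

    # Cell offsets: grid[..., 0] is the x index, grid[..., 1] the y index.
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1)[None]  # (1, grid_h, grid_w, 2)

    # Box centre and size, both in input-image pixels.
    xy = (pred[..., 0:2] * 2.0 - 0.5 + grid) * stride
    wh = (pred[..., 2:4] * 2.0) ** 2 * anchors[:, None, None, :]

    # Per-class score = objectness * class probability.
    scores = pred[..., 4:5] * pred[..., 5:]

    # Convert centre/size to corner (xyxy) format and flatten all cells.
    boxes = np.concatenate([xy - wh / 2.0, xy + wh / 2.0], axis=-1)
    return boxes.reshape(-1, 4), scores.reshape(-1, scores.shape[-1])
```

To check whether another Yolo variant matches a supported version, one option is to feed the same tensor through a sketch like this and through that variant's own head, then compare the resulting boxes.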

If you are interested in decoding a specific Yolo version on device, you can check whether its bounding box decoding matches any of the supported versions.

hardikdava commented 1 year ago

@tersekmatija , thanks for your reply.

I want to run the damo-yolo detection model. The onnx model has 2 output nodes, i.e. boxes (in xyxy form) and scores (shape = (number of classes, 1)). Since the boxes are already in xyxy form, I think I may not need decoding. But I still have to find the best class and then pass the result to NMS. The numpy operations are as follows.

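import numpy as np

# ONNX prediction
input_name = self.model.get_inputs()[0].name
output = self.model.run(None, {input_name: net_image})
scores = output[0][0]  # per-box class scores, indexed below as (num_boxes, num_classes)
bboxes = output[1][0]  # boxes in xyxy form

# Best class score per box, then drop boxes below the confidence threshold
confidences = np.max(scores, axis=1)
valid_mask = confidences > conf_thresh
boxes = bboxes[valid_mask]
scores = scores[valid_mask]
class_ids = np.argmax(scores, axis=1)
confidences = confidences[valid_mask]

# non_maximum_suppression is a separate helper, not shown here
valid_boxes = non_maximum_suppression(boxes, confidences, iou_thresh)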
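
The `non_maximum_suppression` helper is not shown in the thread. For reference, a minimal sketch of class-agnostic greedy NMS over xyxy boxes could look like the following. The name `nms_xyxy` and the choice to return kept indices are assumptions made for illustration, and this is not how the on-device NMS is implemented.

```python
import numpy as np


def nms_xyxy(boxes, confidences, iou_thresh):
    """Greedy NMS over xyxy boxes; returns indices of the kept boxes."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    order = np.argsort(confidences)[::-1]  # highest confidence first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))

        # Intersection of the best box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

        union = areas[best] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)

        # Drop every remaining box that overlaps the best box too much.
        order = rest[iou <= iou_thresh]
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])
confidences = np.array([0.9, 0.8, 0.7])
kept = nms_xyxy(boxes, confidences, iou_thresh=0.5)
```

Returning indices rather than boxes makes it easy to select the matching `class_ids` and `confidences` afterwards.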

tersekmatija commented 1 year ago

Yeah, that's already wrapped by their API. You can look here. I'd have to investigate further to see whether it matches any of the currently supported versions, but I won't have time to do so anytime soon. It also seems the accuracy is not that much better, and the latency is measured on a T4, so it can differ from what you'd get on OAK-D.

We could perhaps look into exposing the NMS node? CC: @themarpe

hardikdava commented 1 year ago

@tersekmatija actually, exposing NMS is a good idea. That would make it easy to extend custom models so they are fully compatible and run on the device itself.