Hey @saivojjala ,
For v4, please check out this notebook. The export process must match so that we can correctly parse it on the device. Similarly, you need to properly configure anchors and masks, since v4 (and v7) both use anchors.
For v7, using tools.luxonis.com should be the correct way. Can you try running your model using this script, where -m is the path to the .blob and -c is the path to the .json that you get in the zip file from tools.luxonis.com? If that works, then the exported model is correct and the issue likely lies in some settings in your script. Then we can debug further.
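For reference, the .json from tools.luxonis.com is essentially a set of decoding parameters for the YoloDetectionNetwork node. Below is a rough, hedged sketch of wiring it up yourself; the key names (nn_config, NN_specific_metadata, input_size) reflect a typical export and may differ in your file, so check them against your own .json before relying on this.

```python
import json
import depthai as dai

# Hypothetical file names -- substitute the files from the tools.luxonis.com zip.
BLOB_PATH = "yolov7_openvino_6shave.blob"
CONFIG_PATH = "yolov7.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Decoding metadata (classes, anchors, masks, thresholds) from the exported JSON.
meta = config["nn_config"]["NN_specific_metadata"]
w, h = map(int, config["nn_config"]["input_size"].split("x"))

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(w, h)
cam.setInterleaved(False)
cam.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

nn = pipeline.create(dai.node.YoloDetectionNetwork)
nn.setBlobPath(BLOB_PATH)
nn.setNumClasses(meta["classes"])
nn.setCoordinateSize(meta["coordinates"])
nn.setAnchors(meta["anchors"])
nn.setAnchorMasks(meta["anchor_masks"])
nn.setIouThreshold(meta["iou_threshold"])
nn.setConfidenceThreshold(meta["confidence_threshold"])

cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)
```

If the anchors or masks in the JSON don't match the ones the model was trained with, you will see exactly the symptom described here: very few, mostly wrong detections.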
Hey @tersekmatija, I tried the script you sent for yolov7 and it works now, though the fps is quite low. Do you have any suggestions on how I could get a higher fps? Also, for yolov4, the export process I used seems to be different from the one you sent, and I am unable to use yours due to version mismatches with some libraries, so let's keep that discussion open. I will get back to you in a few days.
I also wanted to ask: the script you sent for yolov7 loads the model onto the OpenVINO chip, right? Is it possible to run the model on the host instead? I have made some additions to the script so that it also shows the spatial coordinates of the detected objects; would that still be possible if the model wasn't running on the OpenVINO chip on the camera?
I tried the script you sent for yolov7 and it works now, though the fps is quite low. Do you have any suggestions on how I could get a higher fps?
@saivojjala It will mainly depend on the input shape and the size of the model. I'd recommend yolov7-tiny. For the input shape you can try 512 x 288; it doesn't squish the image when running the inference (it doesn't during training either, AFAIK) and you can run the model on the full FOV.
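The reason 512 x 288 works without squishing is that it keeps the same 16:9 aspect ratio as the color sensor, so the preview covers the full FOV without stretching or cropping. A minimal sketch of the camera side of that, assuming the 1080p color sensor:

```python
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
# 512 x 288 is 16:9, matching the 1080p sensor, so the preview covers the
# full horizontal FOV without stretching the image or cropping the sides.
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
cam.setPreviewSize(512, 288)
cam.setInterleaved(False)
```

You would also export the model at 512 x 288 (the shape field on tools.luxonis.com) so the blob's input matches the preview size.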
Yes, it loads it onto the chip. Just curious, what is the reason you'd want to run it off the chip? But yes, you could run it on the host. There are several options. You could receive the color image from the camera and run the inference with a script similar to the one in the YoloV7 repository. To get the depth, you could use the StereoDepth node, or the SpatialLocationCalculator for coordinates directly. Alternatively, you could use this part of the code to compute it on the host based on the depth from the camera. I can't make a good recommendation without knowing the full use case, though.
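If you go the host route, the host-side spatial calculation boils down to taking the depth pixels inside the bounding box and projecting them through the camera intrinsics. This is a generic sketch of that idea, not the exact code from the linked repo; the function name and the HFOV value are placeholders you'd replace with your camera's values.

```python
import numpy as np

def spatial_from_bbox(depth_frame, bbox, hfov_deg=73.5):
    """Rough host-side XYZ estimate from a depth frame (uint16, millimetres)
    and a bounding box (xmin, ymin, xmax, ymax) in pixels.
    hfov_deg is an assumption -- use the horizontal FOV of your sensor."""
    xmin, ymin, xmax, ymax = bbox
    roi = depth_frame[ymin:ymax, xmin:xmax]
    valid = roi[roi > 0]                      # drop invalid (0) depth pixels
    if valid.size == 0:
        return None
    z = float(np.median(valid))               # object depth in mm

    h, w = depth_frame.shape
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    # Focal length in pixels from the horizontal FOV, then pinhole projection.
    focal = w / (2 * np.tan(np.deg2rad(hfov_deg) / 2))
    x = (cx - w / 2) * z / focal
    y = (cy - h / 2) * z / focal
    return x, y, z                            # millimetres, camera-centric
```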
@tersekmatija I asked about running it on the host because I thought that not running the model on the chip could also increase the frame rate, and that I could do it without using a tiny model. Thank you so much for your help. I will try everything out and reach out again if I have any other questions.
It could increase it, but it depends on whether you would be running it on a CPU or a GPU and on the capabilities of the HW. Note that in that case you will have to post-process the data on your own (model output -> bounding boxes + NMS).
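To illustrate the kind of post-processing you would own on the host (this is a generic NumPy sketch, not the exact code from the YoloV7 repo): after filtering the raw predictions by confidence, you run non-maximum suppression over the remaining boxes, e.g.:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Plain NumPy NMS. boxes: (N, 4) as (x1, y1, x2, y2), scores: (N,).
    Returns the indices of the boxes to keep."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the highest-scoring box with the rest.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep
```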
I have custom-trained a yolov7 and a yolov4-tiny model for detecting objects. I have made sure to follow all the recommended steps in the DepthAI documentation regarding conversion of the model to the blob format. Despite training well, my model makes very few detections when running on the stereo camera (and most of them are wrong). However, the model works fine when used with a simple webcam on the same live feed. This led me to believe that the problem must be with the conversion. I converted the yolov4-tiny model from .weights to .onnx using the tensorrt_demos library, and then converted the ONNX to a 6-shave blob using https://blobconverter.luxonis.com/. For yolov7, I converted the weights directly from .pt to a 6-shave blob using https://tools.luxonis.com/. Has anyone else faced similar issues with custom-trained models? It would be great if anyone had any insight.
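For reference, the same ONNX-to-blob conversion can also be done from Python with the blobconverter package instead of the web UI; a rough sketch is below. The optimizer flags are assumptions for a model trained on RGB input normalised to 0-1, so match them to whatever preprocessing your training pipeline actually used.

```python
import blobconverter

# Hedged sketch: convert the ONNX export to a 6-shave blob on the Luxonis
# conversion server. The optimizer_params here are guesses and must match
# the preprocessing used during training.
blob_path = blobconverter.from_onnx(
    model="yolov4_tiny.onnx",
    data_type="FP16",
    shaves=6,
    optimizer_params=[
        "--reverse_input_channels",        # BGR frames from the camera vs. RGB training
        "--scale_values=[255,255,255]",    # scale 0-255 input down to 0-1
    ],
)
print(blob_path)
```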