stereolabs / zed-csharp-api

C# API for the ZED SDK
MIT License

[ONNX] Improving the accuracy of my own detection #15

Closed LudoTexx closed 3 years ago

LudoTexx commented 3 years ago

Hello,

I already use 4 ZED cameras for 3D side-by-side (SbS) streaming, for people and vehicle detection.

Now I am trying to use my own ONNX ML detection with custom objects (old explosives). I have trained a model with Custom Vision. I'm not allowed to show what kind of objects we use, and for safety reasons of course I can't use them at home for training. I am currently waiting for a replica, so in the meantime I use one of my son's toys to learn and write code at home.

The toy is approximately the same size, but its shape is much more complicated than that of the explosives (the toy has no symmetry).

I have trained a model with 250 pictures, with different sizes, backgrounds, and lighting, in 3 training iterations.

That 'works', but not very well, and I'm not happy with the accuracy.

Just as a check, I have trained a banana model with 200 pictures in 2 training iterations. (Always deep training, not fast.)

I have compared my results with the ZED banana detection. ZED is faster and more accurate of course :).

But if you compare ZED accuracy between apple (or banana, etc.) detection and people detection, there is a huge difference: ZED detects people much better than any other object. Close, far, moving fast, in bright light or in darkness, ZED does a really good job with humans.

My question is: is this because you are more focused on the human body? Does the human body have a shape that is easier to detect?

I know you probably do not want to say too much about how your detection works, but do you use a neural network, a depth layer, or maybe both combined?

If you have any advice on how I can make my detection more accurate, it would help me.

Thanks.

adujardin commented 3 years ago

Yes, people are a primary focus of the object detection in the ZED SDK. The precision in that case partly comes from the dataset: it contains more humans than other classes of objects, and we focus our work on this accuracy.

If by 3 iterations you mean 3 epochs (the entire dataset fed into the network 3 times), I think that may not be enough. For datasets such as COCO (~120k images), it's common to train for tens or even hundreds of epochs, taking a few days to a few weeks on a big dedicated GPU.
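To put the epoch counts above in perspective, here is a rough sketch that counts how many weight updates a training run performs. It assumes "3 iterations" really means 3 epochs, and the batch size of 16 and the figure of 100 epochs for COCO are illustrative assumptions, not values from this thread. The function name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_images: int, epochs: int, batch_size: int = 16) -> int:
    """Count weight updates when each epoch passes over the whole dataset once.

    batch_size=16 is an assumed value for illustration only.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


# 250 pictures, 3 epochs (the setup described in the question)
small_run = optimizer_steps(250, 3)

# ~120k images, 100 epochs (a COCO-scale run; 100 is an assumed epoch count)
coco_run = optimizer_steps(120_000, 100)

print(f"small run: {small_run} updates")
print(f"COCO-scale run: {coco_run} updates")
print(f"ratio: {coco_run / small_run:.0f}x")
```

Under these assumptions the small run performs only a few dozen updates, several orders of magnitude fewer than a COCO-scale run, which is one reason why a handful of epochs on a few hundred pictures may leave the model undertrained.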

Training a custom model can be hard, but there are projects that are really well documented and efficient. You can check out this one: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data. It may help you get the accuracy you want while being easy to use (and it can be exported to ONNX at the end).
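As a starting point for preparing data for that project, the sketch below converts a pixel bounding box into a YOLO-style label line. It assumes the label format used by YOLOv5, one line per object with `class x_center y_center width height`, all normalized to the range 0 to 1 by the image size. The function name `to_yolo_label` and the example numbers are hypothetical, so check the linked guide before relying on it.

```python
def to_yolo_label(class_id: int,
                  x_min: float, y_min: float,
                  x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (corner coordinates) to a normalized YOLO label line."""
    # Box center, normalized by image width and height
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    # Box size, normalized by image width and height
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example: one object of class 0 in a 640x480 picture
line = to_yolo_label(0, 100, 200, 300, 400, 640, 480)
print(line)
```

Each picture would then get a text file containing one such line per annotated object.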