luxonis / depthai-python

DepthAI Python Library
MIT License

Problems integrating opencv_face_detector_uint8.pb on the OAK-D Lite camera #496

Closed. 28anmol closed this issue 2 years ago.

28anmol commented 2 years ago

Greetings!

I am trying to build a face tracker using:

- an OAK-D Lite camera for inference of the face detection model
- a Raspberry Pi 3
- opencv_face_detector_uint8.pb (model file) and opencv_face_detector.pbtxt (config file), a TensorFlow model
- and of course the relevant hardware (motors) and Python libraries (opencv, gpiozero, numpy, ...) to realize the project

As I am new to the OAK-D camera (although I have already learned the basics of how to deploy models from the depthai and depthai-python repositories), I would like to know how to integrate the above-mentioned OpenCV face detection model so that it can be deployed on the OAK-D Lite via a Raspberry Pi 3, and whether that is really possible. Help, suggestions, and guidance would be extremely helpful here, especially on how to integrate the NN model so that it is functional and deployable on the OAK-D Lite.

Secondly, is there any way (by chance) to use the OAK-D Lite just for viewing video frames through OpenCV, like an external camera, via cv2.VideoCapture(), and then deploy the face detection models on the host? Or does one really need to follow the approach in depthai-python/examples/ColorCamera/rgb_preview.py, which basically shows how to set up the OAK-D pipeline system and view the camera frames?

Help and guidance are extremely appreciated!

Thanks and best wishes, Anmol.

themarpe commented 2 years ago

CC: @tersekmatija on NN part

Yes @28anmol, currently the way to interface with the camera resources is by building the corresponding pipeline. We'll be exposing higher abstractions to make this simpler. But we also have a way of exposing the device as a UVC camera, so that one could do cv2.VideoCapture() on it to retrieve frames. (Docs: https://docs.luxonis.com/en/latest/pages/oak_webcam/?highlight=webcam)
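For illustration only, assuming the device has been set up as a UVC webcam per the docs above and that the OS assigns it index 0 (that index is an assumption, not guaranteed), reading frames then looks like any other webcam:

```python
import cv2

# Device index 0 is an assumption; use whichever index the OS
# assigns to the OAK once it enumerates as a UVC webcam.
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()  # frame is an ordinary BGR numpy array
    if not ok:
        break
    cv2.imshow("OAK as UVC webcam", frame)
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```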

tersekmatija commented 2 years ago

Hey @28anmol,

I am not sure exactly which model OpenCV is using here, and I would need more information to say what the exact process would be, but here is the documentation from OpenVINO on how to export TF models.

I'd recommend you use YuNet, though. We have an experiment here: https://github.com/luxonis/depthai-experiments/tree/master/gen2-face-detection. So you could take an already compiled blob and use it for face detection. We also show how to parse the results in the repository :)
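Running an already compiled blob on the device follows the usual pipeline pattern. As a rough sketch only (this is not the experiment's actual code; the blob path and preview size are placeholders, and the real YuNet output decoding lives in the experiment's main.py):

```python
import cv2
import depthai as dai

BLOB_PATH = "face_detection.blob"  # placeholder; take the blob from the experiment

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(320, 320)  # placeholder; must match the blob's input size
cam.setInterleaved(False)

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath(BLOB_PATH)
cam.preview.link(nn.input)

xout_rgb = pipeline.create(dai.node.XLinkOut)
xout_rgb.setStreamName("rgb")
cam.preview.link(xout_rgb.input)

xout_nn = pipeline.create(dai.node.XLinkOut)
xout_nn.setStreamName("nn")
nn.out.link(xout_nn.input)

with dai.Device(pipeline) as device:
    q_rgb = device.getOutputQueue("rgb", maxSize=4, blocking=False)
    q_nn = device.getOutputQueue("nn", maxSize=4, blocking=False)
    while True:
        frame = q_rgb.get().getCvFrame()
        in_nn = q_nn.tryGet()
        if in_nn is not None:
            # Model-specific decoding goes here; see the experiment's
            # main.py for how YuNet's outputs are parsed into boxes.
            pass
        cv2.imshow("preview", frame)
        if cv2.waitKey(1) == ord("q"):
            break
```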

28anmol commented 2 years ago

> Yes @28anmol, currently the way to interface with the camera resources is by building the corresponding pipeline. We'll be exposing higher abstractions to make this simpler. But we also have a way of exposing the device as a UVC camera, so that one could do cv2.VideoCapture() on it to retrieve frames. (Docs: https://docs.luxonis.com/en/latest/pages/oak_webcam/?highlight=webcam)

Thanks a lot! Was very helpful!

28anmol commented 2 years ago

> Hey @28anmol,
>
> I am not sure exactly which model OpenCV is using here, and I would need more information to say what the exact process would be, but here is the documentation from OpenVINO on how to export TF models.
>
> I'd recommend you use YuNet, though. We have an experiment here: https://github.com/luxonis/depthai-experiments/tree/master/gen2-face-detection. So you could take an already compiled blob and use it for face detection. We also show how to parse the results in the repository :)

Hey @tersekmatija, thanks a lot for the YuNet model suggestion. Your guidance was very useful; indeed, I was successful in deploying the model on my OAK-D camera, and now I can develop it further for my application. I was curious to know what information precisely you were expecting me to deliver regarding the OpenCV face detection model. Maybe if you give me the right buzzwords, I can be more precise next time.

Another question regarding the main.py file of the YuNet experiment, which basically shows how to deploy the trained model on the OAK-D, drawing boxes and pointing out the major facial features: I understood the gist of the code by working out the functionality of its major blocks. But the problem I am encountering at the moment is that if I am given a model file and asked to deploy it on the OAK-D (any CV NN model), I am not able to do it independently without looking up related pieces of code on the internet, as the syntax is not the same for any of them. Code for deploying AI models does not look like the conventional syntax of programming languages like Java, C, C++, or Python, with strings, arrays, loops, and conditional statements, which are very straightforward. If I go through the main.py file line by line trying to understand it completely, I am totally lost as to where the functions and methods come from, why they are being used there, and what else could be coded there (which I don't know, as I have no book or similar reference for it). Could you please guide me through this situation? How can I handle this and gain expertise in this field?

PS: I have good programming knowledge, but I am new to computer vision and to handling and deploying NN models on the OAK-D. Over the last 2 months, I have learned enough from the website to be able to deploy the provided blob files on the OAK-D myself with the help of the example code, but I still can't deploy an external model like this OpenCV DNN face detection model.

Guidance is extremely appreciated. Thanks and regards, Anmol.

tersekmatija commented 2 years ago

Hey @28anmol,

I meant more like which model was used. For example, for the task of object detection, various object detectors have been developed (like YOLOv5 or any of its versions, or SSDLite with a MobileNetV2 backbone). This helps a bit when converting the models, as some might have a different conversion process, which you can see at the link from my previous post. You will also see a bit of what I was asking about in the next paragraphs. Usually, I can determine how to export a model by inspecting its source code :)

And that's a good question! It certainly takes time and some practice to learn how to deploy any model. It helps if you have some knowledge of PyTorch, TensorFlow, or whichever machine learning framework your model was originally coded in, but don't worry if you don't; in most cases you don't need a lot. The models are usually coded in different files than the training or utils code, but due to different styles of programming there seems to be no standardized approach to how those files are split and which functions go where, so it can get quite confusing.

When I am converting a model, I typically first look at the files used for evaluation (as that's where inference is done). There I try to understand how the author is loading the data (e.g., reading images in RGB or BGR format, whether they standardize them or only scale them, what scale they are using, ...). This is an important step, because if you fail to specify the correct image pre-processing parameters when converting your model with OpenVINO's mo.py, the predictions will not make any sense. I write these pre-processing steps down. After that, I check how they post-process the output. I try to understand what they do and replicate it in a simpler way (sometimes some functions or parameters in those big repositories are completely unnecessary).
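To make that concrete, here is a small sketch of how such written-down pre-processing steps might translate to code. Every value here (input size, mean, scale, color order) is a placeholder, not the recipe for any particular model:

```python
import cv2
import numpy as np

# Placeholder values: use whatever the model's evaluation code actually does.
INPUT_SIZE = (300, 300)
MEAN = np.array([104.0, 117.0, 123.0])  # example per-channel mean
SCALE = 1.0                              # example scale factor

img = cv2.imread("face.jpg")             # OpenCV loads images as BGR
img = cv2.resize(img, INPUT_SIZE)
# Some models expect RGB; check the evaluation code before converting:
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
x = (img.astype(np.float32) - MEAN) * SCALE
x = x.transpose(2, 0, 1)[None]           # HWC -> NCHW with a batch dimension
```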

Then I try to proceed with the conversion. I'd suggest you learn how to do this from the link that I sent above. It's for TensorFlow models, but in the same documentation you can find instructions for all major frameworks. Typically, the process goes like this: TF --> OpenVINO IR --> blob, or PyTorch --> ONNX --> OpenVINO IR --> blob. Note that with PyTorch you first have to export to ONNX, and then convert to the OpenVINO Intermediate Representation. You get the OpenVINO IR (xml and bin files) with the model optimizer mo.py (see the docs from the previous post).
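For the PyTorch --> ONNX --> IR leg, a minimal sketch might look like the following; the network, input shape, and mo.py flags are placeholders, and the real values come from the model's own code:

```python
import torch
import torchvision

# Placeholder network; substitute the actual detector you are converting.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)  # placeholder input shape

# Export to ONNX first; opset 11 is a common, safe choice.
torch.onnx.export(model, dummy, "model.onnx", opset_version=11)

# Then convert the ONNX file to OpenVINO IR with the model optimizer, e.g.:
#   python mo.py --input_model model.onnx --data_type FP16
# Pre-processing flags such as --mean_values, --scale_values, and
# --reverse_input_channels come later, once the IR is verified.
```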

What I like to do before converting to blob is to first check how this OpenVINO IR behaves. I usually don't specify any parameters to mo.py in the beginning (except the FP16 data type and the paths to the model). I load the image and pre-process it with separate Python code, and then use the OpenVINO Python API to do the inference. After that I do the post-processing and check whether the model is giving me the expected results. If not, something must have gone wrong, and I try to figure out what. If everything works fine, I try to "transform" the pre-processing into parameters for mo.py. I re-export the model, and this time I try to load the image, do inference (OpenVINO now does the pre-processing automatically), and examine the results. If this works, I am ready to compile to blob. For that I usually use our [blobconverter](https://blobconverter.luxonis.com/) or do it locally. It is important to figure out the pre-processing, as you cannot do it directly on the OAK for now, plus it's faster if it's done in the way I just described.
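A sketch of that verification loop with the Inference Engine Python API of that era, where the file names, input shape, and shave count are placeholder values:

```python
import numpy as np
from openvino.inference_engine import IECore
import blobconverter

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_name = next(iter(net.input_info))

# Manually pre-processed NCHW input (placeholder; use the steps you
# wrote down from the evaluation code).
x = np.zeros((1, 3, 300, 300), dtype=np.float32)
result = exec_net.infer({input_name: x})
# ...apply your re-implemented post-processing to `result` and check
# that the detections make sense before going further.

# Once correct, compile the IR to a .blob, e.g. with the blobconverter
# package (shave count 6 is just an example):
blob_path = blobconverter.from_openvino(
    xml="model.xml", bin="model.bin", data_type="FP16", shaves=6)
```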

Because some models are more complex than others, I sometimes try to export the model first, before looking at the pre- and post-processing, just to make sure I will be able to compile the model at all. If this works and no major modifications are required, I take a look at the pre- and post-processing and export again, this time with the correct parameters.

Summary of the tips:

- Find out which model/architecture you are actually dealing with.
- Look at the evaluation/inference code first; that's where the pre-processing (RGB vs. BGR, scaling, standardization) and post-processing live.
- Convert to OpenVINO IR with mo.py (for PyTorch, export to ONNX first).
- Verify the IR with the OpenVINO Python API, doing the pre-processing manually at first.
- Move the pre-processing into mo.py parameters and re-export.
- Compile the IR to a blob (e.g., with blobconverter) and deploy.

Hope this helps you out a bit, at least in the beginning. If you have further questions, don't hesitate to ask. We are trying to make this process as easy as possible for everyone, and questions like this teach us what we need to improve! Also, I know that model deployment looks scary and tedious in the beginning, but with some practice you'll see that in most cases it really isn't. It might be a bit annoying at first, though ;)

28anmol commented 2 years ago

Thank you so much for the detailed explanation. Indeed, configuring the camera and deploying the model is a bit frustrating initially, but it soon gets a lot better. My uniaxial face tracker project finally works. Thank you for suggesting the YuNet model to accomplish this task.