luxonis / depthai

DepthAI Python API utilities, examples, and tutorials.
https://docs.luxonis.com
MIT License

[Feature-Request] Add new model (BiSeNetV2) into the model zoo (resources/nn) #463

Open franva opened 3 years ago

franva commented 3 years ago

Start with the why:

Reason 1: I have trained a model that does road segmentation only, using the BiSeNetV2 network architecture, and it runs at a decent frame rate (around 15 FPS). I would like to deploy it to my OAK camera.

(Image: demo of the model running in OpenVINO IR format.)

Reason 2: Compared to the existing notebook for training a segmentation model (e.g. DeepLabV3), which uses an out-of-date version of TensorFlow, the training process with PaddleSeg is much more pleasant and much quicker. Within 15 minutes I had my model trained, and thanks to the newer network architecture, BiSeNetV2, I got satisfactory accuracy as well as a pretty good FPS (more than 3 times faster than the existing road-segmentation-adas-1000 model). If this approach proves useful, we can quickly and, more importantly, easily train more custom models with less effort. That would enrich the model zoo with models based on the latest architectures, which would benefit the community and in turn make DepthAI and the OAK cameras more valuable.

Move to the what:

A road segmentation model (BiSeNetV2 architecture) has been trained with PaddleSeg and verified in OpenVINO IR format (the .bin and .xml files). The current depthai_demo.py doesn't provide an easy way to apply custom normalization before feeding a video frame to the model. I would like the demo code to be updated so that it exposes an interface/place to pass in the transformations (in my case, the normalization).

Move to the how:

I have updated the code (in a quick and dirty way) to apply the transformation before feeding the video frames to the model, but judging by the resulting video segmentation, it still doesn't look quite right.


I have attached my code and trained model below for your convenience and this is where a DepthAI expert is needed to finish the last step.

Before running the code, please install the dependencies:

python -m pip install paddlepaddle-gpu
pip install paddleseg

  1. infer.py, code to verify that the OpenVINO IR model exported from PaddleSeg is correct and free of defects,
  2. model.bin and model.xml, the IR model to be used with infer.py
  3. Example Video
  4. Updated depthai_demo.py
  5. depthai/resources/nn/road/road.json
  6. depthai/resources/nn/road/handler.py
  7. depthai/resources/nn/road/road.blob

The code base I used is the latest main branch, which I pulled this morning. Please run this command:

python depthai_demo.py --sync -cnn road -vid ./video.mp4

Please let me know if you need any more information, and thanks for your help and great work~!

Erol444 commented 3 years ago

Hello @franva, another great suggestion! I'm not that familiar with ML architectures so I wasn't aware that there's a deeplab alternative that's much faster. CC @tersekmatija

So, on your suggestion of adding normalization: we could add it in depthai_demo, but it would only work for video inputs (since the normalization would happen on the host, before sending the frame to the device). As we mentioned, we also plan on adding normalization on the device side (possibly with ImageManip or directly on ColorCamera). Looking at your code, the normalization you are doing is paddleseg.transforms.normalize(). We should add support for something like this inside depthai @themarpe.

In the meantime, @VanDavv, could you look into adding this model to depthai_demo? I'm thinking we should add another optional function to handler.py, e.g. def preprocessing(frame):, that would get called when you use a video input - so the paddleseg.transforms.normalize() could be used in handler.py, as sketched below.
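A rough sketch of what that hook could look like (the hook name and signature are just a proposal, and the math simply mirrors PaddleSeg's normalize, so the package itself isn't required):

import numpy as np

# Hypothetical optional hook in resources/nn/road/handler.py. depthai_demo would call it
# only for video inputs, right before the frame is sent to the device.
def preprocessing(frame, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    im = frame.astype(np.float32, copy=False) / 255.0  # scale 0..255 -> 0..1
    im -= np.asarray(mean, dtype=np.float32)            # subtract per-channel mean
    im /= np.asarray(std, dtype=np.float32)             # divide by per-channel std
    return im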

franva commented 3 years ago

Hi @Erol444 Thanks for your update.

Yes, please add the normalization to DepthAI.

Also, the model was trained on preprocessed images (i.e. a chain of transformations that includes Normalization). I understand what you said: it would only be applied to video input. But nowadays most models, if not all, are fed preprocessed images, and Normalization is almost always the last step in the chain of transformations. Without support for custom normalization on the device side, I don't see how we are going to use custom-trained models with the camera input, which will be almost all of the use cases for OAK cameras.

I like the idea of adding def preprocessing(frame) to handler.py, since it hides the complexity from users and provides a simple interface. Could we look into how to technically support pre-processing not only for video input but also for camera input (device side)? Or maybe some hero at DepthAI has an even better and easier approach to implementing device-side preprocessing?

Really looking forward to your and others' feedback.

Luxonis-Brandon commented 3 years ago

Does OpenVINO support normalization for us? CC: @szabi-luxonis and @PINTO0309.

PINTO0309 commented 3 years ago

If I were to do the same thing, I would merge the normalization layers at the same time I generate the OpenVINO IR.

It is much easier than combining DepthAI modules, as it only requires two additional command-line options. Multiply and Add will be automatically inserted in the red frame in the figure below. (Screenshot of the converted IR graph.)
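In host-side terms, the inserted nodes just compute (x - mean_values) / scale_values on the raw input; a rough sketch of the equivalent arithmetic (nothing OpenVINO-specific here):

import numpy as np

# Rough host-side equivalent of the Add/Multiply nodes that the Model Optimizer inserts
# at the network input when --mean_values and --scale_values are given:
def embedded_preprocess(x, mean_values, scale_values):
    x = x.astype(np.float32)
    return (x - np.asarray(mean_values)) / np.asarray(scale_values)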

Luxonis-Brandon commented 3 years ago

Thank you!

franva commented 3 years ago

If I were to do the same thing, I would merge the normalization layers at the same time I generate the OpenVINO IR.

* To adjust the conversion process, you can also use the general (framework-agnostic) parameters:
  https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html
  --mean_values MEAN_VALUES, -ms MEAN_VALUES
                        Mean values to be used for the input image per
                        channel. Values to be provided in the (R,G,B) or
                        [R,G,B] format. Can be defined for desired input of
                        the model, for example: "--mean_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.
  --scale_values SCALE_VALUES
                        Scale values to be used for the input image per
                        channel. Values are provided in the (R,G,B) or [R,G,B]
                        format. Can be defined for desired input of the model,
                        for example: "--scale_values
                        data[255,255,255],info[255,255,255]". The exact
                        meaning and order of channels depend on how the
                        original model was trained.

If you can provide me with the ONNX file before converting it to OpenVINO IR (.xml/.bin), I can try it.

It is much easier than combining DepthAI modules, as it only requires two additional command-line options. Multiply and Add will be automatically inserted in the red frame in the figure below. (Screenshot of the converted IR graph.)

Hi @PINTO0309, thanks for your comment. Yep, it's better to embed the transformations inside the model, so code-wise developers don't need to worry about the transformations at all.

One more thing to note: looking at the normalization inside the transforms from PaddleSeg.Transforms, it needs not only mean: [0.5, 0.5, 0.5] but also std: [0.5, 0.5, 0.5]. I don't see an option to specify the std for the IR model. Hopefully there is something we can do for the std.

And sure, I am happy to provide the ONNX model :), here it is: Road-Segmentation-416x416-ONNX

PINTO0309 commented 3 years ago

--std → --scale_values SCALE_VALUES

PINTO0309 commented 3 years ago

Here is an example. 0.007874016 ≒ (1.0 / 127)

$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP32 \
--mean_values [127,127,127] \
--scale_values [127,127,127] \
--output_dir openvino/${H}x${W}/FP32

$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127,127,127] \
--scale_values [127,127,127] \
--output_dir openvino/${H}x${W}/FP16

sample.zip

(Screenshots of the converted FP32 and FP16 IR graphs.)

franva commented 3 years ago

@PINTO0309 Beautiful~! Learnt a lot from you guys :+1:

Thanks a lot for the explanation and the converted models. So in my case the model was trained with std = [0.5, 0.5, 0.5]; I guess the value of scale_values would then be [2, 2, 2], am I correct?

PINTO0309 commented 3 years ago

For normalization to the range 0 to 1.

$ python3
Python 3.8.10 (default, Jun  2 2021, 10:49:15) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> data = np.full(3, 255)
>>> data / 127.5
array([2., 2., 2.])
>>> data / 127.5 - 1.0
array([1., 1., 1.])

or

>>> import numpy as np
>>> data = np.full(3, 255)
>>> data / 255.0
array([1., 1., 1.])

Thus, mean_values = 127 means subtraction by 127, and scale_values = 127 means division by 127. The 1.0 is the number used to normalize the value, after 127 has been subtracted, to the 0-1 range.

If you set it to 127.5: 0.007843137 ≒ (1 / 127.5)

$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale_values [127.5,127.5,127.5] \
--output_dir openvino/${H}x${W}/FP16

(Screenshot of the converted IR graph.)

or

  --scale SCALE, -s SCALE
                        All input values coming from original network inputs
                        will be divided by this value. When a list of inputs
                        is overridden by the --input parameter, this scale is
                        not applied for any input that does not match with the
                        original input of the model.
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale 1.0 \
--output_dir openvino/x/FP16

(Screenshot of the converted IR graph.)

$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale 255.0 \
--output_dir openvino/x/FP16

(Screenshots of the converted IR graph.)

The specification of OpenVINO is a little complicated.
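If I read the options correctly, the three variants above embed the following preprocessing into the IR (a rough summary assuming the 127.5 example values; mo.py itself is the authority here):

import numpy as np

x = np.array([0.0, 127.5, 255.0])  # sample pixel values

# --mean_values [127.5,...] --scale_values [127.5,...]  ->  (x - mean_values) / scale_values
print((x - 127.5) / 127.5)    # [-1.  0.  1.]

# --mean_values [127.5,...] --scale 1.0                 ->  (x - mean_values) / 1.0
print((x - 127.5) / 1.0)      # [-127.5  0.  127.5]

# --mean_values [127.5,...] --scale 255.0               ->  (x - mean_values) / 255.0
print((x - 127.5) / 255.0)    # [-0.5  0.  0.5]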

franva commented 3 years ago

Hi @PINTO0309

Thanks for the detailed explanation, thumbs up~! (I can't find where the emoji is... so I typed it...)

After digging into the code of PaddleSeg's Normalize, I found that the range for the normalization is [-1, 1].

Here is the code for default mean and std values,

Also for your convenience, I pasted the code here:

class Normalize:
    def __init__(self, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):

The Normalize class then calls functional.normalize(im, mean, std) to do the real work:

im = functional.normalize(im, mean, std)

I then had a look at the code of functional.normalize(); I also pasted it here for your convenience:

def normalize(im, mean, std):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im

So after reading the code, I can see the range for the normalization in PaddleSeg is [-1, 1].
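A quick numeric check of that, just replaying the code above on the extremes of an 8-bit input:

import numpy as np

x = np.array([0.0, 255.0], dtype=np.float32)
print((x / 255.0 - 0.5) / 0.5)   # [-1.  1.]  -> matches the [-1, 1] range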

Then TWO problems arise.

As you can see in the normalize() function, there are THREE operations, whereas we can only specify 2 parameters on OpenVINO's command line (I will explain below).

It divides by 255.0 first. This maps to the --scale param in OpenVINO.

It then subtracts the mean, which is [0.5, 0.5, 0.5] and maps to --mean_values in OpenVINO.

Then the FIRST problem arrives: it divides by the std, which maps to --scale_values in OpenVINO.

BUT, check out this line of code in OpenVINO's repo:

if argv.scale and argv.scale_values:
        raise Error(
            'Both --scale and --scale_values are defined. Specify either scale factor or scale values per input ' +
            'channels. ' + refer_to_faq_msg(19))

It does not allow us to have the scale and scale_values together~!

So above is the 1st problem.

Here is the 2nd problem.

Back to PaddleSeg's normalize() method: it applies

  1. division by the scale (255.0) FIRST,
  2. subtraction of the mean [0.5, 0.5, 0.5] SECOND,
  3. division by the std (the scale_values) [0.5, 0.5, 0.5] LAST

So the sequence is important and must not be messed up. But in OpenVINO's code base, I am not able to find where these operations happen and in what order.

So I'm kinda frustrated: after so many discussions, we are back to the beginning. There still isn't a solution.

Any suggestions?

PINTO0309 commented 3 years ago

I always do this when normalizing to the range of -1.0 to 1.0. You should forget about 0.5. I haven't looked at the PyTorch implementation.

$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--scale_values [127.5,127.5,127.5] \
--mean_values [127.5,127.5,127.5] \
--output_dir openvino/x/FP16

  0 * 0.007843017578125 - 1.0 = -1.0
255 * 0.007843017578125 - 1.0 =  1.0

(Screenshots of the converted IR graph.)

franva commented 3 years ago

Hi @PINTO0309

Thanks for your suggestion.

The Normalization doesn't come from PyTorch, it comes from PaddleSeg and here is the code:

def normalize(im, mean, std):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im

As you can see, there are THREE steps performed in normalize(), but in your view of the model architecture there are only TWO steps, which means one operation is missing.

Also, the 0.5 is not the range; it is the value for the mean and std used when training the model. So if we don't use the correct mean and std values, the model will not return correct predictions. Hopefully I explained it clearly.

Btw, good tool to view the model structure. May I know the name of this tool?

Thanks

PINTO0309 commented 3 years ago

The Normalization doesn't come from PyTorch, it comes from PaddleSeg and here is the code:

Oh, I'm sorry. When I came back from having dinner with my family, I had lost track of the flow of communication. :crying_cat_face:

If the original goal is to normalize to the range of -1.0 to 1.0, then I don't think the difference between three steps or two steps in the process is essential.

SzabolcsGergely commented 3 years ago

The Normalization doesn't come from PyTorch, it comes from PaddleSeg and here is the code:

def normalize(im, mean, std):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im

You can rewrite the normalization step as:

 def normalize(im, mean, std):
     mean = mean * 255.0
     std = std * 255.0
     im = im.astype(np.float32, copy=False) 
     im -= mean
     im /= std
     return im

Which means mean value 127.5, scale value 127.5.
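A quick numeric check that the two forms agree (with mean = std = 0.5):

import numpy as np

x = np.array([0.0, 63.75, 127.5, 255.0], dtype=np.float32)

paddleseg_way = (x / 255.0 - 0.5) / 0.5   # divide by 255, subtract mean, divide by std
openvino_way = (x - 127.5) / 127.5        # --mean_values 127.5, --scale_values 127.5

print(np.allclose(paddleseg_way, openvino_way))   # True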

PINTO0309 commented 3 years ago

Thanks! @szabi-luxonis

SzabolcsGergely commented 3 years ago

Does OpenVINO support normalization for us? CC: @szabi-luxonis and @PINTO0309.

It does, and it is captured in the documentation here. CC: @Erol444 for future recommendations.

We can add mean and scale values for the ColorCamera node's preview output in the future; there is already an option to output FP16, but it's not normalized. Adding normalization/scaling is quite simple. Regardless, the best and easiest way is to include the preprocessing in the model itself, IMO.

Erol444 commented 3 years ago

We can add mean and scale values for the ColorCamera node's preview output in the future; there is already an option to output FP16, but it's not normalized. Adding normalization/scaling is quite simple. Regardless, the best and easiest way is to include the preprocessing in the model itself, IMO.

Thanks for the link Szabi! And I agree, preprocessing should be handled in the model. Maybe one upside of having it in the FW would be figuring out what input the model actually requires (instead of reading the documentation). We could have a simple app that tries different common preprocessing techniques and shows the output, so you can later apply the correct preprocessing with mo.py - something along the lines of the sketch below.
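A rough host-side sketch of that idea (infer() is just a placeholder for however the model gets run, so everything here is hypothetical):

import numpy as np

# Try a handful of common input preprocessings and eyeball which output looks sane.
# infer() stands in for whatever runs the model (host OpenVINO, device blob, ...).
# Assumes an HxWx3 frame so the per-channel tuples broadcast correctly.
CANDIDATES = {
    "raw 0..255":       lambda x: x.astype(np.float32),
    "scaled 0..1":      lambda x: x.astype(np.float32) / 255.0,
    "mean/scale 127.5": lambda x: (x.astype(np.float32) - 127.5) / 127.5,
    "imagenet":         lambda x: (x.astype(np.float32) / 255.0 - (0.485, 0.456, 0.406)) / (0.229, 0.224, 0.225),
}

def probe_preprocessing(frame, infer):
    for name, prep in CANDIDATES.items():
        out = infer(prep(frame))
        print(f"{name:18s} -> output min {float(np.min(out)):.3f}, max {float(np.max(out)):.3f}")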

franva commented 3 years ago


You can rewrite the normalization step as:

 def normalize(im, mean, std):
     mean = mean * 255.0
     std = std * 255.0
     im = im.astype(np.float32, copy=False) 
     im -= mean
     im /= std
     return im

Which means mean value 127.5, scale value 127.5.

Hi @szabi-luxonis

Yep, I wish I could rewrite the normalize() function, but it comes from PaddleSeg, which is not under my control.

franva commented 3 years ago

Hi guys, I finally understood why just specifying mean_values and scale_values works.

Thank you so much for your explanations.