Open franva opened 3 years ago
Hello @franva, another great suggestion! I'm not that familiar with ML architectures so I wasn't aware that there's a deeplab alternative that's much faster. CC @tersekmatija
So on your suggestion of adding normalization: we could add it in depthai_demo, but it would only work for video inputs (since normalization would happen on the host, before sending the frame to the device). As we mentioned, we also plan on adding normalization on the device side (possibly with ImageManip or directly on ColorCamera). Looking at your code, the normalization you are doing is `paddleseg.transforms.normalize()`. We should add support like this inside depthai @themarpe.
In the meantime, @VanDavv, could you look into adding this model to the depthai_demo? I'm thinking we should add another optional function to `handler.py`, like `def preprocessing(frame):`, that would get called if you are using video input, so that `paddleseg.transforms.normalize()` could be used in `handler.py`.
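A minimal sketch of what such an optional hook might look like, assuming the demo would call a function named `preprocessing(frame)` from `handler.py` on each host-side video frame. The hook name comes from the suggestion above; how the demo would call it and the default `MEAN`/`STD` values are assumptions. It uses plain NumPy instead of importing PaddleSeg, and applies the usual scale to 0..1, subtract mean, divide by std sequence:

```python
import numpy as np

# Hypothetical defaults; use the values your model was trained with.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def preprocessing(frame):
    """Normalize an HxWx3 uint8 frame on the host before it is sent to the device."""
    out = frame.astype(np.float32) / 255.0  # scale pixels to 0..1
    out -= MEAN                             # per-channel mean subtraction
    out /= STD                              # per-channel std division
    return out


if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(416, 416, 3), dtype=np.uint8)
    result = preprocessing(frame)
    print(result.dtype, result.min(), result.max())
```

Because this runs on the host, it only helps for video-file input, which is exactly the limitation described above.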
Hi @Erol444 Thanks for your update.
Yes, please add the normalization to DepthAI.
Also, the model is trained with preprocessed images (e.g. transformations that include normalization). I understand what you said, that it will only be applied to video input. But nowadays almost all models are fed with preprocessed images, and normalization is typically the last step in the chain of transformations. Without support for customized normalization on the device side, I don't really know how we are going to use a custom-trained model with camera input, which covers almost all the use cases for OAK cameras.
I like the idea of adding `def preprocessing(frame)` into `handler.py`, so it hides the complexity from users and provides a simple interface. Could we look into how to technically support preprocessing not only for video input but also for camera input (device side)? Or maybe some hero at DepthAI has an even better and easier approach to implementing the device-side preprocessing?
Really looking forward to your and others feedback.
Does OpenVINO support normalization for us? CC: @szabi-luxonis and @PINTO0309.
If I were to do the same thing, I would merge the normalization layers at the same time I generate the OpenVINO IR.
--mean_values MEAN_VALUES, -ms MEAN_VALUES
Mean values to be used for the input image per
channel. Values to be provided in the (R,G,B) or
[R,G,B] format. Can be defined for desired input of
the model, for example: "--mean_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
--scale_values SCALE_VALUES
Scale values to be used for the input image per
channel. Values are provided in the (R,G,B) or [R,G,B]
format. Can be defined for desired input of the model,
for example: "--scale_values
data[255,255,255],info[255,255,255]". The exact
meaning and order of channels depend on how the
original model was trained.
If you can provide me with the ONNX file before converting it to OpenVINO IR (.xml/.bin), I can try it.
It is much easier than combining DepthAI modules, as it only requires two additional command line options. `Multiply` and `Add` will be automatically inserted in the red frame in the figure below.
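To make the `Multiply` and `Add` remark concrete, here is a small NumPy sketch. It assumes the effect of the two flags is `(x - mean) / scale` per channel, and shows that this is the same as one multiply followed by one add. The variable names are illustrative only and are not taken from OpenVINO:

```python
import numpy as np

mean_values = np.array([127.0, 127.0, 127.0])
scale_values = np.array([127.0, 127.0, 127.0])

# (x - mean) / scale  ==  x * (1 / scale) + (-mean / scale)
multiply_const = 1.0 / scale_values
add_const = -mean_values / scale_values

# Every possible 8-bit pixel value, repeated for 3 channels: shape (256, 3)
x = np.arange(256, dtype=np.float64)[:, None] * np.ones(3)

direct = (x - mean_values) / scale_values
folded = x * multiply_const + add_const

print(np.allclose(direct, folded))
print(multiply_const)
```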
Thank you!
* To adjust the conversion process, you can also use the general (framework-agnostic) parameters: https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html
Hi @PINTO0309 thanks for your comment, yep it's better to embed the transformations inside a model, so code-wise developers don't need to worry about the transformation at all.
One more thing to note: the normalization in PaddleSeg's transforms needs not only the `mean`: [0.5, 0.5, 0.5], but also the `std`: [0.5, 0.5, 0.5].
I don't see an option to specify that in the IR model. Hopefully there is something we can do for the `std`.
And sure, I am happy to provide the ONNX model :), here it is: Road-Segmentation-416x416-ONNX
`--std` ≒ `--scale_values SCALE_VALUES`
Here is an example. 0.007874016 ≒ (1.0 / 127)
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP32 \
--mean_values [127,127,127] \
--scale_values [127,127,127] \
--output_dir openvino/${H}x${W}/FP32
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127,127,127] \
--scale_values [127,127,127] \
--output_dir openvino/${H}x${W}/FP16
@PINTO0309 Beautiful~! Learnt a lot from you guys :+1:
Thanks a lot for the explanation and converted models.
So in my case, the model is trained with std = [0.5,0.5,0.5], so I guess the value of `scale_values` would be [2,2,2], am I correct?
For normalization to the range 0 to 1.
$ python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> data = np.full(3, 255)
>>> data / 127.5
array([2., 2., 2.])
>>> data / 127.5 - 1.0
array([1., 1., 1.])
or
>>> import numpy as np
>>> data = np.full(3, 255)
>>> data / 255.0
array([1., 1., 1.])
Thus, `mean_values = 127` means subtraction by 127, and `scale_values = 127` means division by 127. 1.0 is a number to normalize to the range 0-1 for the value after 127 has been subtracted.
If you set it to 127.5: 0.007843137 = (1 / 127.5)
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale_values [127.5,127.5,127.5] \
--output_dir openvino/${H}x${W}/FP16
or
--scale SCALE, -s SCALE
All input values coming from original network inputs
will be divided by this value. When a list of inputs
is overridden by the --input parameter, this scale is
not applied for any input that does not match with the
original input of the model.
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale 1.0 \
--output_dir openvino/x/FP16
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--mean_values [127.5,127.5,127.5] \
--scale 255.0 \
--output_dir openvino/x/FP16
The specification of OpenVINO is a little complicated.
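The effect of the different flag combinations on the input range can be sketched with NumPy. This assumes, as in the explanation above, that the mean is subtracted first and the result is then divided by the scale. The helper `emulate_mo_preprocessing` is a hypothetical name for illustration, not part of OpenVINO:

```python
import numpy as np


def emulate_mo_preprocessing(x, mean, scale):
    """Emulate (x - mean) / scale, the assumed effect of the mean/scale flags."""
    return (np.asarray(x, dtype=np.float64) - mean) / scale


pixels = np.array([0.0, 255.0])  # darkest and brightest 8-bit values

# (mean, scale) pairs matching the three FP16 commands with mean 127.5
variants = {
    "--mean_values 127.5 --scale_values 127.5": (127.5, 127.5),
    "--mean_values 127.5 --scale 1.0": (127.5, 1.0),
    "--mean_values 127.5 --scale 255.0": (127.5, 255.0),
}

ranges = {}
for flags, (mean, scale) in variants.items():
    low, high = emulate_mo_preprocessing(pixels, mean, scale)
    ranges[flags] = (float(low), float(high))
    print(f"{flags}: [{low}, {high}]")
```

Under that assumption, only the first combination maps 0..255 onto -1..1; `--scale 1.0` leaves the values at -127.5..127.5 and `--scale 255.0` gives -0.5..0.5.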
Hi @PINTO0309
I appreciate the detailed explanation, thumbs up~! (I can't find where the emoji is.....so I typed...)
After digging into the code of PaddleSeg's normalization, I found that the range for normalization is [-1, 1].
Here is the code for the default `mean` and `std` values.
Also, for your convenience, I pasted the code here:
class Normalize:
    def __init__(self, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
The Normalize class then calls `functional.normalize(im, mean, std)` to do the real job:
im = functional.normalize(im, mean, std)
I then had a look at the code for functional.normalize()
Also pasted the code here for your convenience:
def normalize(im, mean, std):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im
So after reading the code, I can see the range for the normalization in PaddleSeg is [-1, 1].
Then TWO problems come up.
As you can see in the `normalize()` function, there are THREE operations, whereas we can only specify 2 parameters on OpenVINO's command line (I will explain below).
It first divides by 255.0, which maps to the `--scale` param in OpenVINO.
It then subtracts the mean, which is [0.5,0.5,0.5]; this maps to `--mean_values` in OpenVINO.
Here the FIRST problem arrives: it then divides by the std, which maps to `--scale_values` in OpenVINO.
BUT, check out this line of code in OpenVINO's repo:
if argv.scale and argv.scale_values:
    raise Error(
        'Both --scale and --scale_values are defined. Specify either scale factor or scale values per input ' +
        'channels. ' + refer_to_faq_msg(19))
It does not allow us to use `--scale` and `--scale_values` together~!
So that is the 1st problem.
Here is the 2nd problem.
Back to PaddleSeg's `normalize()` method: it applies the division by 255.0 FIRST, the subtraction of the mean [0.5,0.5,0.5] SECOND, and the division by the std [0.5,0.5,0.5] LAST. So the sequence is important and must not be messed up. But in OpenVINO's code base, I am not able to find where these operations happen and in what order.
So I'm kinda frustrated, after so many discussions, we are back to the beginning. There still isn't any solution.
Any suggestions?
I always do this when normalizing to the range of -1.0 to 1.0. You should forget about 0.5. I haven't looked at the PyTorch implementation.
$INTEL_OPENVINO_DIR/deployment_tools/model_optimizer/mo.py \
--input_model model.onnx \
--input_shape [1,3,416,416] \
--data_type FP16 \
--scale_values [127.5,127.5,127.5] \
--mean_values [127.5,127.5,127.5] \
--output_dir openvino/x/FP16
0 * 0.007843017578125 - 1.0 = -1.0
255 * 0.007843017578125 - 1.0 ≒ 1.0
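The constant `0.007843017578125` in the two lines above looks like `1 / 127.5` rounded to FP16, which would match the `--data_type FP16` conversion. The NumPy sketch below checks that assumption and shows that the upper end of the range therefore lands very slightly below 1.0:

```python
import numpy as np

# 1/127.5 stored as a 16-bit float, then widened back to a Python float
scale_fp16 = float(np.float16(1.0 / 127.5))
print(scale_fp16)

low = 0 * scale_fp16 - 1.0     # darkest pixel
high = 255 * scale_fp16 - 1.0  # brightest pixel
print(low, high)
```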
Hi @PINTO0309
Thanks for your suggestion.
The Normalization doesn't come from PyTorch, it comes from PaddleSeg and here is the code:
def normalize(im, mean, std):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im
As you can see here, there are THREE steps performed in `normalize()`, but in your view of the model architecture there are only TWO steps, which means it lacks one operation.
Also, the 0.5 is not the range; it is the value of `mean` and `std` used when training the model. So if we don't use the correct values of `mean` and `std`, the model will not return correct predictions. Hopefully I explained it clearly.
Btw, good tool to view the model structure. May I know the name of this tool?
Thanks
The Normalization doesn't come from PyTorch, it comes from PaddleSeg and here is the code:
Oh, I'm sorry. When I came back from having dinner with my family, I had lost track of the flow of communication. :crying_cat_face:
If the original goal is to normalize to the range of -1.0 to 1.0, then I don't think the difference between three steps or two steps in the process is essential.
You can rewrite normalization step as:
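import numpy as np

def normalize(im, mean, std):
    # Fold the division by 255 into mean and std (np.asarray so tuples work too)
    mean = np.asarray(mean, dtype=np.float32) * 255.0
    std = np.asarray(std, dtype=np.float32) * 255.0
    im = im.astype(np.float32, copy=False)
    im -= mean
    im /= std
    return im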
Which means mean value 127.5, scale value 127.5.
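The equivalence can be checked numerically. The sketch below compares the three-step PaddleSeg arithmetic with the two-step form and prints the matching Model Optimizer flags. The helper name `to_mo_flags` is hypothetical, and the flag format simply follows the `mo.py` commands shown earlier in this thread:

```python
import numpy as np


def normalize_three_step(im, mean, std):
    """Original order: divide by 255, subtract mean, divide by std."""
    return (im.astype(np.float32) / 255.0 - mean) / std


def normalize_two_step(im, mean, std):
    """Folded form: subtract 255*mean, divide by 255*std."""
    return (im.astype(np.float32) - mean * 255.0) / (std * 255.0)


def to_mo_flags(mean, std):
    """Build --mean_values/--scale_values strings from 0..1 mean and std."""
    mean_values = ",".join(f"{float(m) * 255.0:g}" for m in mean)
    scale_values = ",".join(f"{float(s) * 255.0:g}" for s in std)
    return f"--mean_values [{mean_values}] --scale_values [{scale_values}]"


mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)
im = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

same = np.allclose(
    normalize_three_step(im, mean, std),
    normalize_two_step(im, mean, std),
    atol=1e-6,  # float32 rounding differs slightly between the two forms
)
print(same)
print(to_mo_flags(mean, std))
```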
Thanks! @szabi-luxonis
Does OpenVINO support normalization for us? CC: @szabi-luxonis and @PINTO0309.
It does, and it is captured in the documentation here. CC: @Erol444 for future recommendations.
We can add in the future `mean` and `scale` values for the `preview` image of the `ColorCamera` node; there is already an option to output FP16, but it's not normalized. Adding normalization/scaling is quite simple.
Regardless, the best and easiest way is including preprocessing in the model itself, IMO.
Thanks for the link Szabi! And I agree preprocessing should be handled in the model. Maybe one upside of having it in the FW is for figuring out what input it actually requires (instead of reading the documentation). We could have a simple app that tries different common preprocessing techniques and view the output, so you can later apply correct preprocessing with mo.py.
Hi @szabi-luxonis
Yep, I wish I could rewrite the `normalize()` function, but it's from PaddleSeg, which is not under my control.
Hi guys, I finally understood why just specifying `mean_values` and `scale_values` works.
Thank you so much for your explanations.
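For readers following along, the reason can be written out in one line. With a pixel value $x$ in 0..255 and the PaddleSeg parameters `mean` $= m$ and `std` $= s$ (both 0.5 here), the three-step normalization folds into a single subtraction and a single division:

$$
\frac{\dfrac{x}{255} - m}{s} \;=\; \frac{x - 255\,m}{255\,s} \;=\; \frac{x - 127.5}{127.5} \quad \text{for } m = s = 0.5
$$

So `--mean_values` corresponds to $255\,m$ and `--scale_values` to $255\,s$, assuming Model Optimizer applies $(x - \text{mean}) / \text{scale}$, and `--scale` is not needed at all.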
Start with the `why`:
Reason 1: I have trained a model that only does road segmentation, using the BiSeNetV2 network architecture, and it has a decent FPS (around 15 FPS). I would like to deploy it to my OAK camera.
Above is the demo with the OpenVINO IR model format.
Reason 2: Compared to the existing notebook for training a segmentation model (e.g. DeepLab v3), which uses an out-of-date TF, the training process using PaddleSeg is much more delightful and much quicker. Within 15 minutes I got my model trained, and thanks to the newer network architecture, BiSeNetV2, I got satisfactory accuracy as well as a pretty good FPS (more than 3 times faster than the existing road-segmentation-adas-0001 model). If this approach proves to be useful, then we can quickly and, more importantly, easily train more custom models with less effort. It would enrich the model zoo with models based on the latest architectures, which would benefit the community and in turn make DepthAI and the OAK cameras more valuable.
Move to the `what`:
A road segmentation model (BiSeNetV2 arch) has been trained using PaddleSeg and verified in the OpenVINO IR model format (the `.bin`, `.xml`). The current `depthai_demo.py` doesn't provide an easy way to do customized normalization before feeding a video frame to the model. I would like the demo code to be updated to have an interface/place to pass in the transformations (e.g. in my case, the normalization).
Move to the `how`:
I have updated the code (in a quick and dirty way) to apply the transformation before feeding the video frames to the model, but judging by the segmentation output on the video, it still doesn't look quite right.
I have attached my code and trained model below for your convenience, and this is where a DepthAI expert is needed to finish the last step.
Before running the code, please install:
python -m pip install paddlepaddle-gpu
pip install paddleseg
infer.py, the code to verify that the OpenVINO IR model is correct without defects after exporting from PaddleSeg
depthai_demo.py
The code base I used is the latest `main` branch, which I pulled this morning. Please run this command:
python depthai_demo.py --sync -cnn road -vid ./video.mp4
Please let me know if you need any more information. I appreciate your help and great work~!