openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Big difference in output results between CPU and MYRIAD #1551

Closed: s-t-e-p-a-n-o-v closed this issue 4 years ago

s-t-e-p-a-n-o-v commented 4 years ago

Hi!

The network was converted to IR (FP16) without any problems. It works fine on CPU and GPU, but produces wrong output on MYRIAD (NCS2). I tried the option 'VPU_HW_STAGES_OPTIMIZATION': 'NO' with no luck. OpenVINO 2020.3.194.

A zip with the network, data, and test is here: https://easyupload.io/j54lhy

Layer support report via ie.query_network() for MYRIAD:

> Following layers are not supported by the plugin for specified device MYRIAD: input_8:0, StatefulPartitionedCall/pose_net/flatten/Reshape/Cast_111373_const, Constant_19683, Constant_19680

These are a standard Reshape layer and multiplication constants. I also tried inference with the HETERO plugin (MYRIAD,CPU), but it gives the same wrong output.
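A minimal sketch of how such a check can be run with ie.query_network(), assuming the same net.xml/net.bin files as in the unit test below and the 2020.x Python API:

```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network('net.xml', 'net.bin')

# query_network() maps each supported layer to the device that would run it;
# any layer missing from the map is unsupported on MYRIAD.
supported = ie.query_network(network=net, device_name="MYRIAD")
unsupported = [name for name in net.layers if name not in supported]
print("Layers not supported on MYRIAD:", unsupported)
```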

Unit-test code is below:

```python
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.load_network(ie.read_network('net.xml', 'net.bin'), "MYRIAD")
ie.set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}, "MYRIAD")

ut_in = np.load('ut_net_inp.npy')

out = net.infer({next(iter(net.inputs)): ut_in})
out = out['StatefulPartitionedCall/pose_net/concatenate/concat']
print(out)

ut_out = np.load('ut_net_out.npy')
print(ut_out)
```

```
CPU output:               [[-1.1620678e-03 -7.9460838e-04 -1.5815950e-03 -4.1352790e-03 -7.9219695e-03 8.3919901e-01]]
GPU output:               [[-1.1544225e-03 -7.8050740e-04 -1.5786547e-03 -4.1782297e-03 -7.9199485e-03 8.3894002e-01]]
MYRIAD output:            [[-0.0008297   0.00150967 -0.00083733  0.00987244 -0.00392151 0.54541016]]
HETERO:MYRIAD,CPU output: [[-0.00082684  0.00146389 -0.00096035  0.00982666 -0.00382805 0.54589844]]
Correct output:           [[-1.1544181e-03 -7.8051008e-04 -1.5786485e-03 -4.1782842e-03 -7.9199774e-03 8.3894008e-01]]
```

Any help is appreciated!

s-t-e-p-a-n-o-v commented 4 years ago

With a layer-to-layer output comparison (CPU vs MYRIAD) against the graph from DL Workbench, I found the layer where MYRIAD goes wrong. But the question is why it happens.

The graph and part of the output data of the second concatenation layer are attached below:

(attachments: graph, out_data)

jgespino commented 4 years ago

Hi @s-t-e-p-a-n-o-v

VPU_HW_STAGES_OPTIMIZATION should be set to YES. Do you get any errors when loading the model? Do you see the same error with the OpenVINO toolkit 2020.4 release? Could you share your ONNX model?

Regards, Jesus

s-t-e-p-a-n-o-v commented 4 years ago

Hi!

I don't get any errors during model loading. I've re-uploaded the zip with the network, data, test, and ONNX model; it's here now: https://easyupload.io/pej9t6

The 1st concatenation is along the 3rd axis (1x576x12x40 -> 1x576x14x40), and the 2nd concatenation is along the 4th axis (1x576x14x40 -> 1x576x14x42). Are there any restrictions on concatenating 4D tensors on the VPU device?

I will check 2020.4 release and provide report soon.

s-t-e-p-a-n-o-v commented 4 years ago

I upgraded OpenVINO to 2020.4 and converted the ONNX model from scratch, but the result is the same: wrong output.

jgespino commented 4 years ago

Hi @s-t-e-p-a-n-o-v

I also see the reported behavior on the latest release, thank you for providing all the necessary information to reproduce. I have asked the development team for assistance and will get back to you.

Regards, Jesus

Ref. 36539

leoll2 commented 4 years ago

```python
net = ie.load_network(ie.read_network('net.xml', 'net.bin'), "MYRIAD")
ie.set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}, "MYRIAD")
```

You should swap these two lines; otherwise the network is loaded with VPU optimizations still enabled. I haven't tried specifically with your data, but in experiments with my own data the output changes a lot depending on the order of those two lines.
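To make the suggestion concrete, a minimal sketch of the corrected ordering (reusing the file names from the original test; only the order of set_config and load_network changes):

```python
from openvino.inference_engine import IECore

ie = IECore()
# Apply the plugin configuration first, so the network is compiled
# with hardware stage optimization disabled...
ie.set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}, "MYRIAD")
# ...and only then load (compile) the network for MYRIAD.
net = ie.load_network(ie.read_network('net.xml', 'net.bin'), "MYRIAD")
```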

s-t-e-p-a-n-o-v commented 4 years ago

> ```python
> net = ie.load_network(ie.read_network('net.xml', 'net.bin'), "MYRIAD")
> ie.set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}, "MYRIAD")
> ```
>
> You should swap these two lines; otherwise the network is loaded with VPU optimizations still enabled. I haven't tried specifically with your data, but in experiments with my own data the output changes a lot depending on the order of those two lines.

Hi!

It works now! Many thanks!

```
MYRIAD output:  [[-1.1653900e-03 -8.3017349e-04 -1.4457703e-03 -4.1770935e-03 -7.9345703e-03 8.3740234e-01]]
Correct output: [[-1.1544181e-03 -7.8051008e-04 -1.5786485e-03 -4.1782842e-03 -7.9199774e-03 8.3894008e-01]]
```

s-t-e-p-a-n-o-v commented 4 years ago

There is a recent official workaround, "How to Fix Inaccurate Results on Intel® Neural Compute Stick". It's here: https://www.intel.com/content/www/us/en/support/articles/000056285/boards-and-kits/neural-compute-sticks.html Would it be better to set the optimization to 'NO' by default?

jgespino commented 4 years ago

Hi @s-t-e-p-a-n-o-v

Apologies for the delay in our response. It's best to keep the optimization set to YES; this parameter is only intended for internal debugging. We are working on updating the documentation and API.

If your model works well with the optimization set to NO, it should be okay. I have passed your model and test code to the development team for further inspection; however, they can't take a look until the next release.

Please let me know if you are okay with closing this issue for now.

Regards, Jesus

edwardnguyen1705 commented 4 years ago

Dear @jgespino, I am able to convert the full YOLOv5 v2 models to OpenVINO, and here is a converted model: yolov5xxs_320_custom_up_face. However, there is a significant mAP drop when loading the model on an HDDL device compared to CPU. While doing plugins['HDDL'].set_config({'VPU_HW_STAGES_OPTIMIZATION': 'NO'}) reduces the mAP drop, the speed (FPS) decreases by almost half. I observe that with plugins['HDDL'].set_config({'VPU_HW_STAGES_OPTIMIZATION': 'YES'}), the predicted confidence scores of the model drop. The experiments have been conducted with both 2020R1 and 2020R4. Here is our hardware info: tank_core_i5_info.txt

Would you mind examining the given model? If you need more information, please let me know. I really appreciate your time.

jgespino commented 4 years ago

Hi @s-t-e-p-a-n-o-v

For your model, there is a half-precision overflow on a MatMul layer, which is converted to FullyConnected by the Model Optimizer. We recommend regenerating the IR using the Model Optimizer with the --scale parameter. This should be analyzed by the MYRIAD plugin correctly and there should be no overflow. Then you should be able to accurately infer your model with VPU_HW_STAGES_OPTIMIZATION enabled.
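For illustration only, a hypothetical sketch of regenerating the IR with an input scale by driving the Model Optimizer from Python; the file names, output directory, and the scale value of 255 are placeholders, not values confirmed in this thread:

```python
import subprocess

# Hypothetical re-conversion: --scale divides the input values, which can keep
# intermediate FP16 activations within range on MYRIAD. mo.py ships with the
# OpenVINO install (model_optimizer directory); run from that directory or
# adjust the path accordingly.
subprocess.run([
    "python", "mo.py",
    "--input_model", "net.onnx",       # assumed ONNX file name
    "--data_type", "FP16",
    "--scale", "255",                  # placeholder; depends on the model's input range
    "--output_dir", "ir_fp16_scaled",  # placeholder output directory
], check=True)
```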

@edwardnguyen1705 Thanks for reaching out; could you please open a new GitHub issue? Please also try the latest OpenVINO 2021.1 release.

Regards, Jesus

s-t-e-p-a-n-o-v commented 4 years ago

Hi @jgespino!

Many thanks for the clarification!

jgespino commented 4 years ago

@s-t-e-p-a-n-o-v No problem, please re-open if you need additional assistance.