openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug] CPU and MYRIAD output difference #7674

Closed tersekmatija closed 1 year ago

tersekmatija commented 3 years ago
System information (version)
Detailed description

Hey, I'm having another issue with the CPU and MYRIAD outputs being way off. I am using the Selfie Segmentation Landscape model from MediaPipe, and I use tflite2tensorflow to generate the ONNX model (I am attaching the ONNX in the zip).

When comparing the MediaPipe, ONNX, and OpenVINO IE (CPU) outputs, they are pretty much the same, with a really small absolute error (< 2e-4). However, when comparing the OpenVINO IE MYRIAD output to the original one, it's way off (the absolute error can be 1, i.e. the maximum possible). I am attaching an image of the absolute differences between the MediaPipe results and the inference from the bin and xml on MYRIAD (which you can also see in the Colab that reproduces these results):

[Image: absolute differences between MediaPipe and MYRIAD outputs]

A similar issue has also been described here: https://github.com/PINTO0309/tflite2tensorflow/issues/9

There was also a similar issue with a YoloV5 PyTorch model, which was solved by calling model.half() (which sets the precision to FP16) BEFORE exporting the torch model to ONNX. So I am assuming there must be something wrong with the quantization to FP16, as simply setting data_type FP16 when calling mo.py didn't work and the results were bad until model.half() was called.

However, in TensorFlow it's not possible to save the model with FP16 precision, so I have to rely directly on data_type FP16, which is not working as it should for me. Do you have some insight into why this would happen?
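
For reference, a minimal sketch of the PyTorch-side workaround described above (a toy model is used as a stand-in; the actual YoloV5 export script is not part of this issue):

```python
import torch
import torch.nn as nn

# Toy stand-in model; the real YoloV5 checkpoint loading is not shown here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()

# Cast the weights to FP16 *before* exporting to ONNX, as described above.
model.half()

# The dummy input must match the FP16 weights. Note: half-precision ops may not
# be supported on CPU in older PyTorch versions; move the model and the input
# to CUDA if the export fails.
dummy = torch.zeros(1, 3, 640, 640, dtype=torch.float16)

torch.onnx.export(model, dummy, "model_fp16.onnx",
                  input_names=["images"], output_names=["output"])
```

There is no equivalent step for TensorFlow-exported models like the one here, which is why this model depends on mo.py's data_type FP16 alone.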

Steps to reproduce

I am providing the Colab which goes over all steps: https://colab.research.google.com/drive/19Ka407n6tZIs0kJn2mY5IHXPRbniZ2uk?usp=sharing

I am also attaching ZIP with onnx model and npy which contains results of inference on Myriad.

Thanks in advance!

selfie_segmentation.zip

Iffa-Intel commented 3 years ago

Hi,

If you compare the performance of the CPU with MYRIAD, the CPU will definitely have better performance. The MYRIAD (NCS2) is just an accelerator, and its processing power is not as high as the CPU's.

You may refer here for the results of OpenVINO inference on different hardware.

The VPU requires the FP16 format, while the CPU prefers FP32. Squeezing the weights into a smaller format obviously sacrifices some accuracy of the model; this is also known as quantization error. However, a larger weight format requires more compute resources and time. You may refer here for a detailed explanation.
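
As a rough illustration of how large plain FP16 rounding error typically is (random weight-like values, not taken from the actual model):

```python
import numpy as np

# Round-trip FP32 -> FP16 -> FP32 and measure the rounding error.
w = np.random.randn(1000).astype(np.float32)
w_fp16 = w.astype(np.float16).astype(np.float32)

print("max abs rounding error:", np.abs(w - w_fp16).max())
# For values around 1.0 this is on the order of 1e-3 -- much smaller than the
# ~1.0 absolute error reported for MYRIAD in this issue.
```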

tersekmatija commented 3 years ago

Hey @Iffa-Meah , I understand that this is expected, of course, but still, I believe the difference should not be that large?

For the sake of this conversation, I tested the same XML and BIN generated with data_type FP16 with the target device set to GPU, as FP16 is the preferred format on the GPU (as I understand from the video). The output with device_name='GPU' is still significantly more similar to the output of the initial model than the output with device_name='MYRIAD'. See the image of absolute errors for the GPU below:

[Image: absolute errors for GPU output]

The maximum absolute difference is now 0.10, way better than 1.0.

Any insight into why the xml and bin with data_type FP16 give more similar output when run on the GPU than when run on MYRIAD? I expected them to be almost the same.
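
A minimal sketch of the kind of cross-device comparison discussed here, using the pre-2022 IECore API from the Colab (the paths, the dummy input, and device availability are assumptions):

```python
import numpy as np
from openvino.inference_engine import IECore  # OpenVINO <= 2021.x API

ie = IECore()
net = ie.read_network("model_float32.xml", "model_float32.bin")  # FP16 IR
input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# Placeholder input; the Colab uses a real image with shape [1, 144, 256, 3].
image = np.random.rand(1, 144, 256, 3).astype(np.float32)

results = {}
for device in ("CPU", "GPU", "MYRIAD"):
    exec_net = ie.load_network(net, device_name=device)
    results[device] = exec_net.infer({input_name: image})[output_name]

print("CPU vs GPU    max abs diff:", np.abs(results["CPU"] - results["GPU"]).max())
print("CPU vs MYRIAD max abs diff:", np.abs(results["CPU"] - results["MYRIAD"]).max())
```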

Iffa-Intel commented 3 years ago

The conversion within OpenVINO involves the Model Optimizer.

The Model Optimizer generally:

  1. Converts the model into IR
  2. Optimizes the model, which saves a lot of computation power and memory
  3. Changes the model format accordingly

In this case, getting a much better result is expected since that is the purpose of optimization.

However, if you compare the result of inferencing the FP16 IR with the native format, I believe they should show the same pattern as on CPU and GPU. In terms of FPS, I could see that there's a slight improvement for the IR compared to the native format. We'll look further into this.

[Image: MYRIAD inference result with the native ONNX model]

[Image: MYRIAD inference result with the FP16 IR]
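
A rough sketch of how the FPS comparison above could be reproduced (file names, the request count, and the dummy input are assumptions):

```python
import time
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
image = np.random.rand(1, 144, 256, 3).astype(np.float32)

# Rough FPS comparison of the native ONNX vs the FP16 IR on MYRIAD.
# For the .xml, the matching .bin is picked up automatically from the same path.
for path in ("model_float32.onnx", "model_float32.xml"):
    net = ie.read_network(path)
    exec_net = ie.load_network(net, device_name="MYRIAD")
    input_name = next(iter(net.input_info))

    start = time.perf_counter()
    for _ in range(100):
        exec_net.infer({input_name: image})
    fps = 100 / (time.perf_counter() - start)
    print(path, "->", round(fps, 1), "FPS")
```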

jgespino commented 3 years ago

Hi @tersekmatija

Thank you for providing all the details to reproduce the issue. I am able to see the difference when comparing MYRIAD inference results with CPU and ONNX runtime. I have opened a bug with the development team to get their input.

I will let you know what I find out.

Regards, Jesus

Ref. 66793

tersekmatija commented 2 years ago

Hi @jgespino , any updates?

Thanks, Matija

jgespino commented 2 years ago

Hi @tersekmatija

Apologies for the delay, I don't have an update just yet. Let me follow up with the development team.

Regards, Jesus

tersekmatija commented 2 years ago

Hi @jgespino .

I think the issue must be related to some layers in segmentation networks. For example, a similar issue appears with MODNet as well.

jgespino commented 2 years ago

@tersekmatija Thank you, I'm still waiting to hear back from the development team.

tersekmatija commented 2 years ago

Hey, @jgespino, any progress yet?

We've had a similar issue with a model from a client. The outputs from the CPU and GPU are the same, while for Myriad they are different. I think the GroupConvolution layer might be causing the issues (just my initial guess). I compared the architectures of the models where the outputs are different vs the architectures where the outputs are similar (with only a small difference), and it seems that the models with different outputs on CPU/GPU and Myriad have GroupConvolution layers. But I might be wrong here, as some models with this layer also work normally. Any update on that would be much appreciated!
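
A quick (unofficial) way to check whether an exported IR contains GroupConvolution layers; the .xml path here is just an example:

```python
import xml.etree.ElementTree as ET

# List the layer types present in an IR .xml and flag GroupConvolution.
tree = ET.parse("model_float32.xml")  # example path
layer_types = {layer.get("type") for layer in tree.getroot().iter("layer")}

print("GroupConvolution present:", "GroupConvolution" in layer_types)
print(sorted(t for t in layer_types if t))
```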

jgespino commented 2 years ago

@tersekmatija I have added your comments to the internal ticket with the dev team. Sorry about the delay, I'm still waiting to hear back.

SzabolcsGergely commented 2 years ago

> @tersekmatija I have added your comments to the internal ticket with the dev team. Sorry about the delay, I'm still waiting to hear back.

Is there an internal IPS ticket for it? If yes, can you add us as followers to it?

jgespino commented 2 years ago

Hi @szabi-luxonis @tersekmatija

There is currently no IPS case for this issue; the ticket opened with the development team is in a separate internal tool. Since you have IPS access, would you mind opening an IPS ticket as well?

Regards, Jesus

drux007 commented 2 years ago

Hi @jgespino,

any updates on this issue?

We also ran into the same issue. The results when running the same models on MYRIAD and CPU/GPU differ so significantly that MYRIAD cannot be used.

Inside our product, we usually use INT8 quantized models, which are not supported on VPUs (see here).

Therefore we have to use unquantized 32-bit models, which give significantly different results when running on MYRIAD compared to running the same models on CPU/GPU.

Since we are aware that MYRIAD supports only FP16 (see here), we tried to convert the models to 16-bit precision beforehand using the convert script, but again we get correct results when running such models on the GPU with GPU_FP16 and wrong ones when running on MYRIAD.

drux007 commented 2 years ago

@tersekmatija did you manage to solve the issue?

tersekmatija commented 2 years ago

Hey @drux007 ,

sadly not yet.

drux007 commented 2 years ago

> Hey @drux007 ,
>
> sadly not yet.

Thanks for your quick answer @tersekmatija.

Did you simply stop using the stick, or did you stop because of the difference in the output? What did you conclude in that case - that the stick's computation is not accurate enough?

tersekmatija commented 2 years ago

Hey @drux007 ,

I just switched the model, but I'd still like to see this resolved :) I found out that this issue rarely occurs, but it can happen from time to time. I think you could modify the scales of some layers manually by providing config files when doing inference or compiling a blob.

tersekmatija commented 1 year ago

@andrei-kochin I see this was closed as completed, but it doesn't look like the issue has really been resolved.

andrei-kochin commented 1 year ago

@tersekmatija As per https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-2022-3-lts-relnotes.html Myriad will be in maintenance mode, and updates are expected within the 2022.3 LTS iterations.

I'll let @Maxim-Doronin comment on that.

andrei-kochin commented 1 year ago

@AndreeaAtudorei please update the status here. Will this bug be addressed in the upcoming 22.3.1?

AndreeaAtudorei commented 1 year ago

No action has been planned for this ticket for the OV 22.3.1 release. However, if I understand Maksim's remark on the ticket correctly, it could possibly be solved by a firmware upgrade, so I would like to ask you to try and confirm/deny whether the ticket is still valid in the context of the new 22.3.1 release, where the firmware is updated.

tersekmatija commented 1 year ago

Hey, which version should solve this, @AndreeaAtudorei? I've just tried with 22.1 from here, but the results are the same:

[Image: absolute error with 2022.1 - results unchanged]

AndreeaAtudorei commented 1 year ago

Hi, my thought was to try with the version/branch that will be released as 22.3.1.

tersekmatija commented 1 year ago

Is this the correct branch?

AndreeaAtudorei commented 1 year ago

yes

tersekmatija commented 1 year ago

OK, so I manually compiled the library, re-exported with the new mo.py (the 2022.3 release branch I linked above), and then ran the inference on the device. It doesn't seem like anything has changed really - see below.

[Image: absolute error with 2022.3 - results unchanged]

avitial commented 1 year ago

@tersekmatija do you see any noticeable difference if you load the ONNX model instead of the IR model?

tersekmatija commented 1 year ago

What's the API to load an ONNX model on the MX directly? My understanding is that the model gets compiled under the hood anyway? @avitial

avitial commented 1 year ago

@tersekmatija simply pass the ONNX model instead of the .xml, like net = ie.read_network('model_float32.onnx'). Also, are you using the new OpenVINO 2.0 API with 2022.3? Just curious; I'm not sure this has any effect on the results, or at least it shouldn't.
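
For reference, a sketch of both options (the pre-2022 IE API mentioned above and the 2.0 API shipped with 2022.x); the file name and device are the ones discussed in this thread:

```python
# Old (pre-2022) API, as suggested above:
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network("model_float32.onnx")   # ONNX read directly, no IR step
exec_net = ie.load_network(net, device_name="MYRIAD")

# New OpenVINO 2.0 API (2022.x):
from openvino.runtime import Core

core = Core()
model = core.read_model("model_float32.onnx")
compiled = core.compile_model(model, "MYRIAD")
```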

tersekmatija commented 1 year ago

I don't think the API matters, since the same MX plugin is used for inference? The results are the same when using ONNX like that, yes. @avitial

andrei-kochin commented 1 year ago

@tersekmatija you are right, if the issue lies in the plugin it doesn't matter which path you used to get the IR/nGraph representation.

@AndreeaAtudorei any thoughts on why the issue is still there? Maybe some additional steps are required?

tersekmatija commented 1 year ago

Yes, my understanding is that it's in the plugin directly, since it only happens on the MX but not on CPU/GPU. I've used the Model Optimizer from the corresponding OV version too.

AndreeaAtudorei commented 1 year ago

Unfortunately, I have no idea. Debugging would be useful, but we do not have the capacity.

tersekmatija commented 1 year ago

So it will not be fixed in 2022.3 and thus ever?

ilya-lavrenov commented 1 year ago

> So it will not be fixed in 2022.3 and thus ever?

yes

jgespino commented 1 year ago

If you tweak your code (test-model_float32.py) and use the ONNX model as is, or if you convert the model to IR with mo --input_model model_float32.onnx --data_type FP16 --input_shape [1,144,256,3], there may be an improvement in the result. The relevant tweak to line 16 of the script:

L16: outputs = ort_sess.run(None, {'input_1': image.astype(np.float32)/255})

L16: outputs = ort_sess.run(None, {'input_1': image.astype(np.float32)})
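
The Model Optimizer command mentioned above, wrapped in a Python subprocess call for convenience (a sketch, using exactly the flags quoted above):

```python
import subprocess

# Convert the ONNX model to an FP16 IR with the flags suggested above.
subprocess.run([
    "mo",
    "--input_model", "model_float32.onnx",
    "--data_type", "FP16",
    "--input_shape", "[1,144,256,3]",
], check=True)
```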

As ilya mentioned, this won't be fixed but hopefully the above improves your results.

tersekmatija commented 1 year ago

Since we are doing the inference on MyriadX, this doesn't really help.