Hi,
If you compare the performance of the CPU with MYRIAD, the CPU will definitely perform better. MYRIAD (the NCS2) is just an accelerator whose processing power is not as high as a CPU's.
You may refer here for OpenVINO inference results on different hardware.
The VPU requires the FP16 format, while the CPU prefers FP32. Squeezing weights into a smaller format inevitably sacrifices some model accuracy; this is known as quantization error. On the other hand, a larger weight format requires more compute resources and time. You may refer here for a detailed explanation.
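To make the quantization-error point concrete, here is a minimal numpy sketch (not from this thread; the value is arbitrary) of what casting a single FP32 value to FP16 does:

```python
import numpy as np

# FP16 keeps an 11-bit significand, so values round to the nearest
# representable step; the relative error is bounded by 2**-11.
x32 = np.float32(0.1234567)
x16 = np.float16(x32)               # cast to half precision
err = abs(float(x32) - float(x16))  # quantization error for this value

print(x16, err)
assert err / float(x32) <= 2 ** -11
```

The same rounding is applied to every weight when the IR is generated with `--data_type FP16`, which is where the small but nonzero output differences come from.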
Hey @Iffa-Meah, I understand that some difference is expected, of course, but still, I believe the difference should not be this large?
For the sake of this conversation I tested the same XML and BIN generated with `--data_type FP16`, with the target device set to GPU, as FP16 is the preferred format on the GPU (as I understand from the video). The output with `device_name='GPU'` is still significantly more similar to the output of the initial model than the output with `device_name='MYRIAD'`. See the image of absolute errors for GPU below:
Maximum absolute difference is now 0.10, way better than 1.0.
Any insight into why xml and bin with data type FP16 give more similar output when run on GPU than when run on MYRIAD? I expected them to be almost the same.
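The "maximum absolute difference" figures above can be reproduced with a small helper; a minimal sketch, where the arrays are synthetic stand-ins for the outputs of the same model on two devices:

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two output tensors."""
    a64 = np.asarray(a, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    return float(np.max(np.abs(a64 - b64)))

# synthetic stand-ins for e.g. the ONNX/CPU output and a device output
ref = np.linspace(0.0, 1.0, 5, dtype=np.float32)
dev = ref + np.float32(0.05) * np.array([0, 1, -1, 1, 0], dtype=np.float32)

print(max_abs_diff(ref, dev))  # ≈ 0.05
```

Running this over the real CPU/GPU/MYRIAD outputs is what yields the 0.10 vs 1.0 numbers quoted in the thread.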
The conversion within OpenVINO involves the Model Optimizer.
The Model Optimizer generally applies several graph optimizations during conversion.
In this case, getting a much better result is expected, since that is the purpose of the optimization.
However, comparing the results of inferencing the FP16 IR against the native format, I believe they should show the same pattern as CPU and GPU. In terms of FPS, I can see a slight improvement for the IR compared to the native format. We'll look further into this.
Hi @tersekmatija
Thank you for providing all the details to reproduce the issue. I am able to see the difference when comparing MYRIAD inference results with CPU and ONNX runtime. I have opened a bug with the development team to get their input.
I will let you know what I find out.
Regards, Jesus
Ref. 66793
Hi @jgespino , any updates?
Thanks, Matija
Hi @tersekmatija
Apologies for the delay, I don't have an update just yet. Let me follow up with the development team.
Regards, Jesus
Hi @jgespino .
I think the issue must be related to some layers in segmentation networks. For example, a similar issue appears with MODNet as well.
@tersekmatija Thank you, I'm still waiting to hear back from the development team.
Hey, @jgespino, any progress already?
We've had a similar issue with a model from a client. The outputs from the CPU and GPU are the same, while the Myriad output differs. I think the GroupConvolution layer might be causing the issue (just my initial guess). I compared the architectures of the models whose outputs differ against architectures whose outputs match (with negligibly small differences), and it seems that the models with differing outputs on CPU/GPU vs Myriad contain GroupConvolution layers. But I might be wrong here, as some models with this layer also work normally. Any update on this would be much appreciated!
@tersekmatija I have added your comments to the internal ticket with the dev team. Sorry about the delay, I'm still waiting to hear back.
Is there an internal IPS ticket for it? If yes, can you add us as followers to it?
Hi @szabi-luxonis @tersekmatija
There is currently no IPS case for this issue, the ticket opened with the development team is in a separate internal tool. Since you have IPS access, would you mind opening an IPS ticket as well?
Regards, Jesus
Hi @jgespino,
any updates on this issue?
We also ran into the same issue. The results when running the same models on MYRIAD and CPU/GPU differ so significantly that MYRIAD cannot be used.
Inside our product we usually use INT8-quantized models, which are not supported on VPUs (see here).
Therefore we have to use unquantized 32-bit models, which produce significantly different results when running on MYRIAD compared to running the same models on CPU/GPU.
Since we are aware that MYRIAD supports only FP16 (see here), we tried to convert the models to 16-bit precision beforehand using the convert script, but again we get correct results when running such models on GPU with `GPU_FP16` and wrong ones when running on MYRIAD.
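One way to separate plain FP16 weight-quantization error from plugin behaviour is to round-trip the FP32 weights through FP16 on the host and re-run FP32 inference: if the outputs stay close, the MYRIAD discrepancy is not explained by the FP16 weights alone. A minimal numpy sketch of the round-trip, where `round_trip_fp16` and the random tensor are illustrative stand-ins for a real layer's weights:

```python
import numpy as np

def round_trip_fp16(w):
    """Quantize a tensor to FP16 and back, mimicking what --data_type FP16
    does to the stored weights."""
    return w.astype(np.float16).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3)).astype(np.float32)  # stand-in conv weights
drift = float(np.max(np.abs(w - round_trip_fp16(w))))

print(drift)  # small: bounded by half an FP16 ulp at the largest weight magnitude
```

If a full FP32 inference with round-tripped weights matches the original closely while MYRIAD is off by ~1.0, that points at the plugin rather than the 16-bit storage format.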
@tersekmatija did you manage to solve the issue?
Hey @drux007 ,
sadly not yet.
Thanks for your quick answer @tersekmatija.
Did you simply stop using the stick, or did you stop because of the difference in the output? What did you conclude in that case: that the stick's computation is not accurate enough?
Hey @drux007 ,
I just switched to a different model, but I'd still like to see this resolved :) I found that this issue occurs rarely, but it can happen. I think you could modify the scales of some layers manually by providing config files when doing inference or compiling a blob.
@andrei-kochin I see this was closed as completed, but it doesn't look like the issue has been resolved really.
@tersekmatija As per https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-2022-3-lts-relnotes.html Myriad will be in the maintenance mode and updates are expected for 2022.3 LTS iterations.
I'll let @Maxim-Doronin comment on that.
@AndreeaAtudorei please update the status here. Will this bug be addressed in the upcoming 22.3.1?
No action has been planned for this ticket in the OV 22.3.1 release. However, if I understand Maksim's remark on the ticket correctly, it may be solved by a firmware upgrade, so I would like to ask you to try it and confirm/deny whether the ticket is still valid in the context of the new 22.3.1 release, where the firmware is updated.
Hey, which version should solve this @AndreeaAtudorei ? I've just tried with 22.1 from here but the results are the same:
hi, my thought was to try with the version/branch that will be released in 22.3.1
Is this the correct branch?
yes
Ok, so I manually compiled the library, re-exported with the new mo.py (from the 2022.3 release branch I linked above), then ran inference on the device. Nothing really seems to have changed; see below.
@tersekmatija do you see any noticeable difference if you load the ONNX model instead of the IR model?
What's the API to load ONNX model to MX directly? My understanding is the model gets compiled under the hood anyway? @avitial
@tersekmatija simply pass the ONNX model instead of the .xml, like `net = ie.read_network('model_float32.onnx')`. Also, are you using the new OpenVINO 2.0 API with 2022.3? Just curious; not sure this has any effect on the results, or at least it shouldn't.
I don't think API matters since the same MX plugin is used for inference? The results are the same when using ONNX like that yes. @avitial
@tersekmatija you are right, if the issue lies in the plugin it doesn't matter which path you used to get the IR/nGraph representation.
@AndreeaAtudorei any thoughts on why the issue is still there? Maybe some additional steps are required?
Yes, my understanding is it's in the plugin directly, since it only happens on MX but not on CPU/GPU. I've used model optimizer from the corresponding OV version too.
Unfortunately I have no idea. Debugging would be useful, but we do not have the capacity.
So it will not be fixed in 2022.3 and thus ever?
yes
If you tweak your code (test-model_float32.py) and use the ONNX model as-is, or if you convert the model to IR with `mo --input_model model_float32.onnx --data_type FP16 --input_shape [1,144,256,3]`, there may be an improvement in the result.
# original line 16:
outputs = ort_sess.run(None, {'input_1': image.astype(np.float32)/255})
# suggested change:
outputs = ort_sess.run(None, {'input_1': image.astype(np.float32)})
As Ilya mentioned, this won't be fixed, but hopefully the above improves your results.
Since we are doing the inference on MyriadX, this doesn't really help.
System information (version)
Detailed description
Hey, I'm having another issue with CPU and MYRIAD output being way off. I am using Selfie Segmentation Landscape model from MediaPipe and I use tflite2tensorflow to generate the ONNX model (I am attaching ONNX in the zip).
When comparing MediaPipe, ONNX, and OpenVINO IE output (CPU), it's pretty much the same with really small absolute error (< 2e-4). However, when comparing the output of OpenVINO IE Myriad output to the original one, it's way off (absolute error can be 1, so maximum possible). I am attaching the image of absolute differences between MediaPipe results and inference from bin and xml on Myriad (which you can also see in the Colab that reproduces these results):
A similar issue has also been described here: https://github.com/PINTO0309/tflite2tensorflow/issues/9
There was also a similar issue with a YoloV5 PyTorch model, which was solved by calling model.half() (which sets the precision to FP16) BEFORE exporting the torch model to ONNX. So I am assuming there must be something wrong with the quantization to FP16, as simply setting `--data_type FP16` when calling mo.py didn't work and the results were bad until model.half() was called. However, in TensorFlow it's not possible to save the model with FP16 precision, so I have to rely directly on `--data_type FP16`, which is not working as it should for me. Do you have some insight into why this would happen?

Steps to reproduce
I am providing the Colab which goes over all steps: https://colab.research.google.com/drive/19Ka407n6tZIs0kJn2mY5IHXPRbniZ2uk?usp=sharing
I am also attaching ZIP with onnx model and npy which contains results of inference on Myriad.
Thanks in advance!
selfie_segmentation.zip