Closed mnl12 closed 1 year ago
Hello, I was wondering if you have any updates on the issue, or if you need me to send the quantized model. Thanks.
@mnl12 please share the base model and the quantized model in the original framework format so we can take a look. Also, have you tried the Basic Quantization flow to apply quantization to the model?
Thanks for your reply. Regarding the Basic Quantization flow: our model cannot be converted by NNCF. We had already opened an issue at https://github.com/openvinotoolkit/nncf/issues/1570#issuecomment-1430862097, which concluded that our model cannot be converted with the current version of NNCF. As requested, I attached the quantized model and the original model in TensorFlow format. On my machine, running benchmark_app -m delg_quant_model_in8.xml -shape [1,512,512,3] on the quantized model reports Average: 29.35 ms, while the normal model reports Average: 5.31 ms.
Attachments: delg_quant_model_in8.tflite.zip, model_6_13_1.zip
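For scale, the reported averages can be turned into a relative slowdown with a quick check (the two latencies are the numbers quoted above; everything else is just arithmetic):

```python
# Average latencies reported by benchmark_app in the post above
quant_ms = 29.35  # quantized INT8 model
fp_ms = 5.31      # original (non-quantized) model

slowdown = quant_ms / fp_ms
print(f"quantized model is {slowdown:.2f}x slower")  # ~5.53x
```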
Hello,
I was wondering if you have any updates or potential solutions that I can implement. Thanks.
@mnl12 I suggest running benchmark_app with -pc to look at the performance counters and the execution time per layer.
In my observations, the optimized quantized model ends up with additional layers (of type Reorder and Pad) that are not present in the optimized non-quantized model and that add run time (cumulative +10.75 ms). The Subgraph and Add layers also take longer to execute in the quantized model (cumulative +7.19 ms) than in the non-quantized model.
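To arrive at cumulative numbers like those, the per-layer times from the counters report can be summed by layer type. A minimal sketch in Python, assuming a simplified CSV layout (the column names and the inline sample data here are placeholders, not the actual report format produced by benchmark_app; adjust the field names to match your report file):

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a per-layer counters report (illustrative only;
# the real report uses its own columns and delimiter).
report = """layerName,layerType,realTime_ms
conv1,Convolution,1.20
reorder_1,Reorder,4.10
pad_1,Pad,2.30
add_1,Add,0.90
"""

# Sum execution time per layer type.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(report)):
    totals[row["layerType"]] += float(row["realTime_ms"])

# Print layer types sorted by cumulative time, largest first.
for layer_type, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{layer_type:12s} {ms:6.2f} ms")
```

Comparing the resulting per-type totals between the quantized and non-quantized reports makes regressions like the extra Reorder/Pad time stand out immediately.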
Note that I was able to get additional FPS (92.13 vs 76.48) with latency mode (-hint latency with benchmark_app), but the quantized model is still slower than the non-quantized model.
Closing this, I hope previous responses were sufficient to help you proceed. Feel free to reopen and provide additional information or ask any questions related to this topic.
Hello,
I first converted the mobilenetv3 large TensorFlow model with post-training quantization to full integer, following https://www.tensorflow.org/lite/performance/post_training_quantization. Then I converted it directly with the new OpenVINO 2023.0 release. However, the latency is much higher than for the normal model: benchmark_app -m model.xml on CPU reports Average: 40.61 ms, while the OpenVINO conversion of the normal (non-quantized) TF model takes around 18 ms. I was wondering if you have any suggestions on what may cause the problem. Thanks.
TensorFlow 2.9.1
OpenVINO 2023.0
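For reference, full-integer post-training quantization maps each float tensor to int8 through an affine scale/zero-point pair. The sketch below shows that mapping in pure Python for a symmetric [-1, 1] range; it is a simplified illustration of the math, not the TFLite converter's actual implementation:

```python
def quantize_params(rmin, rmax, qmin=-128, qmax=127):
    """Compute scale and zero-point for asymmetric int8 quantization."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float value to its clamped int8 representation."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an int8 value back to float (lossy: quantization error remains)."""
    return scale * (q - zero_point)

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
print(scale, zp, q, dequantize(q, scale, zp))  # 0.5 round-trips to ~0.502
```

The round-trip error is bounded by the scale, which is why a slow quantized model is a deployment problem rather than an accuracy one: the per-layer int8 kernels should be faster, and extra Reorder/Pad layers inserted around them can erase that advantage.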