cheahber opened this issue 2 months ago
Hi @cheahber, are you using a custom model? Can you share your model files (INT8, FP16, FP32) so we can investigate this further?
Hi @Aznie-Intel. The model files are in the tflite_openvino.zip attachment above. I am using ResNet50 (see lines 37-38 of the script).
I observed the same issue on my end. Below are my results:
=== Inference Time Comparison ===
FP32: 0.040030 seconds per inference
FP16: 0.036789 seconds per inference
INT8: 0.046222 seconds per inference
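For context, a minimal sketch of how a per-precision comparison like the one above can be gathered; the model paths, input shape, and run count here are assumptions for illustration, not the reporter's exact script:

import time
import numpy as np
import openvino as ov

core = ov.Core()
# ResNet50-style dummy input; adjust the shape/layout to match the converted model.
dummy_input = np.random.rand(1, 224, 224, 3).astype(np.float32)

for label, path in [("FP32", "resnet50_fp32.xml"),
                    ("FP16", "resnet50_fp16.xml"),
                    ("INT8", "resnet50_int8.xml")]:
    compiled = core.compile_model(path, "CPU")
    compiled(dummy_input)  # warm-up run to exclude one-time setup cost
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        compiled(dummy_input)
    print(f"{label}: {(time.perf_counter() - start) / runs:.6f} seconds per inference")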
Let me check with the relevant team and we'll update you as soon as possible.
OpenVINO Version
2024.4.0
Operating System
Other (Please specify in description)
Device used for inference
CPU
Framework
Keras (TensorFlow 2)
Model used
ResNet50
Issue description
Operating System - Ubuntu 22.04
CPU - Intel® Core™ i7-7700K CPU
GPU - Mesa Intel® Arc™ A770 Graphics (DG2)
Memory - 32 GB
I have examined the FP32 TFLite model conversion process as demonstrated in this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/tflite-to-openvino/tflite-to-openvino.ipynb
However, my TFLite model is already quantized to INT8, and I want to convert it directly to OpenVINO IR using the ov.convert_model() function. I expected the INT8 IR model to outperform both the FP32 and FP16 models in inference speed. Contrary to expectations, the INT8 IR model runs slower than its FP32 and FP16 counterparts, as shown in the sketch below.
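A minimal sketch of the conversion step described above, assuming a quantized TFLite file named resnet50_int8.tflite (the file name and output path are placeholders, not the exact ones in the attached script):

import openvino as ov

core = ov.Core()

# Convert the already-quantized INT8 TFLite model directly to OpenVINO IR.
ov_model = ov.convert_model("resnet50_int8.tflite")

# Save the IR pair (.xml / .bin) so it can be benchmarked later.
ov.save_model(ov_model, "resnet50_int8.xml")

# Compile for CPU and confirm the model loads before timing it.
compiled = core.compile_model(ov_model, "CPU")

The same two calls (ov.convert_model followed by ov.save_model) were used for the FP32 and FP16 variants; only the input TFLite file differs.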
Step-by-step reproduction
Python script for replicating the results: tflite_openvino.zip
Steps to reproduce:
Relevant log output
Issue submission checklist