openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Bug]: After converting an INT8-quantized TFLite model to the OpenVINO IR format, the model exhibits poor performance. #26777

Open cheahber opened 2 months ago

cheahber commented 2 months ago

OpenVINO Version

2024.4.0

Operating System

Other (Please specify in description)

Device used for inference

CPU

Framework

Keras (TensorFlow 2)

Model used

ResNet50

Issue description

Operating System - Ubuntu 22.04
CPU - Intel® Core™ i7-7700K
GPU - Mesa Intel® Arc™ A770 Graphics (DG2)
Memory - 32 GB

I have examined the FP32 tflite model conversion process as demonstrated in this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/tflite-to-openvino/tflite-to-openvino.ipynb

However, my TFLite model is already quantized to INT8, and I aim to convert it directly into the OpenVINO IR format using the ov.convert_model() function. I expected the INT8 IR model to surpass both the FP32 and FP16 models in inference speed; contrary to expectations, however, it runs slower than its FP32 and FP16 counterparts.
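For context, the conversion path referred to above looks roughly like this (a minimal sketch only; the filenames are placeholders, not the ones used in the attached script):

```python
# Minimal sketch: convert an already INT8-quantized TFLite model to OpenVINO IR.
# "resnet50_int8.tflite" / "resnet50_int8.xml" are placeholder names.
import openvino as ov

ov_model = ov.convert_model("resnet50_int8.tflite")  # ov.convert_model accepts a .tflite path directly
ov.save_model(ov_model, "resnet50_int8.xml")         # writes the .xml/.bin IR pair

core = ov.Core()
compiled_model = core.compile_model(ov_model, "CPU")  # compile the converted model for CPU inference
```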

Step-by-step reproduction

Python script for replicating the results: tflite_openvino.zip

Steps to reproduce:

  1. Install the required dependencies.
  2. Execute the provided Python script directly; the results will be displayed in the console (a rough sketch of the timing loop is shown after this list).
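For reference, the timing comparison amounts to a loop along these lines (a rough sketch only; the IR paths, input shape, and run count are assumptions, not taken from the attached script):

```python
# Rough sketch of a per-model timing comparison, assuming three IR files already exist
# (model_fp32.xml, model_fp16.xml, model_int8.xml) and an NHWC 1x224x224x3 input.
import time
import numpy as np
import openvino as ov

core = ov.Core()
dummy_input = np.random.rand(1, 224, 224, 3).astype(np.float32)

for label, path in [("FP32", "model_fp32.xml"),
                    ("FP16", "model_fp16.xml"),
                    ("INT8", "model_int8.xml")]:
    compiled = core.compile_model(path, "CPU")
    compiled(dummy_input)  # warm-up inference before timing
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        compiled(dummy_input)  # synchronous inference
    print(f"{label}: {(time.perf_counter() - start) / runs:.6f} seconds per inference")
```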

Relevant log output

This is a snippet of the log output.

=== Inference Time Comparison ===
FP32: 0.007910 seconds per inference
FP16: 0.006492 seconds per inference
INT8: 0.010170 seconds per inference


Aznie-Intel commented 2 weeks ago

Hi @cheahber, are you using a custom model? Can you share your model files (INT8, FP16, FP32) so we can investigate this further?

cheahber commented 1 week ago

Hi @Aznie-Intel. It's in the tflite_openvino.zip attachment above. I am using ResNet50, as shown in lines 37 and 38.

[screenshot of the script showing the ResNet50 model at lines 37 and 38]

Aznie-Intel commented 1 week ago

I observed the same issue from my end. Below is my result:

=== Inference Time Comparison ===
FP32: 0.040030 seconds per inference
FP16: 0.036789 seconds per inference
INT8: 0.046222 seconds per inference

Let me check with the relevant team, and we'll update you as soon as possible.