[Performance]: Quantized fully connected network shows no speed improvement

OpenVINO Version

2024.2.0-15519-5c0f38f83f6-releases/2024/2

Operating System

Ubuntu 22.04 (LTS)

Device used for inference

CPU

OpenVINO installation

PyPi

Programming Language

Python

Hardware Architecture

x86 (64 bits)

Model used

Fully connected network

Model quantization

Yes

Target Platform

CPU: Intel Xeon Gold 5433N

Performance issue description

Network:
    self.fc1 = nn.Linear(64, 128)
    self.fc2 = nn.Linear(128, 128)
    self.fc3 = nn.Linear(128, 128)
    self.fc4 = nn.Linear(128, 64)
Input shape: [1, 64, 64]

Conversion and quantization steps:
    input = torch.rand(1, 64, 64)
    ov_model = ov.convert_model(net, example_input=input)
    quant_ov_model = nncf.quantize(ov_model, quantization_dataset)

Result:
    fp32: 12872 fps
    int8: 12843 fps
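The report does not say how these figures were measured. A minimal sketch of one way to time synchronous inference and compare the two numbers is below; `measure_fps`, its warm-up and iteration counts, and the `infer` callable are hypothetical and not part of the original script.

```python
import time


def measure_fps(infer, warmup=50, iterations=2000):
    """Call `infer` repeatedly and return the number of calls per second."""
    # Warm-up runs keep one-time setup costs out of the timed loop.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Relative throughput of int8 versus fp32, using the figures reported above.
fp32_fps = 12872
int8_fps = 12843
speedup = int8_fps / fp32_fps
print(f"int8 / fp32 throughput ratio: {speedup:.4f}")
```

With a compiled OpenVINO model, `infer` could be something like `lambda: compiled(x)`, where `compiled` and `x` are a compiled model and an input array of shape [1, 64, 64].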

Step-by-step reproduction

No response

Issue submission checklist

[X] I'm reporting a performance issue. It's not a question.
[X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
[X] There is reproducer code and related data files such as images, videos, models, etc.
