openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0
6.83k stars 2.18k forks

[Performance]: The quantized fully connected network shows no speed improvement #26380

Open eekarot opened 1 week ago

eekarot commented 1 week ago

OpenVINO Version

2024.2.0-15519-5c0f38f83f6-releases/2024/2

Operating System

Ubuntu 22.04 (LTS)

Device used for inference

CPU

OpenVINO installation

PyPi

Programming Language

Python

Hardware Architecture

x86 (64 bits)

Model used

Fully connected network

Model quantization

Yes

Target Platform

CPU: Intel Xeon Gold 5433N

Performance issue description

Network:

self.fc1 = nn.Linear(64, 128)
self.fc2 = nn.Linear(128, 128)
self.fc3 = nn.Linear(128, 128)
self.fc4 = nn.Linear(128, 64)

Input shape: [1, 64, 64]

Conversion and quantization steps:

input = torch.rand(1, 64, 64)
ov_model = ov.convert_model(net, example_input=input)
quant_ov_model = nncf.quantize(ov_model, quantization_dataset)
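For anyone trying to reproduce this, the network and the conversion steps could be put together roughly as in the sketch below. Several parts are assumptions, not taken from the report: the `forward()` body (ReLU between layers), the random calibration data, the sample count, and the helper name `build_models`. The `torch`, `openvino`, and `nncf` packages are imported inside the function so the file can be loaded without them.

```python
def build_models(num_calibration_samples: int = 300):
    """Return (fp32_compiled, int8_compiled) for the 4-layer network on CPU."""
    # Heavy dependencies are imported lazily inside the function.
    import torch
    import torch.nn as nn
    import openvino as ov
    import nncf

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(64, 128)
            self.fc2 = nn.Linear(128, 128)
            self.fc3 = nn.Linear(128, 128)
            self.fc4 = nn.Linear(128, 64)

        def forward(self, x):
            # Assumption: ReLU activations; the report does not show forward().
            x = torch.relu(self.fc1(x))
            x = torch.relu(self.fc2(x))
            x = torch.relu(self.fc3(x))
            return self.fc4(x)

    net = Net().eval()
    example_input = torch.rand(1, 64, 64)

    # Convert the PyTorch module to an OpenVINO model and compile the FP32 version.
    ov_model = ov.convert_model(net, example_input=example_input)
    fp32_compiled = ov.compile_model(ov_model, "CPU")

    # Assumption: random tensors stand in for the real calibration data.
    calibration_items = [
        torch.rand(1, 64, 64).numpy() for _ in range(num_calibration_samples)
    ]
    quantization_dataset = nncf.Dataset(calibration_items)

    # Post-training quantization, then compile the INT8 version.
    quant_ov_model = nncf.quantize(ov_model, quantization_dataset)
    int8_compiled = ov.compile_model(quant_ov_model, "CPU")

    return fp32_compiled, int8_compiled
```

Calling `build_models()` requires all three packages to be installed. Real calibration data would be preferable to random tensors if accuracy is also being compared.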

Result:

fp32: 12872 fps
int8: 12843 fps
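The report does not say how the fps figures were measured. One simple way to time a synchronous inference call and compare the two numbers is sketched below. `measure_fps` and `relative_change` are hypothetical helper names, and the warm-up and iteration counts are arbitrary assumptions.

```python
import time


def measure_fps(infer, num_iters=1000, warmup=100):
    """Call `infer()` repeatedly and return iterations per second."""
    # Warm-up runs are excluded from timing.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(num_iters):
        infer()
    elapsed = time.perf_counter() - start
    return num_iters / elapsed


def relative_change(baseline_fps, candidate_fps):
    """Fractional change of candidate throughput relative to the baseline."""
    return (candidate_fps - baseline_fps) / baseline_fps


# Using the numbers reported above: int8 versus fp32.
print(f"{relative_change(12872, 12843):+.2%}")  # prints -0.23%
```

With the reported figures, the int8 model is about 0.23% slower than fp32. A difference that small may well be within run-to-run noise for a single-threaded timing loop like this one.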

Step-by-step reproduction

No response


rkazants commented 1 week ago

@MaximProshin, @alexsu52, please take a look.

Regards, Roman