openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference
Apache License 2.0

NNCF 2.5: When quantizing the model, an error occurred: "RuntimeError: Could not find the bias value of the node." #1936

Closed edition3234 closed 1 year ago

edition3234 commented 1 year ago

I have an ONNX model that contains convolutional layers but no fully connected layers. Upon inspection with Netron, I found that if a convolutional layer is not directly followed by a BatchNormalization layer, then the convolutional layer has both weights and biases. However, if a convolutional layer is directly followed by a BatchNormalization layer, it only has weights, and the BatchNormalization layer carries the bias. This is the structure of my model. I want to quantize it to int8 using NNCF. Currently, in the get_bias_value function in the nncf/quantization/algorithms/fast_bias_correction/onnx_backend.py code, I am encountering an error that says 'Could not find the bias value of the node'. Do I now have to add a bias of 0 to all convolutional layers that do not have a bias, in order to avoid this error during quantization?
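
For reference, if such a workaround were needed, a zero bias could be appended to the bias-less Conv nodes with a few lines of ONNX graph surgery. This is only a rough sketch (file paths are placeholders), and as the maintainers point out below, the proper fix landed on the NNCF side:

import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # placeholder path
inits = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    # Conv with only (input, weight) and no bias input
    if node.op_type == "Conv" and len(node.input) == 2:
        weight = inits.get(node.input[1])
        if weight is None:
            continue  # weight is not a static initializer; skip
        out_channels = weight.dims[0]
        bias_name = node.input[1] + "_zero_bias"
        bias = numpy_helper.from_array(np.zeros(out_channels, dtype=np.float32), name=bias_name)
        model.graph.initializer.append(bias)
        node.input.append(bias_name)

onnx.checker.check_model(model)
onnx.save(model, "model_with_bias.onnx")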

The code for quantization is:

import torch
import numpy as np
import onnx
from torchvision import datasets
from torchvision import transforms

import nncf

model_path = "/home/fp32_mainbody_onnx_bs/const_shape_pp_main_body.onnx"

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    root=f"/home/resize_images_400",
    transform=transforms.Compose(
        [
            transforms.Resize(640),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)

val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)
model = onnx.load(model_path)

input_names = [input.name for input in model.graph.input]
input_img_name = input_names[0]
input_scale_name = input_names[1]

def transform_fn(data_item):
    images, _ = data_item
    scale = np.array([640 / 1800, 640 / 1800], dtype='float32').reshape(1, 2)
    return {input_img_name: images.numpy(), input_scale_name: scale}

calibration_dataset = nncf.Dataset(val_loader, transform_fn)
onnx_quantized_model = nncf.quantize(model, calibration_dataset, subset_size=400)

int8_model_path = f"/home/int8_nncf_quant/int8_main_body.onnx"
onnx.save(onnx_quantized_model, int8_model_path)

The error message is:

Traceback (most recent call last):
  File "nncf_quant_mainbody.py", line 40, in <module>
    onnx_quantized_model = nncf.quantize(model, calibration_dataset, subset_size=400)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/quantize_model.py", line 93, in quantize
    return quantize_impl(
  File "/home/.local/lib/python3.8/site-packages/nncf/telemetry/decorator.py", line 71, in wrapped
    retval = fn(*args, **kwargs)
  File "/home/.local/lib/python3.8/site-packages/nncf/onnx/quantization/quantize_model.py", line 68, in quantize_impl
    quantized_model = quantization_algorithm.apply(model, dataset=calibration_dataset)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/algorithm.py", line 58, in apply
    return self._apply(model, statistic_points=None, dataset=dataset)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/post_training/algorithm.py", line 188, in _apply
    modified_model = algorithm.apply(modified_model, statistic_points)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/algorithm.py", line 63, in apply
    return self._apply(model, statistic_points)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/fast_bias_correction/algorithm.py", line 136, in _apply
    for node, bias_value in tqdm(list(node_and_bias_value), desc="Biases correction"):
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/fast_bias_correction/algorithm.py", line 128, in <genexpr>
    (node, self._backend_entity.get_bias_value(node, nncf_graph, model))
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/fast_bias_correction/onnx_backend.py", line 86, in get_bias_value
    return get_bias_value(node, model)
  File "/home/.local/lib/python3.8/site-packages/nncf/onnx/graph/node_utils.py", line 60, in get_bias_value
    raise RuntimeError("Could not find the bias value of the node")
RuntimeError: Could not find the bias value of the node

l-bat commented 1 year ago

@edition3234 could you please attach the const_shape_pp_main_body.onnx model so we can reproduce the issue?

edition3234 commented 1 year ago

@l-bat I have uploaded the model to Google Drive. Here is the link: https://drive.google.com/file/d/14Xh9Uj4kEjxuWmGewGyT9b9dIHGjUp8V/view?usp=drive_link

kshpv commented 1 year ago

Hello, @edition3234! Thank you for describing the problem. I could reproduce your issue with the code and model provided. I made a PR https://github.com/openvinotoolkit/nncf/pull/1940 that fixes the problem. Once it is merged, please use the latest version of the develop branch.

edition3234 commented 1 year ago

Hello @kshpv, I have modified the original model: I added a bias to the convolutional layers that had none and excluded from quantization some of the convolutional layers that produced errors, so the convolutional-layer error seems to be bypassed for now. But now the error message is: "Error: Duplicate definition-site for (p2o.out.5)." Here 'p2o.out.5' is the output of a concat layer named 'p2o.Concat.4', and 'p2o.Concat.4' feeds two different convolutional layers. The error seems to say that the concat layer cannot have two consumers of the same output, but that should be allowed. Moreover, I checked my modified model with the onnx.checker.check_model() function and it reported no errors. What could be causing this issue? Is it a problem with NNCF? I'm completely at a loss.

I have uploaded the new model to Google Drive. Here is the link: https://drive.google.com/file/d/1Mb39MGG4B91kkGEyr7Jd3JuYRqXzi7dO/view?usp=sharing

The error message is:

INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino
image
scale_factor
WARNING:nncf:ONNX models with 10 < opset version < 13 do not support per-channel quantization. Per-tensor quantization will be applied.
INFO:nncf:8 ignored nodes was found by name in the NNCFGraph
INFO:nncf:34 ignored nodes was found by types in the NNCFGraph
INFO:nncf:Not adding activation input quantizer for operation: 730 p2o.Conv.24 731 p2o.Relu.0

INFO:nncf:Not adding activation input quantizer for operation: 732 p2o.Conv.25 733 p2o.HardSigmoid.0

INFO:nncf:Not adding activation input quantizer for operation: 756 p2o.Conv.28 757 p2o.Relu.1

INFO:nncf:Not adding activation input quantizer for operation: 758 p2o.Conv.29 759 p2o.HardSigmoid.1

INFO:nncf:Not adding activation input quantizer for operation: 804 p2o.Concat.0
INFO:nncf:Not adding activation input quantizer for operation: 855 p2o.Concat.2
INFO:nncf:Not adding activation input quantizer for operation: 869 p2o.Concat.4
INFO:nncf:Not adding activation input quantizer for operation: 920 p2o.Concat.6
INFO:nncf:Not adding activation input quantizer for operation: 1216 p2o.Conv.74
INFO:nncf:Not adding activation input quantizer for operation: 951 p2o.Concat.8
INFO:nncf:Not adding activation input quantizer for operation: 1002 p2o.Concat.10
INFO:nncf:Not adding activation input quantizer for operation: 1033 p2o.Concat.12
INFO:nncf:Not adding activation input quantizer for operation: 1084 p2o.Concat.14
INFO:nncf:Not adding activation input quantizer for operation: 1308 p2o.Conv.83
INFO:nncf:Not adding activation input quantizer for operation: 1400 p2o.Conv.92
INFO:nncf:Not adding activation input quantizer for operation: 1492 p2o.Conv.101
Statistics collection: 100%|██████████| 4/4 [00:03<00:00, 1.11it/s]
Biases correction:  31%|███       | 29/94 [01:36<03:36, 3.33s/it]
Traceback (most recent call last):
  File "nncf_quant_mainbody.py", line 43, in <module>
    onnx_quantized_model = nncf.quantize(model, calibration_dataset, fast_bias_correction=False,subset_size=4,
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/quantize_model.py", line 93, in quantize
    return quantize_impl(
  File "/home/.local/lib/python3.8/site-packages/nncf/telemetry/decorator.py", line 71, in wrapped
    retval = fn(*args, **kwargs)
  File "/home/.local/lib/python3.8/site-packages/nncf/onnx/quantization/quantize_model.py", line 68, in quantize_impl
    quantized_model = quantization_algorithm.apply(model, dataset=calibration_dataset)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/algorithm.py", line 58, in apply
    return self._apply(model, statistic_points=None, dataset=dataset)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/post_training/algorithm.py", line 188, in _apply
    modified_model = algorithm.apply(modified_model, statistic_points)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/algorithm.py", line 63, in apply
    return self._apply(model, statistic_points)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/bias_correction/algorithm.py", line 160, in _apply
    bias_shift = self._compute_bias_shift(node, model_copy_subgraph, feed_dicts, statistic_points)
  File "/home/.local/lib/python3.8/site-packages/nncf/quantization/algorithms/bias_correction/algorithm.py", line 335, in _compute_bias_shift
    engine = EngineFactory.create(model)
  File "/home/.local/lib/python3.8/site-packages/nncf/common/factory.py", line 87, in create
    return ONNXEngine(model)
  File "/home/.local/lib/python3.8/site-packages/nncf/onnx/engine.py", line 29, in __init__
    self.sess = rt.InferenceSession(serialized_model, rt_session_options)
  File "/home/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/.local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 399, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Error: Duplicate definition-site for (p2o.out.5).

Quantization code:

import torch
import numpy as np
import onnx
from torchvision import datasets
from torchvision import transforms

import nncf

model_path = "/home/onnx_modifier/onnx-modifier-master/modified_onnx/modified_modified_model.onnx"

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    root=f"/home/1/resize_images_400",
    transform=transforms.Compose(
        [
            transforms.Resize(640),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)

val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)
model = onnx.load(model_path)

onnx.checker.check_model(model)

input_names = [input.name for input in model.graph.input]
input_img_name = input_names[0]
input_scale_name = input_names[1]
print(input_img_name)
print(input_scale_name)

def transform_fn(data_item):
    images, _ = data_item
    scale = np.array([640 / 1800, 640 / 1800], dtype='float32').reshape(1, 2)
    return {input_img_name: images.numpy(), input_scale_name: scale}

calibration_dataset = nncf.Dataset(val_loader, transform_fn)
onnx_quantized_model = nncf.quantize(model, calibration_dataset, fast_bias_correction=False, subset_size=4,
        ignored_scope=nncf.IgnoredScope(
            types=["Reshape","Slice","Concat"],  # ignore operations
            names=[
                "p2o.Conv.29",  # in the post-processing subgraph
                "p2o.Conv.28",
                "p2o.Conv.74",
                "p2o.Conv.83",
                "p2o.Conv.92",
                "p2o.Conv.101",
                "p2o.Conv.24",
                "p2o.Conv.25",
            ],
        ),
    )

int8_model_path = f"/home/1/int8_nncf_quant/int8_main_body.onnx"
onnx.save(onnx_quantized_model, int8_model_path)
edition3234 commented 1 year ago

Should I add two identity nodes between 'p2o.Concat.4' and the two convolutional layers so that each convolutional layer has a unique input?

edition3234 commented 1 year ago

@kshpv I added two Identity layers between the Concat layer and the two convolution layers connected to its output. Now there are no more errors, but I don't understand why the model size was 28.8 MB before quantization and increased to 29 MB after quantization. Could there be a problem with my quantization code?
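
For reference, inserting the Identity nodes can be done with ONNX graph surgery along these lines (a rough sketch; the file paths are placeholders and the tensor name is taken from the error message above):

import onnx
from onnx import helper

model = onnx.load("modified_model.onnx")  # placeholder path
graph = model.graph
concat_out = "p2o.out.5"  # output tensor of p2o.Concat.4

# positions of the Conv nodes that consume the Concat output
consumer_ids = [i for i, n in enumerate(graph.node)
                if n.op_type == "Conv" and concat_out in n.input]

for offset, i in enumerate(consumer_ids):
    pos = i + offset  # account for previously inserted Identity nodes
    ident_out = f"{concat_out}_identity_{offset}"
    ident = helper.make_node("Identity", [concat_out], [ident_out],
                             name=f"{concat_out}_Identity_{offset}")
    # insert the Identity right before its consumer to keep topological order
    graph.node.insert(pos, ident)
    # rewire the Conv to read from the Identity output instead
    conv = graph.node[pos + 1]
    idx = list(conv.input).index(concat_out)
    conv.input[idx] = ident_out

onnx.checker.check_model(model)
onnx.save(model, "model_with_identity.onnx")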

A model that can work normally: https://drive.google.com/file/d/1oe54nY8u0ePukBaBK7zGlrGVRwSK2aPh/view?usp=sharing

The model after quantization: https://drive.google.com/file/d/1SugtTDsQZUkCH7ArAo7tm5j0yLSnXXH2/view?usp=sharing

code:

import torch
import numpy as np
import onnx
from torchvision import datasets
from torchvision import transforms

import nncf

model_path = "/home/onnx_modifier/onnx-modifier-master/modified_onnx/1.onnx"

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    root=f"/home/1/resize_images_400",
    transform=transforms.Compose(
        [
            transforms.Resize(640),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)

val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)
model = onnx.load(model_path)

onnx.checker.check_model(model)

input_names = [input.name for input in model.graph.input]
input_img_name = input_names[0]
input_scale_name = input_names[1]
print(input_img_name)
print(input_scale_name)

def transform_fn(data_item):
    images, _ = data_item
    scale = np.array([640 / 1800, 640 / 1800], dtype='float32').reshape(1, 2)
    return {input_img_name: images.numpy(), input_scale_name: scale}

calibration_dataset = nncf.Dataset(val_loader, transform_fn)
onnx_quantized_model = nncf.quantize(model, calibration_dataset, fast_bias_correction=False, subset_size=400,
        ignored_scope=nncf.IgnoredScope(
            types=["Reshape","Slice","Concat"],  # ignore operations
            names=[
                "p2o.Conv.29",  # in the post-processing subgraph
                "p2o.Conv.28",
                "p2o.Conv.74",
                "p2o.Conv.83",
                "p2o.Conv.92",
                "p2o.Conv.101",
                "p2o.Conv.24",
                "p2o.Conv.25",
            ],
        ),
    )

int8_model_path = f"/home/1/int8_nncf_quant/int8_main_body.onnx"
onnx.save(onnx_quantized_model, int8_model_path)
kshpv commented 1 year ago

@edition3234 could you try quantization on this branch? - https://github.com/openvinotoolkit/nncf/pull/1940 It also fixes some issues with the quantization scheme. Your quantized model does not have the optimal quantization scheme.

This branch works fine for me with the original model that you provided. https://drive.google.com/file/d/14Xh9Uj4kEjxuWmGewGyT9b9dIHGjUp8V/view?usp=drive_link

I don't understand why the model size was 28.8 MB before quantization and increased to 29 MB after quantization.

This is expected. We don't save the weights in INT8 precision, although the model is inferred in INT8 precision. Therefore the model size is not reduced.

edition3234 commented 1 year ago

Thank you for your help!!! With the version you released, the original model can also be quantized normally, but the accuracy has dropped a lot, to the point where it can't be used. I am considering using 'Quantizing with accuracy control' or other methods.

kshpv commented 1 year ago

Thank you for your help!!! With the version you released, the original model can also be quantized normally, but the accuracy has dropped a lot, to the point where it can't be used. I am considering using 'Quantizing with accuracy control' or other methods.

Unfortunately, quantize_with_accuracy_control is not supported for the ONNX backend. What you can do instead: 1) increase subset_size; 2) manually exclude some layers from quantization. I suggest starting with the layers at the end of the network, because quantizing the last layers often hurts accuracy.

edition3234 commented 1 year ago

Can I convert the ONNX model to an IR model and then use quantize_with_accuracy_control?

If it's possible, I can modify the part that loads the ONNX model to use OpenVINO to load the IR model, which shouldn't be too difficult. Currently, I'm also trying other methods, but the quantization results have been consistently poor, possibly due to issues with the model structure.

kshpv commented 1 year ago

Can I convert the ONNX model to an IR model and then use quantize_with_accuracy_control?

Yes, you can. In this way, you can reuse the IgnoredScope from IR and map it to ONNX.

If it's possible, I can modify the part that loads the ONNX model to use OpenVINO to load the IR model, which shouldn't be too difficult. Currently, I'm also trying other methods, but the quantization results have been consistently poor, possibly due to issues with the model structure.

What issues with model structure do you mean?

Note: you can also quantize the IR model and infer it through OpenVINO.
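
For reference, a minimal sketch of that flow (paths are placeholders, and calibration_dataset is assumed to be the nncf.Dataset defined earlier; a complete script using the same pattern appears later in this thread):

import openvino.runtime as ov
from openvino.tools import mo

import nncf

# Convert the ONNX model to an OpenVINO model and quantize it directly
ov_model = mo.convert_model("const_shape_pp_main_body.onnx")
quantized_model = nncf.quantize(ov_model, calibration_dataset)
ov.serialize(quantized_model, "int8_main_body.xml")  # writes the .xml/.bin IR pair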

edition3234 commented 1 year ago

What issues with model structure do you mean?

I have just started exploring quantization, and I suspect that certain structures may lead to poor quantization results. This is because I have seen some relatively simple structures that can be easily quantized into int8 with decent performance.

I will try to exclude the post-processing part from quantization and increase the amount of calibration data. Once again, thank you for your help.

kshpv commented 1 year ago

What issues with model structure do you mean?

I have just started exploring quantization, and I suspect that certain structures may lead to poor quantization results. This is because I have seen some relatively simple structures that can be easily quantized into int8 with decent performance.

My understanding is similar.

I will try to exclude the post-processing part from quantization and increase the amount of calibration data. Once again, thank you for your help.

I am really curious about the results that you get. Please, let me know how it goes.

edition3234 commented 1 year ago

@kshpv I tried to ignore the quantization of all the operators with parameters in the post-processing network, and then used 3000 images as quantization data. However, the quantization effect was still not good. I noticed a few issues during the quantization process and wanted to share them with you:

1. In the quantization code, I requested that a total of 36 operators be ignored, but when I counted the printed log, only 32 operators were shown as being ignored during quantization. I am not sure if this is a logging bug or if 4 operators were actually not ignored. Below are the printed logs and the operators I want to ignore in my code:

log:

WARNING:nncf:ONNX models with 10 < opset version < 13 do not support per-channel quantization. Per-tensor quantization will be applied.
INFO:nncf:36 ignored nodes was found by name in the NNCFGraph
INFO:nncf:Not adding activation input quantizer for operation: 1219 p2o.Reshape.10  #1
INFO:nncf:Not adding activation input quantizer for operation: 1223 p2o.Reshape.11  #2
INFO:nncf:Not adding activation input quantizer for operation: 1225 p2o.MatMul.0       #3
INFO:nncf:Not adding activation input quantizer for operation: 1227 p2o.Reshape.12  #4
INFO:nncf:Not adding activation input quantizer for operation: 1311 p2o.Reshape.15   #5
INFO:nncf:Not adding activation input quantizer for operation: 1315 p2o.Reshape.16   #6
INFO:nncf:Not adding activation input quantizer for operation: 1317 p2o.MatMul.2        #7
INFO:nncf:Not adding activation input quantizer for operation: 1319 p2o.Reshape.17    #8
INFO:nncf:Not adding activation input quantizer for operation: 1407 p2o.Reshape.21    #9
INFO:nncf:Not adding activation input quantizer for operation: 1409 p2o.MatMul.4         #10
INFO:nncf:Not adding activation input quantizer for operation: 1411 p2o.Reshape.22     #11
INFO:nncf:Not adding activation input quantizer for operation: 1499 p2o.Reshape.26     #12
INFO:nncf:Not adding activation input quantizer for operation: 1501 p2o.MatMul.6          #13
INFO:nncf:Not adding activation input quantizer for operation: 1503 p2o.Reshape.27     #14
INFO:nncf:Not adding activation input quantizer for operation: 1527 p2o.Reshape.29     #15
INFO:nncf:Not adding activation input quantizer for operation: 1508 p2o.Mul.192           #16
1509 p2o.Add.206                                                                    #17

INFO:nncf:Not adding activation input quantizer for operation: 1510 p2o.Add.208           #18
INFO:nncf:Not adding activation input quantizer for operation: 1512 p2o.Mul.194            #19
INFO:nncf:Not adding activation input quantizer for operation: 1516 p2o.Reshape.28      #20
INFO:nncf:Not adding activation input quantizer for operation: 1517 p2o.Div.94               #21
INFO:nncf:Not adding activation input quantizer for operation: 1521 p2o.NonMaxSuppression.0    #22
INFO:nncf:Not adding activation input quantizer for operation: 1524 p2o.Gather.0           #23
INFO:nncf:Not adding activation input quantizer for operation: 1525 p2o.Gather.2            #24
INFO:nncf:Not adding activation input quantizer for operation: 1529 p2o.Mul.196              #25
INFO:nncf:Not adding activation input quantizer for operation: 1532 p2o.Reshape.30        #26
INFO:nncf:Not adding activation input quantizer for operation: 1542 p2o.TopK.0                #27
INFO:nncf:Not adding activation input quantizer for operation: 1553 p2o.Reshape.32         #28
INFO:nncf:Not adding activation input quantizer for operation: 1545 p2o.Gather.8                #29
INFO:nncf:Not adding activation input quantizer for operation: 1555 p2o.Reshape.33          #30
INFO:nncf:Not adding activation input quantizer for operation: 1547 p2o.Gather.10              #31
INFO:nncf:Not adding activation input quantizer for operation: 1549 p2o.Reshape.31          #32

code:

onnx_quantized_model = nncf.quantize(model, calibration_dataset, fast_bias_correction=False, subset_size=3000,
        ignored_scope=nncf.IgnoredScope(
            names=[
                "p2o.Slice.1",  # in the post-processing subgraph
                "p2o.Reshape.33",
                "p2o.Reshape.32",
                "p2o.Reshape.31",
                "p2o.Concat.24",
                "p2o.Slice.0",
                "p2o.Reshape.29",
                "p2o.Reshape.30",
                "p2o.Gather.0",
                "p2o.Mul.196",
                "p2o.Gather.2",
                "p2o.NonMaxSuppression.0",
                "p2o.Reshape.28",
                "p2o.Mul.194",
                "p2o.Add.208",
                "p2o.Add.206",
                "p2o.Mul.192",
                "p2o.Reshape.12",
                "p2o.MatMul.0",
                "p2o.Reshape.11",
                "p2o.Reshape.17",
                "p2o.MatMul.2",
                "p2o.Reshape.16",
                "p2o.Reshape.22",
                "p2o.MatMul.4",
                "p2o.Reshape.21",
                "p2o.Reshape.27",
                "p2o.MatMul.6",
                "p2o.Reshape.26",
                "p2o.Reshape.10",
                "p2o.Reshape.15",
                "p2o.TopK.0",
                "p2o.ReduceMin.0",
                "p2o.Gather.8",
                "p2o.Gather.10",
                "p2o.Div.94",
            ],
        ),
    )

2. The 3000 images I chose for calibration are all from the RPC product dataset. The actual test images I used are transparent bottle images I collected myself. The color richness of these images is much lower than that of the RPC product data; they almost look like grayscale images. Could this potentially be one of the reasons for the poor results? Are there any requirements for selecting image data for calibration?

System environment: Ubuntu 20. Model: body detection.

kshpv commented 1 year ago

@edition3234, 1) It means that you added 36 nodes to be ignored for quantization. Of those 36 nodes, only 32 were nodes that the quantization algorithm would have quantized; the remaining 4 nodes were not selected for quantization in the first place, which is why you didn't see them in the logs.
So, the log is correct.

2) Your calibration dataset should represent the full spectrum of the actual data that will be used during inference. If the data you use for calibration is very different from your test data, then your results will not be accurate.

3) I noticed that your model has an ONNX opset < 13, which means quantization can only be done in per-tensor mode. I recommend upgrading the model to opset 13 and trying quantization one more time; it could improve the results. You can take a look here - https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/post_training/ONNX.md#model-preparation
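
For example, the opset can be bumped with the ONNX version converter before calling nncf.quantize (a minimal sketch; the output path is a placeholder):

import onnx
from onnx import version_converter

model = onnx.load("const_shape_pp_main_body.onnx")
model_op13 = version_converter.convert_version(model, target_version=13)
onnx.checker.check_model(model_op13)
onnx.save(model_op13, "const_shape_pp_main_body_op13.onnx")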

kshpv commented 1 year ago

@edition3234 do you use ONNXRuntime or OpenVINO as a runtime to infer the model?

edition3234 commented 1 year ago

do you use ONNXRuntime or OpenVINO as a runtime to infer the model?

I have used the OpenVINO runtime to load the ONNX model and run inference.

I don't think the impact of a single type of calibration data would be that significant, and it should not be the main reason for such poor results. The main issue is that preparing additional calibration data with the same distribution as the training data is too troublesome.

After upgrading the model opset with the onnx.version_converter.convert_version() function, the quantization results remain the same.

kshpv commented 1 year ago

@edition3234 You don't need to collect extra dataset samples. You should use samples from the training or validation dataset. The recommended way is to use the validation part for quantization calibration.

edition3234 commented 1 year ago

The model was not trained on my computer, and I don't have the training data here; I would need to either download it again or ask someone for it. Moreover, the training data is a mix of many different open-source datasets, which makes selection quite troublesome.

Someone told me that the impact of a single type of calibration data on model accuracy wouldn't be that significant, at least not to this extent, and that the main issue may not lie in the calibration data.

It's possible that NNCF's default quantization just doesn't work well with this model.

kshpv commented 1 year ago

But do you have access to the validation dataset? How do you validate the model after quantization?

kshpv commented 1 year ago

There is probably an issue with NNCF or the OpenVINO runtime. To investigate it, I would like to ask you for a validation dataset to check the accuracy of the model before and after quantization.

edition3234 commented 1 year ago

But do you have access to the validation dataset? How do you validate the model after quantization?

Without even calculating the mAP, it's evident that the quantized model is no longer usable. I directly feed in images and examine the inference results, looking at the bounding boxes and confidence scores. For the quantized model the confidence scores are below 0.1 or even lower, while the non-quantized model produces bounding boxes with confidence scores of around 0.7 and 0.8.

a validation dataset to check the accuracy of the model before and after quantization

I tested 100 images of transparent bottles and searched for bounding boxes with confidence scores higher than 0.5. Prior to quantization, the non-quantized model successfully detected bounding boxes corresponding to the positions of transparent bottles in all 100 images. However, after quantization, the model failed to produce any bounding boxes with confidence scores higher than 0.1 for the 100 images.

kshpv commented 1 year ago

@edition3234 Thanks for the clarification! Does your calibration dataset include the transparent bottle images that you are using for validation? Can you do the following experiment: Quantize using the images on which you are going to validate the quality of the detector? In your case, I believe it should be the images with transparent bottles.

edition3234 commented 1 year ago

@kshpv

Quantize using the images on which you are going to validate the quality of the detector? In your case, I believe it should be the images with transparent bottles.

I tried using 1900 RPC product data images along with 100 transparent bottle images as quantization data, as well as just using the 100 images as quantization data. The quantized models produced results that were exactly the same as before, with none of the boxes in the 100 images having a confidence level higher than 0.1.

edition3234 commented 1 year ago

I randomly generated data and fed it into the network. I compared the outputs of the four layers just before the post-processing stage between the two models and computed several error metrics. The results show a significant loss of accuracy after quantization, which is also why ignoring the post-processing part did not help. Here is the log I printed:

Target layer: conv2d_153.tmp_0
Mean Absolute Error: 1.0069830417633057
Mean Squared Error: 1.8657881021499634
Max Difference: 8.896358489990234
Cosine Similarity: 0.597861647605896
Mean Relative Difference: 734.9452018737793%

Target layer: conv2d_158.tmp_0
Mean Absolute Error: 0.8864331245422363
Mean Squared Error: 1.487022042274475
Max Difference: 8.845466613769531
Cosine Similarity: 0.7496800422668457
Mean Relative Difference: 385.8271837234497%

Target layer: conv2d_163.tmp_0
Mean Absolute Error: 0.6390593647956848
Mean Squared Error: 0.7118748426437378
Max Difference: 5.1746745109558105
Cosine Similarity: 0.890064001083374
Mean Relative Difference: 234.26504135131836%

Target layer: conv2d_168.tmp_0
Mean Absolute Error: 0.3480095863342285
Mean Squared Error: 0.24162788689136505
Max Difference: 2.8655588626861572
Cosine Similarity: 0.9655238389968872
Mean Relative Difference: 119.0281867980957%

Also, I noticed that these four outputs appear to be quite symmetrical in terms of network structure. Among them, conv2d_153.tmp_0 goes through fewer convolutional layers than conv2d_168.tmp_0, yet surprisingly its accuracy ends up worse than that of conv2d_168.tmp_0.
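
For reference, a layer-wise comparison like this can be scripted by exposing the intermediate tensors as extra graph outputs and running both models through onnxruntime. A rough sketch follows: the model paths are placeholders, the input names are those printed earlier in the thread, and it assumes the tensor names are preserved in the quantized model:

import numpy as np
import onnx
import onnxruntime as ort

# intermediate tensors to compare (names as in the log above)
targets = ["conv2d_153.tmp_0", "conv2d_158.tmp_0", "conv2d_163.tmp_0", "conv2d_168.tmp_0"]

def run_targets(model_path, feed):
    model = onnx.load(model_path)
    existing = {o.name for o in model.graph.output}
    for name in targets:
        if name not in existing:
            # expose the intermediate tensor as a graph output (type is left for ORT to infer)
            model.graph.output.append(onnx.ValueInfoProto(name=name))
    sess = ort.InferenceSession(model.SerializeToString())
    return dict(zip(targets, sess.run(targets, feed)))

feed = {
    "image": np.random.rand(1, 3, 640, 640).astype(np.float32),
    "scale_factor": np.array([[1.0, 1.0]], dtype=np.float32),
}
fp32_out = run_targets("fp32_model.onnx", feed)  # placeholder paths
int8_out = run_targets("int8_model.onnx", feed)

for name in targets:
    a, b = fp32_out[name].ravel(), int8_out[name].ravel()
    mae = np.abs(a - b).mean()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"{name}: MAE={mae:.4f}, cosine similarity={cos:.4f}")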

edition3234 commented 1 year ago

[screenshot 截图_20230706153840931: the repeated backbone block]

The backbone of the model consists of 11 repetitions of the structure shown in the figure above (backbone name: LCNet). I have observed a significant loss of accuracy from the input data to the output of the backbone. I'm not sure if this is because the structure is particularly unusual or if specific quantization optimizations need to be applied to improve the results.

kshpv commented 1 year ago

@edition3234 Did you try using QuantizationPreset.MIXED? If that doesn't help, you should use quantize_with_accuracy_control. If you are not happy with quantize_with_accuracy_control, then use QAT in the original framework.

edition3234 commented 1 year ago

@kshpv After using QuantizationPreset.MIXED, some of the box confidence scores are now higher than 0.2. However, the quality of these boxes is not very good. There are a few images where the inferred boxes are relatively close to the original model's results, but the number of such cases is very limited. Overall, the model is still in an unusable state. The cosine similarity of the output of the p2o.Mul.20 layer has increased from 0.8 to 0.89 (the output of p2o.Mul.20 is the output obtained after the 11 repetitions of the aforementioned structure).

The backbone of the model consists of 11 repetitions of the structure shown in the figure above (backbone name: LCNet). I have observed a significant loss of accuracy from the input data to the output of the backbone. I'm not sure if this is because the structure is particularly unusual or if specific quantization optimizations need to be applied to improve the results.

Can this structure be further optimized in its quantization process? If this structure can undergo quantization with minimal loss in accuracy, and considering that it is used multiple times in the model's loops, then the overall quantization effect is likely to be quite good.

The backbone was designed to be CPU-friendly, so I'm not sure why there is such a significant loss in accuracy after quantization.

edition3234 commented 1 year ago

I roughly understand how NNCF's quantize_with_accuracy_control works. It seems to decide whether to stop based on whether the model's accuracy has reached the target I set. However, it isn't clear what special actions it takes if the accuracy consistently fails to reach the target value.

This is unlike OpenVINO POT's 'Quantizing with Accuracy Control', which chooses to ignore some layers that have a significant impact on accuracy.

The quantize_with_accuracy_control examples in the official documentation do not suit my needs, so I would need to write my own validation function. But personally, I feel the results will still be the same, so I don't really want to use NNCF's quantize_with_accuracy_control.

alexsu52 commented 1 year ago

Hi @edition3234,

NNCF's quantize_with_accuracy_control reverts the most impactful operations of the model to the original precision in order to achieve the specified accuracy drop (https://docs.openvino.ai/2023.0/quantization_w_accuracy_control.html). The POT and NNCF algorithms work the same way. If you used POT before, you can try it for your model.
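
For orientation, the call looks roughly like this, per the linked documentation. This is only a sketch: parameter names may differ slightly between NNCF versions, ov_model / calibration_dataset / validation_dataset are assumed to exist (an OpenVINO model and nncf.Dataset objects), and compute_map is a hypothetical metric helper:

import nncf

def validation_fn(model, validation_data) -> float:
    # run inference on the model and return the accuracy metric (e.g. mAP)
    return compute_map(model, validation_data)  # hypothetical helper

quantized_model = nncf.quantize_with_accuracy_control(
    ov_model,                                # OpenVINO model (this flow targets the OpenVINO backend)
    calibration_dataset=calibration_dataset,
    validation_dataset=validation_dataset,   # nncf.Dataset over the validation samples
    validation_fn=validation_fn,
    max_drop=0.01,                           # maximum allowed accuracy drop
)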

In any case, @kshpv has described the steps we recommend if the accuracy of the quantized model is not satisfactory:

  1. nncf.quantize(...) is the fastest and easiest way to get a quantized model with NNCF. However, it can lead to significant accuracy deviation in some cases.
  2. (Optional) You can try to tune the hyperparameters. Some tricks you can try, although they are not guaranteed to give better results:
     2.1 nncf.quantize(model, calibration_dataset, preset=QuantizationPreset.MIXED)
     2.2 nncf.quantize(model, calibration_dataset, preset=QuantizationPreset.MIXED, fast_bias_correction=False)
     2.3 nncf.quantize(model, calibration_dataset, preset=QuantizationPreset.MIXED, fast_bias_correction=False, advanced_parameters=nncf.AdvancedQuantizationParameters(activations_range_estimator_params=RangeEstimatorParametersSet.MINMAX))
  3. nncf.quantize_with_accuracy_control(...) is the advanced quantization flow that allows to apply 8-bit quantization to the model with control of accuracy metric.
  4. QAT, If you have training pipeline.

As for your case, the quantized model you provided has very small float ranges, and the distribution of activations across channels varies a lot. I can recommend trying the parameters from 2.3. But to be honest, models like yours that are already optimized (backbone: mobilenet-v3 family) and don't have a lot of capacity usually don't lend themselves well to quantization, especially in the case of overfitting on some task. In order for us to help you quantize your model, we need you to provide us with a representative set of calibration data (300 images) to evaluate the activation/weight distributions and find the cause of the large drop in accuracy.

edition3234 commented 1 year ago

Thanks @alexsu52,

but this code: nncf.quantize(model, calibration_dataset, preset=nncf.QuantizationPreset.MIXED, fast_bias_correction=False, subset_size=2000, advanced_parameters=nncf.AdvancedQuantizationParameters(activations_range_estimator_params=nncf.RangeEstimatorParametersSet.MINMAX)) is throwing an error: module 'nncf' has no attribute 'RangeEstimatorParametersSet'. The same happens with activations_range_estimator_params=RangeEstimatorParametersSet.MINMAX.

And I randomly selected representative images. Below is the download link.

https://drive.google.com/file/d/1CWwR4BzJSstEow4hEkNX9DvO5lgtO5Ys/view?usp=sharing

fp32 model: https://drive.google.com/file/d/14Xh9Uj4kEjxuWmGewGyT9b9dIHGjUp8V/view?usp=drive_link

Here is my quantization code, which excludes the post-processing layers from quantization:

import torch
import numpy as np
import onnx
from onnx.version_converter import convert_version
from torchvision import datasets
from torchvision import transforms
from PIL import Image
import nncf

model_path = "/home/1/fp32_mainbody_onnx_bs/const_shape_pp_main_body.onnx"

class StoreOriginalSize:
    def __init__(self):
        self.sizes = []  # a list to store original sizes

    def __call__(self, img):
        # This transform does not change the image, it only stores its original size
        self.sizes.append(img.size)  # save original size (width, height)
        return img

store_original_size = StoreOriginalSize()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    root=f"/home/1/resize_images_2000",
    transform=transforms.Compose(
        [
            store_original_size,  # this will store original sizes before resizing
            transforms.Resize((640, 640)),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)

val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)
old_model = onnx.load(model_path)

model = convert_version(old_model, target_version=13)

onnx.checker.check_model(model)

input_names = [input.name for input in model.graph.input]
input_img_name = input_names[0]
input_scale_name = input_names[1]

def transform_fn(data_item):
    images, _ = data_item
    origin_width, origin_height = store_original_size.sizes.pop(0)
    scale = np.array([640 / origin_height, 640 / origin_width], dtype='float32').reshape(1, 2)
    return {input_img_name: images.numpy(), input_scale_name: scale}

calibration_dataset = nncf.Dataset(val_loader, transform_fn)
onnx_quantized_model = nncf.quantize(model, calibration_dataset, preset=nncf.QuantizationPreset.MIXED,
        fast_bias_correction=False, subset_size=492,
        ignored_scope=nncf.IgnoredScope(
            #types=["Reshape","Slice","Concat"],  # ignore operations
            names=[
                "p2o.Slice.1",  # in the post-processing subgraph
                "p2o.Reshape.33",
                "p2o.Reshape.32",
                "p2o.Reshape.31",
                "p2o.Concat.24",
                "p2o.Slice.0",
                "p2o.Reshape.29",
                "p2o.Reshape.30",
                "p2o.Gather.0",
                "p2o.Mul.196",
                "p2o.Gather.2",
                "p2o.NonMaxSuppression.0",
                "p2o.Reshape.28",
                "p2o.Mul.194",
                "p2o.Add.208",
                "p2o.Add.206",
                "p2o.Mul.192",
                "p2o.Reshape.12",
                "p2o.MatMul.0",
                "p2o.Reshape.11",
                "p2o.Reshape.17",
                "p2o.MatMul.2",
                "p2o.Reshape.16",
                "p2o.Reshape.22",
                "p2o.MatMul.4",
                "p2o.Reshape.21",
                "p2o.Reshape.27",
                "p2o.MatMul.6",
                "p2o.Reshape.26",
                "p2o.Reshape.10",
                "p2o.Reshape.15",
                "p2o.TopK.0",
                "p2o.ReduceMin.0",
                "p2o.Gather.8",
                "p2o.Gather.10",
                "p2o.Div.94",
            ],
        ),
    )

int8_model_path = f"/home/1/int8_nncf_quant/int8_main_body_new_branch_bs_1_2000.onnx"
onnx.save(onnx_quantized_model, int8_model_path)
edition3234 commented 1 year ago

models like yours that are already optimized (backbone: mobilenet-v3 family)

My ONNX model was converted from another model (with almost no loss in accuracy). The original model's backbone is composed of the repeated structure shown in the diagram below, which doesn't resemble MobileNet-V3. Perhaps this structure is indeed quite unusual.

[screenshot 截图_20230711103847723: the repeated backbone block of the original model]


I have noticed that there are usually QuantizeLinear and DequantizeLinear modules after the activation functions. However, the original model's hard_swish activation function is replaced with an equivalent decomposition during the conversion to ONNX. Could this be the reason for the significant loss in accuracy?
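
For what it's worth, that replacement is numerically exact in FP32: hard_swish(x) equals x * hard_sigmoid(x) (with alpha = 1/6, beta = 0.5), which is why exporters can decompose it into HardSigmoid followed by Mul. A small sketch of the equivalence (illustrative only; whether the quantizer placement around the decomposed pattern matters is a separate question):

import numpy as np

def hard_sigmoid(x, alpha=1/6, beta=0.5):
    return np.clip(alpha * x + beta, 0.0, 1.0)

def hard_swish(x):
    # reference definition: x * relu6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.linspace(-8, 8, 1000, dtype=np.float32)
# the decomposed form matches the reference definition
assert np.allclose(hard_swish(x), x * hard_sigmoid(x))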

alexsu52 commented 1 year ago

Thanks @edition3234 for sharing the calibration dataset!

I tried to quantize the provided model using the provided calibration dataset, but I used the OpenVINO backend, because the OpenVINO backend is more mature than the others, for example ONNX. I used the script below to quantize the model and did not see the accuracy degradation of the quantized model that you reported. To prove it, I will provide some images from your dataset that were not involved in calibration. I also benchmarked the INT8 and FP32 models using benchmark_app and got a 3.49x speed-up for the INT8 model in OpenVINO.

Some results: [detection result images]

import os

import numpy as np
import openvino.runtime as ov
import torch
from openvino.runtime import opset9 as opset
from openvino.tools import mo
from PIL import ImageDraw
from torchvision import datasets
from torchvision import transforms

import nncf

model_path = "path to onnx model"
dataset_path = "path to calibration dataset"

class StoreOriginalSize:
    def __init__(self):
        self.sizes = []  # a list to store original sizes

    def __call__(self, img):
        # This transform does not change the image, it only stores its original size
        self.sizes.append(img.size)  # save original size (width, height)
        return img

store_original_size = StoreOriginalSize()

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    root=dataset_path,
    transform=transforms.Compose(
        [
            store_original_size,  # this will store original sizes before resizing
            transforms.Resize((640, 640)),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)

# Create validation dataset
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)

# Convert ONNX model to OpenVINO model
ov_model = mo.convert_model(model_path)

# Replace unsupported Squeeze operations with Reshape operations
for op in ov_model.get_ops():
    op_name = op.get_friendly_name()
    if op_name in ["p2o.Squeeze.0", "p2o.Squeeze.2"]:
        inp_node = op.input(0)
        input_node_output = inp_node.get_source_output()
        reshape = opset.reshape(input_node_output, [-1], special_zero=True, name=f"reshape_{op_name}")
        output = op.output(0)
        target_inputs = output.get_target_inputs()
        for inp_node in target_inputs:
            inp_node.replace_source_output(reshape.output(0))

# Save the patched FP32 OpenVINO model
ov.serialize(ov_model, "ov_model.xml")

input_img_name = ov_model.inputs[0].get_any_name()
input_scale_name = ov_model.inputs[1].get_any_name()

# Prepare the calibration dataset
def transform_fn(data_item):
    images, _ = data_item
    origin_width, origin_height = store_original_size.sizes.pop(0)
    scale = np.array([640 / origin_height, 640 / origin_width], dtype="float32").reshape(1, 2)
    return {input_img_name: images.numpy(), input_scale_name: scale}

calibration_dataset = nncf.Dataset(val_loader, transform_fn)

# Quantize OpenVINO model
quantized_model = nncf.quantize(ov_model, calibration_dataset)

# Save quantized OpenVINO model
ov.serialize(quantized_model, "ov_q_test.xml")

def validate(model, validation_dataset, threshold=0.5):
    compiled_model = ov.compile_model(model)

    results = {}
    for image_id, data_item in enumerate(validation_dataset):
        input = transform_fn(data_item)
        outputs = compiled_model(input)

        dt_res = []
        for i in range(outputs[1][0]):
            num_id, score, xmin, ymin, xmax, ymax = outputs[0][i].tolist()

            if int(num_id) < 0:
                continue
            if score < threshold:
                continue

            category_id = int(num_id)
            bbox = [xmin, ymin, xmax, ymax]
            dt_res.append({"image_id": image_id, "category_id": category_id, "bbox": bbox, "score": score})

        results[image_id] = dt_res
        if image_id % 100 == 0:
            print(f"validate: {image_id}")

    return results

def draw_save_results(image_dataset, results, output_dir, prefix):
    os.makedirs(output_dir, exist_ok=True)
    for image_id, p_image in enumerate(image_dataset):
        p_image = p_image[0]
        for res in results[image_id]:
            draw = ImageDraw.Draw(p_image)
            draw.rectangle(res["bbox"], outline="green", width=3)
            draw.text(res["bbox"], str(res["category_id"]))

        image_name = f"{prefix}_image_{image_id}.png"
        p_image.save(f"{output_dir}/{image_name}")
        if image_id % 100 == 0:
            print(f"save: {image_id}")

print("Validate FP32 model")
fp32_results = validate(ov_model, val_loader)

print("Validate INT8 model")
int8_results = validate(quantized_model, val_loader)

image_dataset = datasets.ImageFolder(root=dataset_path)
print("Draw and save FP32 results into ./fp32_results folder")
draw_save_results(image_dataset, fp32_results, "fp32_results", "fp32")
print("Draw and save INT8 results into ./int8_results folder")
draw_save_results(image_dataset, int8_results, "int8_results", "int8")

@edition3234, thanks for your issue, it helps to improve both NNCF and OpenVINO. @kshpv, could you reproduce the results on the ONNX backend?

edition3234 commented 1 year ago

I tried the new version of NNCF, and the performance after regular quantization is already very good! I really appreciate it.

edition3234 commented 1 year ago

I also benchmarked the INT8 and FP32 models using benchmark_app and got a 3.49x speed-up for the INT8 model in OpenVINO.

@alexsu52 I encountered a problem. Using the quantization code you provided, the quantized model's .bin file is approximately 1/4 the size of the original FP32 model's .bin file. However, when I use the OpenVINO C++ API to run inference with both the pre-quantization and post-quantization IR models, whether on the CPU or on Intel's integrated GPU, the inference time after multiple continuous runs is basically the same as for the original FP32 model. I haven't tried benchmark_app yet, but theoretically it should show the same. Could you tell me where the problem might be?

I have used OpenVINO's ov::preprocess::PrePostProcessor to embed some preprocessing operations into the model. Could this also have an impact?

edition3234 commented 1 year ago

benchmark_app log

./benchmark_app -m ov_q_test.xml -d CPU -hint latency

fp32: https://drive.google.com/file/d/1NUR2Be8b6hyeQTjuPylfQypQFkECWaBl/view?usp=sharing

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Device(CPU) performance hint is set to LATENCY
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 35.92 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     image (node: image) : f32 / [...] / [1,3,640,640]
[ INFO ]     scale_factor (node: scale_factor) : f32 / [...] / [1,2]
[ INFO ] Network outputs:
[ INFO ]     multiclass_nms3_0.tmp_0 (node: multiclass_nms3_0.tmp_0) : f32 / [...] / [..1000,6]
[ INFO ]     multiclass_nms3_0.tmp_2 (node: multiclass_nms3_0.tmp_2) : i32 / [...] / [1]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] image: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[ WARNING ] scale_factor: layout is not set explicitly. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     image (node: image) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ]     scale_factor (node: scale_factor) : f32 / [...] / [1,2]
[ INFO ] Network outputs:
[ INFO ]     multiclass_nms3_0.tmp_0 (node: multiclass_nms3_0.tmp_0) : f32 / [...] / [..1000,6]
[ INFO ]     multiclass_nms3_0.tmp_2 (node: multiclass_nms3_0.tmp_2) : i32 / [...] / [1]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 157.91 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: Model from PaddlePaddle.
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
[ INFO ]   NUM_STREAMS: 1
[ INFO ]   AFFINITY: CORE
[ INFO ]   INFERENCE_NUM_THREADS: 6
[ INFO ]   PERF_COUNT: NO
[ INFO ]   INFERENCE_PRECISION_HINT: f32
[ INFO ]   PERFORMANCE_HINT: LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image         ([N,C,H,W], u8, [1,3,640,640], static):  random (image is expected)
[ INFO ] scale_factor  ([...], f32, [1,2], static): random (binary data is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 65.76 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:               1440 iterations
[ INFO ] Duration:            60092.40 ms
[ INFO ] Latency:
[ INFO ]    Median:           38.85 ms
[ INFO ]    Average:          41.72 ms
[ INFO ]    Min:              37.69 ms
[ INFO ]    Max:              121.28 ms
[ INFO ] Throughput:          23.96 FPS

int8: https://drive.google.com/file/d/1AWdq_JgZvJEjzW3Xx3Pk66bRkk4lGzbP/view?usp=sharing

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[ WARNING ] Device(CPU) performance hint is set to LATENCY
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 47.76 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ]     image (node: image) : f32 / [...] / [1,3,640,640]
[ INFO ]     scale_factor (node: scale_factor) : f32 / [...] / [1,2]
[ INFO ] Network outputs:
[ INFO ]     multiclass_nms3_0.tmp_0 (node: multiclass_nms3_0.tmp_0) : f32 / [...] / [..1000,6]
[ INFO ]     multiclass_nms3_0.tmp_2 (node: multiclass_nms3_0.tmp_2) : i32 / [...] / [1]
[Step 5/11] Resizing model to match image sizes and given batch
[ WARNING ] image: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[ WARNING ] scale_factor: layout is not set explicitly. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ]     image (node: image) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ]     scale_factor (node: scale_factor) : f32 / [...] / [1,2]
[ INFO ] Network outputs:
[ INFO ]     multiclass_nms3_0.tmp_0 (node: multiclass_nms3_0.tmp_0) : f32 / [...] / [..1000,6]
[ INFO ]     multiclass_nms3_0.tmp_2 (node: multiclass_nms3_0.tmp_2) : i32 / [...] / [1]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 444.33 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: Model from PaddlePaddle.
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
[ INFO ]   NUM_STREAMS: 1
[ INFO ]   AFFINITY: CORE
[ INFO ]   INFERENCE_NUM_THREADS: 6
[ INFO ]   PERF_COUNT: NO
[ INFO ]   INFERENCE_PRECISION_HINT: f32
[ INFO ]   PERFORMANCE_HINT: LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] image         ([N,C,H,W], u8, [1,3,640,640], static):  random (image is expected)
[ INFO ] scale_factor  ([...], f32, [1,2], static): random (binary data is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 39.81 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count:               1685 iterations
[ INFO ] Duration:            60045.42 ms
[ INFO ] Latency:
[ INFO ]    Median:           35.03 ms
[ INFO ]    Average:          35.63 ms
[ INFO ]    Min:              34.23 ms
[ INFO ]    Max:              60.19 ms
[ INFO ] Throughput:          28.06 FPS

Very close

alexsu52 commented 1 year ago

@edition3234, sorry, I forgot to mention using the latest develop OpenVINO in my previous comment.

Please install the latest openvino dev package and try re-benchmark models:

pip install openvino-dev==2023.1.0.dev20230728

benchmark_app with -d CPU -hint latency shows a 2.16x speed-up for the INT8 model on my desktop. Note that the speed-ups in latency mode and throughput mode are different.

edition3234 commented 1 year ago

Thank you for your reply. It's working fine now.