openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Problems with POT Quantization #17389

Closed glucasol closed 1 year ago

glucasol commented 1 year ago

Hi everyone,

I am having some problems quantizing my OpenVINO model with POT.

I have an anomaly detection model trained using this Anomalib repo. I converted the trained model to OpenVINO format (.bin and .xml) with Model Optimizer (MO).

The structure of the dataset used to train/test the model is:

I want to run this POT command (as mentioned here: POT): pot -c <path_to_config_file>

I have created the .json configuration file following this example.

But I also need a config .yaml file, which is the Accuracy Checker configuration file.

So, following this Accuracy Checker Guide, I need to create a config.yaml file similar to the one below:

models:
- name: model_name
  launchers:
  - framework: openvino
    device: CPU
    model: path_to_model/alexnet.xml
    weights: path_to_weights/alexnet.bin
    adapter: anomaly_segmentation
  datasets:
    - name: dataset_name
      annotation: annotation.pickle
      data_source: images_folder

      preprocessing:
      - type: resize
        dst_width: 256
        dst_height: 256

      - type: normalization
        mean: imagenet

      - type: crop
        dst_width: 227
        dst_height: 227

      metrics:
      - type: accuracy

After following all these steps, I came up with some doubts. First, in the launchers section, which adapter should I use, since my model is an anomaly detection model? Second, in the datasets section, what do I need to specify in the annotation subsection, since I do not have an annotation file? Anomalib does not need an annotation file, since the train dataset only has normal images and the test dataset is separated into normal and anomalous folders.

Can you help me with these doubts?

Thanks!

Iffa-Intel commented 1 year ago

@glucasol If you don't have annotations in your dataset, it is recommended to apply model quantization with the Default Quantization method without accuracy control, using an unannotated dataset.

Quantization with Accuracy Control should be used only if Default Quantization introduces a significant accuracy degradation. The reason it is not the primary choice is its potential for performance degradation, since some layers may be reverted to the original precision.

You need to create a Python script using the Post-Training Optimization Tool (POT) API and implement the data preparation logic and the quantization pipeline.

There are 3 crucial steps:
1. Prepare data and the dataset interface.
2. Select quantization parameters.
3. Define and run the quantization process.

In step 1, when defining the DataLoader, the annotation is set to None. You may refer to the script implementation section in this guide.

Here's something that I tried using the Default Quantization method, and a quantized model was generated:

potapi
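
For illustration, a minimal Default Quantization script along those lines might look like the sketch below; the dataset folder, image size, and preprocessing are assumptions that you will need to adapt to your own model:

import numpy as np
import cv2
from pathlib import Path

from openvino.tools.pot import DataLoader, IEEngine, load_model, save_model
from openvino.tools.pot import compress_model_weights, create_pipeline

class ImageFolderLoader(DataLoader):
    """Unannotated dataset: __getitem__ returns (data, annotation) with annotation=None."""
    def __init__(self, folder):
        self._files = sorted(Path(folder).glob("*.png"))  # hypothetical folder with normal images

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        image = cv2.imread(str(self._files[index]))
        image = cv2.resize(image, (256, 256))          # assumed model input size
        image = image.transpose(2, 0, 1)               # HWC -> CHW; add normalization if your model expects it
        return image.astype(np.float32), None          # no annotation available

model_config = {"model_name": "model", "model": "model.xml", "weights": "model.bin"}
engine_config = {"device": "CPU"}
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "ANY", "stat_subset_size": 300}}]

model = load_model(model_config=model_config)
engine = IEEngine(config=engine_config, data_loader=ImageFolderLoader("datasets/train/good"))
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)
compress_model_weights(compressed_model)               # minimizes the size of the resulting .bin file
save_model(model=compressed_model, save_path="optimized_model", model_name="quantized_model")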

If you are trying to create a custom annotation for your dataset, that step is irrelevant to OpenVINO, as it relates to managing your custom dataset and should be done before using OpenVINO. This tutorial might help you.

AlexKoff88 commented 1 year ago

@nikita-savelyevv, can you please take a look?

alexsu52 commented 1 year ago

Hi @glucasol, OpenVINO POT will be deprecated in the next OpenVINO release (https://github.com/openvinotoolkit/openvino/pull/16758), and the new Python post-training quantization API from NNCF will be the main way to quantize models.

NNCF has an example that demonstrates how to quantize the Student-Teacher Feature Pyramid Matching (STFPM) OpenVINO model from Anomalib. I hope this example helps you get started quickly.

More details about post-training quantization with NNCF can be found here.
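
For reference, the basic NNCF post-training flow for an OpenVINO IR model is roughly the sketch below; the model paths, the calibration data source, and the transform function are placeholders to adapt to your case:

import nncf
import openvino.runtime as ov

core = ov.Core()
ov_model = core.read_model("model_fp32.xml")             # placeholder path to your FP32 IR model

def transform_fn(data_item):
    # Map one dataset item to the model input; adapt to your preprocessing
    image, _ = data_item
    return image

# calibration_data_source is a placeholder: any iterable of samples (e.g. your training images)
calibration_dataset = nncf.Dataset(calibration_data_source, transform_fn)

quantized_model = nncf.quantize(ov_model, calibration_dataset)
ov.serialize(quantized_model, "model_int8.xml")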

glucasol commented 1 year ago

Hi @alexsu52, thanks for your response! I have successfully tested the example and it worked, but it seems that my INT8 model is worse than the FP32 model. Below are the benchmarks for both models:

[1/7] Save FP32 model: /home/gabrieloliveira/Documents/nncf/examples/post_training_quantization/openvino/quantize_with_accuracy_control/model_fp32.xml
Model graph (xml):   0.086 Mb
Model weights (bin): 168.425 Mb
Model size:          168.510 Mb
[2/7] Save INT8 model: /home/gabrieloliveira/Documents/nncf/examples/post_training_quantization/openvino/quantize_with_accuracy_control/model_int8.xml
Model graph (xml):   0.154 Mb
Model weights (bin): 159.309 Mb
Model size:          159.463 Mb
[3/7] Benchmark FP32 model:
[ INFO ] Count:            5337 iterations
[ INFO ] Duration:         15025.83 ms
[ INFO ] Latency:
[ INFO ]    Median:        24.13 ms
[ INFO ]    Average:       25.20 ms
[ INFO ]    Min:           21.89 ms
[ INFO ]    Max:           174.93 ms
[ INFO ] Throughput:   355.19 FPS
[4/7] Benchmark INT8 model:
[ INFO ] Count:            4104 iterations
[ INFO ] Duration:         15035.41 ms
[ INFO ] Latency:
[ INFO ]    Median:        31.73 ms
[ INFO ]    Average:       32.88 ms
[ INFO ]    Min:           29.65 ms
[ INFO ]    Max:           176.67 ms
[ INFO ] Throughput:   272.96 FPS
[5/7] Validate OpenVINO FP32 model:
Validate: dataset lenght = 100, metric value = 1.000
Accuracy @ top1: 1.000
[6/7] Validate OpenVINO INT8 model:
Validate: dataset lenght = 100, metric value = 1.000
Accuracy @ top1: 1.000
[7/7] Report:
Maximum accuracy drop:                  0.005
Accuracy drop:                          0.000
Model compression rate:                 1.057
Performance speed up (throughput mode): 0.768

Any reason for that?

Thanks!

alexsu52 commented 1 year ago

Hi @glucasol,

Taking into account the model compression rate of 1.057 from your log, I can assume that only some layers of the model have been quantized. Could you share the original model or the name of the Anomalib model you used? Also, could you share the full log of your script run?

Thanks!

glucasol commented 1 year ago

Hi @alexsu52 ,

I tested another model and different accuracy drop values to see if the results were different. Unfortunately I cannot share the model because it is too large, but you can replicate it just by following the steps in Anomalib's repo. Just install the dependencies and, before running, check the config.yaml file and ensure that export_mode is set to "openvino", like this:

optimization:
  export_mode: "openvino" # options: openvino, onnx

After that, just run the command: python tools/train.py

And the OpenVINO IR model will be generated.

For the tests I have done, using the same model I changed the accuracy drop from 0.0005 to 0.0015 and 0.005, but every run returns a model compression rate of 3.985 and an almost identical performance speed up (throughput mode). As I increase the accuracy drop, the model compression rate and performance speed up were supposed to increase, right? But this doesn't happen, as you can see in the log files below.
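
For context, this is roughly how the maximum accuracy drop is passed in the accuracy control API, as far as I understand it; the calibration/validation datasets and the validation function are placeholders on my side:

import nncf

def validate(model, validation_dataset):
    # Placeholder: run inference on the dataset and return a single accuracy metric value
    ...

quantized_model = nncf.quantize_with_accuracy_control(
    ov_model,                                 # openvino.runtime.Model (placeholder)
    calibration_dataset=calibration_dataset,
    validation_dataset=validation_dataset,
    validation_fn=validate,
    max_drop=0.005,                           # the value I am varying (0.0005 / 0.0015 / 0.005)
)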

log_mvtec0005.txt log_mvtec0015.txt log_mvtec_005.txt

Thanks for your time!

alexsu52 commented 1 year ago

Hi @glucasol,

I could not reproduce the issue with the padim model from Anomalib's repo using nncf 2.4.0. However, NNCF from the develop branch has issues, and I prepared a fix that you can try: https://github.com/openvinotoolkit/nncf/pull/1902. But your issue probably has another root cause. Could you run benchmark_app on the quantized model to collect an execution report? Also, please share your hardware configuration and OpenVINO version.

The command to collect the execution report is the following:

benchmark_app -m <path to the quantized model> -report_type average_counters

benchmark_app will generate benchmark_average_counters_report.csv. Please share it with me.

I highly recommend using the latest OpenVINO version from PyPI (2023.0.0).

glucasol commented 1 year ago

Hi @alexsu52, this is my configuration:

I have cloned the branch you suggested, updated OpenVINO to the latest version, and tested again. Below are the benchmark .csv files.

benchmark_average_counters_report.csv benchmark_report.csv

Thanks!

alexsu52 commented 1 year ago

Hi @glucasol,

Thank you for the benchmark report. It looks like the model was quantized correctly. The runtime team needs to take a look at it.

@dmitry-gorokhov, the quantized padim model from Anomalib's repo shows a 1.92x speedup on an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz, but does not show a speedup on an Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz. Please take a look.

dmitry-gorokhov commented 1 year ago

Hi @glucasol, the Intel(R) Xeon(R) Gold 6150 doesn't support Intel(R) DL Boost (aka VNNI). This technology was first introduced in the second generation of Xeon Scalable processors (code name Cascade Lake). Older generations have very limited HW capabilities for efficient 8-bit inference. The Intel(R) Core(TM) i9-10980XE has DL Boost support, which is why the numbers provided by @alexsu52 show a roughly 2x performance boost for the quantized model.
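
If you want to double-check on your own machine, a quick way on Linux is to look for the VNNI flag in /proc/cpuinfo, for example:

# Linux-only sketch: the flag appears as avx512_vnni (or avx_vnni) on CPUs with DL Boost
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("VNNI supported:", "vnni" in flags)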

avitial commented 1 year ago

Closing this, I hope previous responses were sufficient to help you proceed. Feel free to reopen to ask any questions related to this topic.