Closed: glucasol closed this issue 1 year ago
@glucasol If you don't have annotations in your dataset, it is recommended to apply model quantization with the Default Quantization method without accuracy control, using an unannotated dataset.
Quantization with Accuracy Control should be used only if Default Quantization introduces a significant accuracy degradation. The reason it is not the primary choice is its potential for performance degradation, since some layers may be reverted to the original precision.
You need to create a Python script using the Post-Training Optimization Tool (POT) API and implement the data preparation logic and the quantization pipeline.
There are 3 crucial steps:
1. Prepare the data and the dataset interface.
2. Select quantization parameters.
3. Define and run the quantization process.
In step 1, when defining the DataLoader, the annotation is set to None. You may refer to the script implementation section in this guide.
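The three steps above can be sketched roughly as follows. This is a minimal outline, not a tested recipe: in a real script the loader would subclass `openvino.tools.pot.DataLoader`, and the model paths, preset, and `stat_subset_size` are placeholder assumptions. POT is imported lazily inside the function so the loader sketch stays usable without OpenVINO installed.

```python
import numpy as np


class UnannotatedDataLoader:
    """POT-style data loader for a dataset with no annotations.

    Each item is (data, annotation); the annotation slot stays None,
    which is all DefaultQuantization needs.
    """

    def __init__(self, images):
        self.images = images  # e.g. a list of preprocessed numpy arrays

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], None  # no annotation available


def default_quantization(model_xml, model_bin, data_loader, save_dir):
    # Imported lazily so the sketch above runs without OpenVINO installed.
    from openvino.tools.pot import (IEEngine, create_pipeline, load_model,
                                    save_model)

    model = load_model({"model_name": "model",
                        "model": model_xml, "weights": model_bin})
    engine = IEEngine(config={"device": "CPU"}, data_loader=data_loader)
    algorithms = [{
        "name": "DefaultQuantization",        # no accuracy control
        "params": {"target_device": "ANY",
                   "preset": "performance",
                   "stat_subset_size": 300},  # calibration sample count
    }]
    quantized = create_pipeline(algorithms, engine).run(model)
    return save_model(quantized, save_dir)
```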
Here's something that I tried using the Default Quantization method; a quantized model is generated:
If you are trying to create a custom annotation for your dataset, that step would be irrelevant to OpenVINO, as it relates more to managing your custom datasets (it should be done before using OpenVINO). This tutorial might help you.
@nikita-savelyevv, can you please take a look?
Hi @glucasol, OpenVINO POT will be deprecated in the next OpenVINO release https://github.com/openvinotoolkit/openvino/pull/16758 and the new Python post-training quantization API from NNCF will be the main way to quantize models.
NNCF has an example that demonstrates how to quantize the Student-Teacher Feature Pyramid Matching (STFPM) OpenVINO model from Anomalib. I hope this example helps you get started quickly.
More details about post-training quantization with NNCF can be found here.
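For comparison with the POT pipeline, the NNCF flow boils down to a single `nncf.quantize` call over a calibration `nncf.Dataset`. The sketch below is a hedged outline: the dataset items and the transform function are hypothetical, and NNCF/OpenVINO are imported lazily inside the function so the rest stays importable without them.

```python
def transform_fn(data_item):
    # Map one dataset item to the model input; items are assumed to be
    # (image, label) pairs here, and only the image is fed to the model.
    image, _ = data_item
    return image


def quantize_with_nncf(model_xml, data_items):
    # Imported lazily so transform_fn is testable without NNCF/OpenVINO.
    import nncf
    import openvino.runtime as ov

    model = ov.Core().read_model(model_xml)
    calibration_dataset = nncf.Dataset(data_items, transform_fn)
    return nncf.quantize(model, calibration_dataset)
```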
Hi @alexsu52, thanks for your response! I have successfully tested the example and it worked, but it seems that my INT8 model is slower than the FP32 model. Below are the benchmarks for both models:
[1/7] Save FP32 model: /home/gabrieloliveira/Documents/nncf/examples/post_training_quantization/openvino/quantize_with_accuracy_control/model_fp32.xml
Model graph (xml): 0.086 Mb
Model weights (bin): 168.425 Mb
Model size: 168.510 Mb
[2/7] Save INT8 model: /home/gabrieloliveira/Documents/nncf/examples/post_training_quantization/openvino/quantize_with_accuracy_control/model_int8.xml
Model graph (xml): 0.154 Mb
Model weights (bin): 159.309 Mb
Model size: 159.463 Mb
[3/7] Benchmark FP32 model:
[ INFO ] Count: 5337 iterations
[ INFO ] Duration: 15025.83 ms
[ INFO ] Latency:
[ INFO ] Median: 24.13 ms
[ INFO ] Average: 25.20 ms
[ INFO ] Min: 21.89 ms
[ INFO ] Max: 174.93 ms
[ INFO ] Throughput: 355.19 FPS
[4/7] Benchmark INT8 model:
[ INFO ] Count: 4104 iterations
[ INFO ] Duration: 15035.41 ms
[ INFO ] Latency:
[ INFO ] Median: 31.73 ms
[ INFO ] Average: 32.88 ms
[ INFO ] Min: 29.65 ms
[ INFO ] Max: 176.67 ms
[ INFO ] Throughput: 272.96 FPS
[5/7] Validate OpenVINO FP32 model:
Validate: dataset length = 100, metric value = 1.000
Accuracy @ top1: 1.000
[6/7] Validate OpenVINO INT8 model:
Validate: dataset length = 100, metric value = 1.000
Accuracy @ top1: 1.000
[7/7] Report:
Maximum accuracy drop: 0.005
Accuracy drop: 0.000
Model compression rate: 1.057
Performance speed up (throughput mode): 0.768
Any reason for that?
Thanks!
Hi @glucasol,
Taking into account the model compression rate of 1.057 from your log, I can assume that only some layers of the model have been quantized. Could you share the original model or the name of the Anomalib model you used? Also, could you share the full log of your script run?
Thanks!
Hi @alexsu52 ,
I tested another model and different accuracy drops to see if the results changed. Unfortunately I cannot share the model because it is too large, but you can replicate it by following the steps in Anomalib's repo. Just install the dependencies and, before running, check the config.yaml file to ensure that export_mode is set to "openvino", like this:
optimization:
export_mode: "openvino" # options: openvino, onnx
After that, just run the command:
python tools/train.py
And the openvino IR model will be generated.
For the tests I have done with the same model, I changed the maximum accuracy drop from 0.0005 to 0.0015 and then to 0.005, but every run returns a model compression rate of 3.985 and an almost identical performance speed-up (throughput mode). As I increase the allowed accuracy drop, the model compression rate and performance speed-up were supposed to increase, right? But this doesn't happen, as you can see in the log files below.
log_mvtec0005.txt log_mvtec0015.txt log_mvtec_005.txt
Thanks for your time!
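For context on the knob being tuned above: in `nncf.quantize_with_accuracy_control`, `max_drop` is an upper bound, and layers are reverted to FP32 only while the measured drop exceeds it. So once the fully quantized model already meets the bound, raising `max_drop` further may change nothing, which would match identical compression rates across runs. A hedged sketch of the call (dataset wiring and `validate_fn` are placeholders, with lazy imports):

```python
def run_accuracy_aware_quantization(model_xml, calib_items, val_items,
                                    validate_fn, max_drop=0.005):
    """Sketch of NNCF quantization with accuracy control.

    max_drop is the maximum tolerated accuracy degradation; layers are
    only kept in FP32 when needed to stay within that bound.
    """
    import nncf
    import openvino.runtime as ov

    model = ov.Core().read_model(model_xml)
    calibration = nncf.Dataset(calib_items)
    validation = nncf.Dataset(val_items)
    return nncf.quantize_with_accuracy_control(
        model, calibration, validation,
        validation_fn=validate_fn, max_drop=max_drop)
```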
Hi @glucasol,
I could not reproduce the issue with the padim model from the Anomalib repo using NNCF 2.4.0. However, NNCF from the develop branch has issues, and I have prepared a fix that you can try: https://github.com/openvinotoolkit/nncf/pull/1902. But your issue probably has another root cause. Could you run benchmark_app on the quantized model to collect an execution report? Also, please share your hardware configuration and OpenVINO version.
The command to collect the execution report is the following:
benchmark_app -m <path to the quantized model> -report_type average_counters
benchmark_app will generate benchmark_average_counters_report.csv. Please share it with me.
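To see which layers actually ran in INT8, the average-counters report can be grouped by its execType column (kernel names containing "I8" indicate 8-bit execution). The helper below is a sketch; the semicolon delimiter and column name match recent benchmark_app reports, but treat the exact layout as an assumption for your version.

```python
import csv
from collections import Counter


def exec_type_histogram(report_path):
    """Count layers per execType in a benchmark_app average-counters report.

    Assumes a semicolon-separated file with an 'execType' column holding
    the executed kernel name (e.g. 'jit_avx512_I8' for 8-bit layers).
    """
    hist = Counter()
    with open(report_path, newline="") as f:
        for row in csv.DictReader(f, delimiter=";"):
            exec_type = row.get("execType")
            if exec_type:
                hist[exec_type] += 1
    return hist
```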
I highly recommend using the latest OpenVINO version from PyPI (2023.0.0).
Hi @alexsu52, this is my configuration:
I have cloned the branch you suggested, updated OpenVINO to the latest version, and tested again. Below are the benchmark .csv files.
benchmark_average_counters_report.csv benchmark_report.csv
Thanks!
Hi @glucasol,
Thank you for the benchmark report. It looks like the model was quantized correctly. The runtime team needs to take a look.
@dmitry-gorokhov, the quantized padim model from the Anomalib repo shows a 1.92x speed-up on Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz, but does not show a speed-up on Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz. Please take a look.
Hi @glucasol, Intel(R) Xeon(R) Gold 6150 doesn't support Intel(R) DL Boost (aka VNNI). This technology was first introduced in the second generation of Xeon Scalable processors (code name Cascade Lake). Older generations have very limited HW capabilities for efficient 8-bit inference. Intel(R) Core(TM) i9-10980XE has DL Boost support, which is why the numbers provided by @alexsu52 show a 2x performance boost for the quantized model.
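On Linux, DL Boost support can be checked by looking for the avx512_vnni flag in /proc/cpuinfo, e.g. with a small helper like the sketch below (the flag name is the one the Linux kernel exposes for AVX512-VNNI; other OSes need a different check):

```python
def supports_vnni(cpuinfo_text):
    """Return True if a Linux /proc/cpuinfo dump lists AVX512-VNNI (DL Boost)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The flags line is a space-separated list of CPU feature names.
            return "avx512_vnni" in line.split()
    return False
```

Usage on a Linux machine would be `supports_vnni(open("/proc/cpuinfo").read())`.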
Closing this; I hope the previous responses were sufficient to help you proceed. Feel free to reopen to ask any questions related to this topic.
Hi everyone,
I am having some problems quantizing my OpenVINO model with POT.
I have an anomaly detection model trained using this Anomalib repo. I converted the trained model to OpenVINO format (.bin and .xml) with Model Optimizer (MO).
The structure of the dataset used to train/test the model is:
I want to run this POT command (as mentioned here: POT):
pot -c <path_to_config_file>
I have created the .json configuration file like this example
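For reference, a POT configuration of the kind linked above typically has model, engine, and compression sections roughly like this (paths are placeholders and the exact fields may differ across POT versions):

```json
{
  "model": {
    "model_name": "anomalib_model",
    "model": "<path_to_xml>",
    "weights": "<path_to_bin>"
  },
  "engine": {
    "config": "<path_to_accuracy_checker_config.yaml>"
  },
  "compression": {
    "target_device": "CPU",
    "algorithms": [
      {
        "name": "DefaultQuantization",
        "params": {
          "preset": "performance",
          "stat_subset_size": 300
        }
      }
    ]
  }
}
```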
But I need to have a config .yaml file, which is the Accuracy Checker configuration file.
So, following this Accuracy Checker guide, I need to create a config.yaml file similar to the one below:
After following all these steps, I came up with some doubts. First, in the launchers section, which adapter should I use, since my model is an anomaly detection model? Second, in the datasets section, what do I need to specify in the annotation subsection, since I do not have an annotation file? Anomalib does not need an annotation file, since the train dataset only has normal images and the test dataset is separated into normal and anomalous folders.
Can you help me with these doubts?
Thanks!