The results shown in https://intel.github.io/neural-compressor/latest/docs/source/validated_model_list.html were measured with INC v2.0; we will soon update them with data measured with v2.1.
Did you get these results with INC 2.1 + ONNXRT 1.13.1?
Please also let me know the torch and torchvision versions used to export the ONNX model, so I can try to reproduce your results.
In our test with INC 2.1 + ONNXRT 1.13.1, it shows:
2023-04-09 18:52:30 [INFO] |********Mixed Precision Statistics*******|
2023-04-09 18:52:30 [INFO] +-------------------+-------+------+------+
2023-04-09 18:52:30 [INFO] | Op Type | Total | INT8 | FP32 |
2023-04-09 18:52:30 [INFO] +-------------------+-------+------+------+
2023-04-09 18:52:30 [INFO] | Conv | 52 | 52 | 0 |
2023-04-09 18:52:30 [INFO] | Gather | 1 | 0 | 1 |
2023-04-09 18:52:30 [INFO] | MatMul | 1 | 1 | 0 |
2023-04-09 18:52:30 [INFO] | GlobalAveragePool | 1 | 1 | 0 |
2023-04-09 18:52:30 [INFO] | Add | 11 | 11 | 0 |
2023-04-09 18:52:30 [INFO] | Reshape | 1 | 1 | 0 |
2023-04-09 18:52:30 [INFO] | Concat | 1 | 0 | 1 |
2023-04-09 18:52:30 [INFO] | Unsqueeze | 1 | 0 | 1 |
2023-04-09 18:52:30 [INFO] | QuantizeLinear | 1 | 1 | 0 |
2023-04-09 18:52:30 [INFO] | DequantizeLinear | 2 | 2 | 0 |
2023-04-09 18:52:30 [INFO] +-------------------+-------+------+------+
2023-04-09 18:52:30 [INFO] Pass quantize model elapsed time: 11602.36 ms
2023-04-09 18:59:50 [DEBUG] Best acc is 0.65492.
2023-04-09 18:59:50 [DEBUG] *** Update the best qmodel with the result (0.65492, [440.14830350875854])
2023-04-09 18:59:50 [DEBUG] *** Accuracy not meets the requirements, do not update the best qmodel.
2023-04-09 18:59:50 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.6549|0.6689, Duration (seconds) (int8|fp32): 440.1483|584.6360], Best tune result is: [Accuracy: 0.6549, Duration (seconds): 440.1483]
2023-04-09 18:59:50 [INFO] |***********************Tune Result Statistics**********************|
2023-04-09 18:59:50 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-09 18:59:50 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2023-04-09 18:59:50 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-09 18:59:50 [INFO] | Accuracy | 0.6689 | 0.6549 | 0.6549 |
2023-04-09 18:59:50 [INFO] | Duration (seconds) | 584.6360 | 440.1483 | 440.1483 |
2023-04-09 18:59:50 [INFO] +--------------------+-----------+---------------+------------------+
Hi @chensuyue, thanks for your response! Here is my env:
>>> import neural_compressor as inc
>>> inc.__version__
'2.1'
>>> import onnxruntime
>>> onnxruntime.__version__
'1.14.1'
>>> import torch
>>> torch.__version__
'2.0.0+cu117'
>>> import torchvision
>>> torchvision.__version__
'0.15.1+cu117'
>>>
Hi @chensuyue, I noticed that some ops (Gather/Concat/Unsqueeze) are not in my ONNX model. Are you sure that you're running the ONNX MobileNetV2 model?
I convert the PyTorch model into ONNX mobilenet_v2 with the following code:
import torch
import torchvision

batch_size = 1
model = torchvision.models.mobilenet_v2(pretrained=True)
x = torch.randn(batch_size, 3, 224, 224)

# Export the model
torch.onnx.export(model,                     # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "mobilenet_v2.onnx",       # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=11,          # the ONNX version to export the model to, please ensure at least 11
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},      # variable-length axes
                                'output': {0: 'batch_size'}})
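As a quick sanity check (a minimal sketch, not part of the example script), the op types in the exported graph can be listed with the onnx package and compared against the op table in the quantization log:

import onnx
from collections import Counter

# Count the node types in the exported graph; Gather/Concat/Unsqueeze
# would show up here if they were present in the model.
m = onnx.load("mobilenet_v2.onnx")
print(Counter(node.op_type for node in m.graph.node))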
Hi @chensuyue, I have tested the same case with INC 2.1 + onnxruntime 1.13.1 + torch 1.13.0, and I still get different results from yours.
In addition, to rule out problems caused by the model conversion, I have tested another case in examples/onnxrt/image_recognition/onnx_model_zoo/mobilenet/quantization/ptq_static, which uses an ONNX model provided by another repo, but I was also unable to obtain the correct results.
Hi @SunCrazy, I have reproduced your results and found the root cause.
I think you must have exported the model with opset_version=11 and run quantization with --quant_format=QDQ. With opset 11 your model doesn't support per-channel quantization; you can find the warning in your log: 2023-04-20 19:30:03 [WARNING] Per-channel support with QDQ format requires opset version >= 13. The log I sent you was also from a model exported with opset_version=11, but it used the default quant_format.
Two solutions: 1. export the model with opset_version=13; 2. quantize with the default quant_format.
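For reference, a minimal sketch of how the quant_format choice maps onto the INC 2.x Python API (the function name and the calibration/eval arguments are illustrative placeholders, not the example's actual code):

from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

def quantize_onnx(fp32_path, int8_path, calib_dataloader, eval_func, use_qdq=False):
    # quant_format="QDQ" requires opset >= 13 for per-channel support
    # (see the warning quoted above); "default" emits QOperator nodes.
    config = PostTrainingQuantConfig(approach="static",
                                     quant_format="QDQ" if use_qdq else "default")
    q_model = quantization.fit(fp32_path, config,
                               calib_dataloader=calib_dataloader,
                               eval_func=eval_func)
    q_model.save(int8_path)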
It should be caused by a torch or torchvision version difference: my test machine converted and stored the fp32 model a long time ago, so it likely used a quite old version. I also tried with the new version and got the same accuracy, although the op list is a little different.
What we will do is update the README (https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/README.md#2-prepare-model) to set opset_version=13 by default, so that either quant_format will work as expected.
Hi @chensuyue, even if I convert the ONNX model with opset 13, I still don't get the correct results.
My env is as follows:
Python packages:
Package Version
---------------------------- ----------
absl-py 1.4.0
alembic 1.7.7
astunparse 1.6.3
bidict 0.22.1
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.0.1
click 8.1.3
cmake 3.26.3
coloredlogs 15.0.1
contextlib2 21.6.0
contourpy 1.0.7
cryptography 39.0.1
cycler 0.11.0
Deprecated 1.2.13
filelock 3.12.0
Flask 2.2.3
Flask-Cors 3.0.10
Flask-SocketIO 5.3.2
flatbuffers 23.1.21
fonttools 4.38.0
gast 0.4.0
gevent 22.10.2
gevent-websocket 0.10.1
google-auth 2.16.1
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
greenlet 2.0.2
grpcio 1.51.3
h5py 3.8.0
humanfriendly 10.0
idna 3.4
importlib-metadata 6.0.0
importlib-resources 5.12.0
itsdangerous 2.1.2
Jinja2 3.1.2
joblib 1.2.0
keras 2.11.0
kiwisolver 1.4.4
libclang 15.0.6.1
lit 16.0.1
Mako 1.2.4
Markdown 3.4.1
MarkupSafe 2.1.2
matplotlib 3.7.0
mpmath 1.3.0
networkx 3.1
neural-compressor 2.1
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
onnx 1.13.1
onnxruntime 1.13.1
onnxruntime-extensions 0.7.0
opencv-python 4.7.0.72
opt-einsum 3.3.0
packaging 23.0
pandas 1.5.3
Pillow 9.4.0
pip 23.0.1
prettytable 3.6.0
protobuf 3.20.3
psutil 5.9.4
py-cpuinfo 9.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocotools 2.0.6
pycparser 2.21
pyparsing 3.0.9
python-dateutil 2.8.2
python-engineio 4.3.4
python-socketio 5.7.2
pytz 2022.7.1
PyYAML 6.0
requests 2.28.2
requests-oauthlib 1.3.1
rsa 4.9
schema 0.7.5
scikit-learn 1.2.1
scipy 1.10.1
setuptools 67.1.0
six 1.16.0
SQLAlchemy 1.4.27
sympy 1.11.1
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.0
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.2.0
threadpoolctl 3.1.0
torch 1.13.0
torchvision 0.14.0
triton 2.0.0
typing_extensions 4.5.0
urllib3 1.26.14
wcwidth 0.2.6
Werkzeug 2.2.3
wheel 0.38.4
wrapt 1.15.0
zipp 3.15.0
zope.event 4.6
zope.interface 5.5.2
Steps to quantize and tune the model:
import torch
import torchvision

batch_size = 1
model = torchvision.models.mobilenet_v2(pretrained=True)
x = torch.randn(batch_size, 3, 224, 224)

# Export the model
torch.onnx.export(model,                     # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "mobilenet_v2_tv0.14_op13.onnx",  # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=13,          # the ONNX version to export the model to, please ensure at least 11
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},      # variable-length axes
                                'output': {0: 'batch_size'}})
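One quick way to confirm the re-export really produced an opset-13 graph (a small sketch, assuming the onnx package from the env above):

import onnx

# The default (ai.onnx) domain entry should report version 13.
m = onnx.load("mobilenet_v2_tv0.14_op13.onnx")
print([(imp.domain or "ai.onnx", imp.version) for imp in m.opset_import])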
Some logs:
/mnt/ssd/chenf/software/pyenv/neural-compressor/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/mnt/ssd/chenf/software/pyenv/neural-compressor/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=MobileNet_V2_Weights.IMAGENET1K_V1`. You can also use `weights=MobileNet_V2_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
bash run_tuning.sh --input_model=mobilenet_v2_tv0.14_op13.onnx --dataset_location=/mnt/ssd/share/ILSVRC2012_img_val/ --output_model=mobilenet_v2_tv0.14_op13_qdq.onnx --label_path=val.txt --quant_format=QDQ
Run log:
+ main --input_model=mobilenet_v2_tv0.14_op13.onnx --dataset_location=/mnt/ssd/share/ILSVRC2012_img_val/ --output_model=mobilenet_v2_tv0.14_op13_qdq.onnx --label_path=val.txt --quant_format=QDQ
+ init_params --input_model=mobilenet_v2_tv0.14_op13.onnx --dataset_location=/mnt/ssd/share/ILSVRC2012_img_val/ --output_model=mobilenet_v2_tv0.14_op13_qdq.onnx --label_path=val.txt --quant_format=QDQ
+ for var in "$@"
+ case $var in
++ echo --input_model=mobilenet_v2_tv0.14_op13.onnx
++ cut -f2 -d=
+ input_model=mobilenet_v2_tv0.14_op13.onnx
+ for var in "$@"
+ case $var in
++ echo --dataset_location=/mnt/ssd/share/ILSVRC2012_img_val/
++ cut -f2 -d=
+ dataset_location=/mnt/ssd/share/ILSVRC2012_img_val/
+ for var in "$@"
+ case $var in
++ echo --output_model=mobilenet_v2_tv0.14_op13_qdq.onnx
++ cut -f2 -d=
+ output_model=mobilenet_v2_tv0.14_op13_qdq.onnx
+ for var in "$@"
+ case $var in
++ echo --label_path=val.txt
++ cut -f2 -d=
+ label_path=val.txt
+ for var in "$@"
+ case $var in
++ echo --quant_format=QDQ
++ cut -f2 -d=
+ quant_format=QDQ
+ run_tuning
+ python main.py --model_path mobilenet_v2_tv0.14_op13.onnx --dataset_location /mnt/ssd/share/ILSVRC2012_img_val/ --label_path val.txt --output_model mobilenet_v2_tv0.14_op13_qdq.onnx --quant_format QDQ --tune
2023-04-25 10:58:22.764579: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-25 10:58:23.793518: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/ssd/chenf/software/pyenv/neural-compressor/lib/python3.8/site-packages/cv2/../../lib64:/mnt/ssd/chenf/software/cuda11.7/lib64:
2023-04-25 10:58:23.793649: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/ssd/chenf/software/pyenv/neural-compressor/lib/python3.8/site-packages/cv2/../../lib64:/mnt/ssd/chenf/software/cuda11.7/lib64:
2023-04-25 10:58:23.793666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-04-25 10:58:24 [WARNING] Force convert framework model to neural_compressor model.
2023-04-25 10:58:25 [INFO] Start auto tuning.
2023-04-25 10:58:25 [WARNING] The model is automatically detected as a non-NLP model. You can use 'domain' argument in 'PostTrainingQuantConfig' to overwrite it
2023-04-25 10:58:25 [WARNING] Graph optimization level is automatically set to ENABLE_BASIC. You can use 'recipe' argument in 'PostTrainingQuantConfig'to overwrite it
2023-04-25 10:58:25 [INFO] Adaptor has 4 recipes.
2023-04-25 10:58:25 [INFO] 0 recipes specified by user.
2023-04-25 10:58:25 [INFO] 3 recipes require future tuning.
2023-04-25 10:58:25 [INFO] *** Initialize auto tuning
2023-04-25 10:58:25 [INFO] Get FP32 model baseline.
2023-04-25 11:06:25 [INFO] Save tuning history to /mnt/ssd/chenf/opensource/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-25_10-58-21/./history.snapshot.
2023-04-25 11:06:25 [INFO] FP32 baseline is: [Accuracy: 0.6689, Duration (seconds): 480.2326]
2023-04-25 11:06:25 [INFO] Quantize the model with default config.
2023-04-25 11:06:32 [INFO] |********Mixed Precision Statistics*******|
2023-04-25 11:06:32 [INFO] +-------------------+-------+------+------+
2023-04-25 11:06:32 [INFO] | Op Type | Total | INT8 | FP32 |
2023-04-25 11:06:32 [INFO] +-------------------+-------+------+------+
2023-04-25 11:06:32 [INFO] | Conv | 52 | 52 | 0 |
2023-04-25 11:06:32 [INFO] | MatMul | 1 | 1 | 0 |
2023-04-25 11:06:32 [INFO] | GlobalAveragePool | 1 | 0 | 1 |
2023-04-25 11:06:32 [INFO] | QuantizeLinear | 66 | 66 | 0 |
2023-04-25 11:06:32 [INFO] | DequantizeLinear | 171 | 171 | 0 |
2023-04-25 11:06:32 [INFO] +-------------------+-------+------+------+
2023-04-25 11:06:32 [INFO] Pass quantize model elapsed time: 7174.15 ms
2023-04-25 11:14:32 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.5866|0.6689, Duration (seconds) (int8|fp32): 479.3143|480.2326], Best tune result is: n/a
2023-04-25 11:14:32 [INFO] |***********************Tune Result Statistics**********************|
2023-04-25 11:14:32 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:14:32 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2023-04-25 11:14:32 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:14:32 [INFO] | Accuracy | 0.6689 | 0.5866 | n/a |
2023-04-25 11:14:32 [INFO] | Duration (seconds) | 480.2326 | 479.3143 | n/a |
2023-04-25 11:14:32 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:14:32 [INFO] Save tuning history to /mnt/ssd/chenf/opensource/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-25_10-58-21/./history.snapshot.
2023-04-25 11:14:32 [INFO] *** Start conservative tuning.
2023-04-25 11:14:32 [WARNING] The model is automatically detected as a non-NLP model. You can use 'domain' argument in 'PostTrainingQuantConfig' to overwrite it
2023-04-25 11:14:32 [WARNING] Graph optimization level is automatically set to ENABLE_BASIC. You can use 'recipe' argument in 'PostTrainingQuantConfig'to overwrite it
2023-04-25 11:14:32 [INFO] Adaptor has 4 recipes.
2023-04-25 11:14:32 [INFO] 0 recipes specified by user.
2023-04-25 11:14:32 [INFO] 3 recipes require future tuning.
2023-04-25 11:14:32 [INFO] FP32 baseline is: [Accuracy: 0.6689, Duration (seconds): 480.2326]
2023-04-25 11:14:32 [INFO] *** Try to convert op into lower precision to improve performance.
2023-04-25 11:14:32 [INFO] *** Start to convert op into int8.
2023-04-25 11:14:32 [INFO] *** Try to convert all conv ops into int8.
2023-04-25 11:14:39 [INFO] |********Mixed Precision Statistics*******|
2023-04-25 11:14:39 [INFO] +-------------------+-------+------+------+
2023-04-25 11:14:39 [INFO] | Op Type | Total | INT8 | FP32 |
2023-04-25 11:14:39 [INFO] +-------------------+-------+------+------+
2023-04-25 11:14:39 [INFO] | Conv | 52 | 52 | 0 |
2023-04-25 11:14:39 [INFO] | MatMul | 1 | 0 | 1 |
2023-04-25 11:14:39 [INFO] | Clip | 35 | 0 | 35 |
2023-04-25 11:14:39 [INFO] | GlobalAveragePool | 1 | 0 | 1 |
2023-04-25 11:14:39 [INFO] | QuantizeLinear | 97 | 97 | 0 |
2023-04-25 11:14:39 [INFO] | DequantizeLinear | 201 | 201 | 0 |
2023-04-25 11:14:39 [INFO] +-------------------+-------+------+------+
2023-04-25 11:14:39 [INFO] Pass quantize model elapsed time: 6998.1 ms
2023-04-25 11:22:46 [INFO] Tune 2 result is: [Accuracy (int8|fp32): 0.5869|0.6689, Duration (seconds) (int8|fp32): 486.7922|480.2326], Best tune result is: n/a
2023-04-25 11:22:46 [INFO] |***********************Tune Result Statistics**********************|
2023-04-25 11:22:46 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:22:46 [INFO] | Info Type | Baseline | Tune 2 result | Best tune result |
2023-04-25 11:22:46 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:22:46 [INFO] | Accuracy | 0.6689 | 0.5869 | n/a |
2023-04-25 11:22:46 [INFO] | Duration (seconds) | 480.2326 | 486.7922 | n/a |
2023-04-25 11:22:46 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:22:46 [INFO] Save tuning history to /mnt/ssd/chenf/opensource/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-25_10-58-21/./history.snapshot.
2023-04-25 11:22:46 [INFO] *** Convert all conv ops to int8 but accuracy not meet the requirements
2023-04-25 11:22:46 [INFO] ***Current result dict_items([('conv', 'fp32'), ('matmul', None), ('linear', None)])
2023-04-25 11:22:46 [INFO] *** Try to convert all matmul ops into int8.
2023-04-25 11:22:47 [INFO] |********Mixed Precision Statistics*******|
2023-04-25 11:22:47 [INFO] +-------------------+-------+------+------+
2023-04-25 11:22:47 [INFO] | Op Type | Total | INT8 | FP32 |
2023-04-25 11:22:47 [INFO] +-------------------+-------+------+------+
2023-04-25 11:22:47 [INFO] | Conv | 52 | 0 | 52 |
2023-04-25 11:22:47 [INFO] | MatMul | 1 | 1 | 0 |
2023-04-25 11:22:47 [INFO] | Clip | 35 | 0 | 35 |
2023-04-25 11:22:47 [INFO] | GlobalAveragePool | 1 | 0 | 1 |
2023-04-25 11:22:47 [INFO] | QuantizeLinear | 2 | 2 | 0 |
2023-04-25 11:22:47 [INFO] | DequantizeLinear | 3 | 3 | 0 |
2023-04-25 11:22:47 [INFO] +-------------------+-------+------+------+
2023-04-25 11:22:47 [INFO] Pass quantize model elapsed time: 1484.04 ms
2023-04-25 11:30:54 [INFO] Tune 3 result is: [Accuracy (int8|fp32): 0.6685|0.6689, Duration (seconds) (int8|fp32): 486.5853|480.2326], Best tune result is: [Accuracy: 0.6685, Duration (seconds): 486.5853]
2023-04-25 11:30:54 [INFO] |***********************Tune Result Statistics**********************|
2023-04-25 11:30:54 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:30:54 [INFO] | Info Type | Baseline | Tune 3 result | Best tune result |
2023-04-25 11:30:54 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:30:54 [INFO] | Accuracy | 0.6689 | 0.6685 | 0.6685 |
2023-04-25 11:30:54 [INFO] | Duration (seconds) | 480.2326 | 486.5853 | 486.5853 |
2023-04-25 11:30:54 [INFO] +--------------------+-----------+---------------+------------------+
2023-04-25 11:30:54 [INFO] Save tuning history to /mnt/ssd/chenf/opensource/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-25_10-58-21/./history.snapshot.
2023-04-25 11:30:54 [INFO] *** Do not stop the tuning process, re-quantize the ops.
2023-04-25 11:30:54 [INFO] *** Convert all matmul ops to int8 and accuracy still meet the requirements
2023-04-25 11:30:54 [INFO] ***Current result dict_items([('conv', 'fp32'), ('matmul', 'int8'), ('linear', None)])
2023-04-25 11:30:54 [INFO] *** Ending tuning process due to no quantifiable op left.
2023-04-25 11:30:54 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2023-04-25 11:30:54 [INFO] Save deploy yaml to /mnt/ssd/chenf/opensource/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-25_10-58-21/deploy.yaml
The final quantization config and model accuracy are not consistent with yours.
Hi @SunCrazy, I couldn't reproduce your results with exactly the same config. Could you share the fp32 model and the quantized model with me? I want to verify your model in my env.
My local env:
(onnxrt-1.13.1-3.8-clx070-8280) [tensorflow@mlt-clx070 ptq_static]$ pip list | grep torch
torch 1.13.0
torchvision 0.14.0
(onnxrt-1.13.1-3.8-clx070-8280) [tensorflow@mlt-clx070 ptq_static]$ pip list | grep onnx
onnx 1.13.1
onnxruntime 1.13.1
onnxruntime-extensions 0.7.0
(onnxrt-1.13.1-3.8-clx070-8280) [tensorflow@mlt-clx070 ptq_static]$ pip list | grep neural
neural-compressor 2.1
Quantize cmd:
bash run_tuning.sh --dataset_location=/tf_dataset2/datasets/imagenet/ImagenetRaw/ILSVRC2012_img_val --input_model=mobilenet_v2_13.onnx --output_model=onnxrt-mobilenet_v2_13-tune.onnx --quant_format=QDQ
Result:
2023-04-26 10:59:27 [INFO] *** Initialize auto tuning
2023-04-26 10:59:27 [INFO] Get FP32 model baseline.
2023-04-26 11:28:59 [INFO] Save tuning history to /home/tensorflow/suyue/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-26_10-59-21/./history.snapshot.
2023-04-26 11:28:59 [INFO] FP32 baseline is: [Accuracy: 0.6689, Duration (seconds): 1771.0930]
2023-04-26 11:28:59 [INFO] Quantize the model with default config.
2023-04-26 11:29:05 [INFO] |********Mixed Precision Statistics*******|
2023-04-26 11:29:05 [INFO] +-------------------+-------+------+------+
2023-04-26 11:29:05 [INFO] | Op Type | Total | INT8 | FP32 |
2023-04-26 11:29:05 [INFO] +-------------------+-------+------+------+
2023-04-26 11:29:05 [INFO] | Conv | 52 | 52 | 0 |
2023-04-26 11:29:05 [INFO] | MatMul | 1 | 1 | 0 |
2023-04-26 11:29:05 [INFO] | GlobalAveragePool | 1 | 0 | 1 |
2023-04-26 11:29:05 [INFO] | QuantizeLinear | 66 | 66 | 0 |
2023-04-26 11:29:05 [INFO] | DequantizeLinear | 171 | 171 | 0 |
2023-04-26 11:29:05 [INFO] +-------------------+-------+------+------+
2023-04-26 11:29:05 [INFO] Pass quantize model elapsed time: 6908.29 ms
2023-04-26 11:39:53 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.6549|0.6689, Duration (seconds) (int8|fp32): 647.3306|1771.0930], Best tune result is: [Accuracy: 0.6549, Duration (seconds): 647.3306]
2023-04-26 11:39:53 [INFO] |***********************Tune Result Statistics***********************|
2023-04-26 11:39:53 [INFO] +--------------------+------------+---------------+------------------+
2023-04-26 11:39:53 [INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
2023-04-26 11:39:53 [INFO] +--------------------+------------+---------------+------------------+
2023-04-26 11:39:53 [INFO] | Accuracy | 0.6689 | 0.6549 | 0.6549 |
2023-04-26 11:39:53 [INFO] | Duration (seconds) | 1771.0930 | 647.3306 | 647.3306 |
2023-04-26 11:39:53 [INFO] +--------------------+------------+---------------+------------------+
2023-04-26 11:39:53 [INFO] Save tuning history to /home/tensorflow/suyue/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-26_10-59-21/./history.snapshot.
2023-04-26 11:39:53 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2023-04-26 11:39:53 [INFO] Save deploy yaml to /home/tensorflow/suyue/neural-compressor/examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static/nc_workspace/2023-04-26_10-59-21/deploy.yaml
Another question: how many images are in this package, /mnt/ssd/share/ILSVRC2012_img_val? Is that the standard val dataset with 50000 images?
Yes, it is the standard val dataset.
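For completeness, a quick count of the directory (a sketch assuming a flat folder of JPEG files, using the path from the logs above):

from pathlib import Path

# The standard ILSVRC2012 validation split contains 50000 images.
val_dir = Path("/mnt/ssd/share/ILSVRC2012_img_val")
print(sum(1 for p in val_dir.iterdir() if p.suffix.lower() in {".jpeg", ".jpg"}))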
It's so strange! Even if I run the case in Docker (built with the Dockerfile provided by the repo), I still get the wrong results. I'm not sure where the problem is anymore.
In addition, I cannot give you the fp32 ONNX model directly right now.
Converted ONNX model MD5: be295389a0fd682f60c8d2a9554010e7
If we use the same torchvision, the exported models should be identical.
I will give you the fp32 ONNX model later.
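To compare models across machines, the checksum can be recomputed like this (a minimal sketch; chunked reading keeps memory use flat for large model files):

import hashlib

def md5sum(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

print(md5sum("mobilenet_v2.onnx"))  # compare against be295389a0fd682f60c8d2a9554010e7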
Sorry for the late reply. Are you still working on this model?
Sorry, I gave up after running the case in Docker (built with the Dockerfile provided by the repo). Maybe I will try again some other time.
Thanks
@chensuyue I have tried the case in examples/onnxrt/image_recognition/mobilenet_v2/quantization/ptq_static, but cannot reproduce the results shown in https://intel.github.io/neural-compressor/latest/docs/source/validated_model_list.html. Although the final accuracy is 0.6684, only the MatMul is quantized to int8 while the other ops stay in float32, which is not what we actually need.