@kchechil
Here are some benchmarks if you convert to ONNX, then to OpenVINO IR, and run inference on Intel hardware with OpenVINO: https://docs.openvinotoolkit.org/latest/openvino_docs_performance_benchmarks.html. For hardware that supports INT8 natively (e.g. the Cascade Lake or Ice Lake families), the performance boost is up to 4x.
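For context, once you have converted the ONNX model to OpenVINO IR (e.g. with Model Optimizer: `mo.py --input_model model.onnx`), running it looks roughly like the sketch below. The file names, the input shape, and the pre-2022 `IECore` Python API are assumptions here, so adjust them to your own model and OpenVINO version:

```python
# Minimal sketch: run an OpenVINO IR model (model.xml/model.bin are placeholders)
# with the pre-2022 Inference Engine Python API.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")   # IR from Model Optimizer
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))                # on older releases this may be net.inputs
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)   # must match the IR input shape
result = exec_net.infer(inputs={input_name: dummy})    # dict: output name -> ndarray
```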
Thank you very much for your reply. I have another question: can I convert the model to ONNX and then convert the ONNX model to ncnn? Thank you!
Like PyTorch, NNCF only supports exporting models to ONNX. ncnn's README states that it supports ONNX models, so you should be good to go. However, extended quantization functionality such as non-INT8 quantization and mixed-precision quantization is currently only propagated to ONNX via OpenVINO-specific, non-ONNX-standard FakeQuantize nodes, so checkpoints with non-INT8 quantization will probably not be loadable into ncnn.
If you only use INT8 quantization for compression, or no quantization at all (i.e. only the sparsity or filter pruning algorithms), you can set "export_to_onnx_standard_ops": true in the quantization algorithm section of the NNCF config file (as described at the bottom of https://github.com/openvinotoolkit/nncf_pytorch/blob/develop/docs/compression_algorithms/Quantization.md). The resulting ONNX model will then contain standard ONNX QuantizeLinear-DequantizeLinear nodes instead of FakeQuantize nodes to perform quantization. This configuration has a better chance of being loadable into ncnn; see the sketch below.
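For reference, here is a minimal sketch of that setup, assuming a torchvision ResNet-18, a 1x3x224x224 input, and an existing PyTorch DataLoader named train_loader for initializing quantizer ranges. Exact import paths and the register_default_init_args signature differ a bit between NNCF releases, so treat this as a starting point rather than a drop-in recipe:

```python
# Minimal sketch: INT8 quantization with NNCF, exported to ONNX with standard
# QuantizeLinear/DequantizeLinear nodes. Model, dataloader and input shape are
# placeholders for your own setup.
import torchvision.models as models
from nncf import NNCFConfig
# In older nncf_pytorch releases these live directly under `nncf` instead of `nncf.torch`.
from nncf.torch import create_compressed_model, register_default_init_args

model = models.resnet18(pretrained=True)

nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "quantization",
        "export_to_onnx_standard_ops": True
    }
})
# train_loader is your own DataLoader; it is used to initialize the quantizer ranges.
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wrap the model with quantization ops inserted; optionally fine-tune it afterwards.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
compression_ctrl.export_model("resnet18_int8.onnx")
```

The resulting ONNX file can then be passed to ncnn's ONNX conversion tooling or to the OpenVINO Model Optimizer.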
Thank you very much. My goal is to accelerate inference of a PyTorch model with INT8 quantization and then deploy it to mobile devices. I have tried the quantization tutorial on the official PyTorch website, but I can't convert the quantized model to ONNX, so I want to find out whether a PyTorch model can achieve mobile acceleration by going through NNCF to ONNX and then OpenVINO. Thank you!
Hello, how does the quantized model (INT8) compare with the original model (FP32) in terms of inference speed? Thank you!