quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Qualcomm Innovation Center, Inc.

AIMET on GitHub Pages | Documentation | Install instructions | Discussion Forums | What's New

AI Model Efficiency Toolkit (AIMET)

AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models. It provides features that have been proven to improve run-time performance of deep learning neural network models with lower compute and memory requirements and minimal impact to task accuracy.

How AIMET works

AIMET is designed to work with PyTorch, TensorFlow and ONNX models.

We also host the AIMET Model Zoo, a collection of popular neural network models optimized for 8-bit inference, along with recipes for quantizing floating-point models using AIMET.

Table of Contents

Quick Installation

The AIMET PyTorch GPU PyPI packages are available for environments that meet the following requirements:

Installation

apt-get install liblapacke
python3 -m pip install aimet-torch

To install other AIMET variants and versions, please follow one of the links below for instructions:

Why AIMET?

Benefits of AIMET

Please visit AIMET on GitHub Pages for more details.

Supported Features

Quantization

Model Compression

Visualization

What's New

Some recently added features include

Results

AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.
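To make the 32-bit to 8-bit step concrete, here is a minimal sketch of quantization simulation (quantize, then dequantize) in plain Python. It does not use AIMET's API: the function name `quantize_dequantize` and the simple min/max range selection are assumptions made for illustration only.

```python
def quantize_dequantize(values, num_bits=8):
    """Map floats onto a uniform integer grid and back (illustrative, not AIMET code)."""
    # Assumption: the range is taken from the data's min/max and widened to include zero.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** num_bits - 1           # 255 integer steps for 8 bits
    scale = (hi - lo) / levels
    if scale == 0.0:                     # all-zero input: any positive step works
        scale = 1.0
    zero_point = round(-lo / scale)      # integer code that represents 0.0
    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(0, min(levels, q))       # clamp to the representable integer range
        dequantized.append((q - zero_point) * scale)
    return dequantized, scale


weights = [-0.8, -0.1, 0.0, 0.37, 1.2]
simulated, step = quantize_dequantize(weights, num_bits=8)
print(step, simulated)
```

The simulated values differ from the originals by at most about half a quantization step, which is the error an INT8 simulation measures before any of the techniques in this section are applied.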

DFQ

The DFQ method, applied to several popular networks such as MobileNet-v2 and ResNet-50, results in less than 0.9% loss in accuracy all the way down to 8-bit quantization, in an automated way and without any training data.

Models                 FP32      INT8 Simulation
MobileNet v2 (top1)    71.72%    71.08%
ResNet 50 (top1)       76.05%    75.45%
DeepLab v3 (mIOU)      72.65%    71.91%
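Data-free quantization is commonly described as combining cross-layer equalization with bias correction. The sketch below illustrates only the equalization idea on two tiny fully connected layers joined by a ReLU, in plain Python. It is not AIMET's implementation, and the helper names (`equalize`, `linear`, `relu`) are hypothetical. It assumes every channel has a nonzero weight range.

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def linear(weights, bias, xs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def equalize(w1, b1, w2):
    """Divide channel i of layer 1 by s_i and multiply column i of layer 2 by s_i.

    With s_i = sqrt(r1_i / r2_i), both layers end up with the same per-channel
    weight range, and the network output is unchanged because ReLU commutes
    with positive scaling.
    """
    w1_eq, b1_eq = [], []
    w2_eq = [list(row) for row in w2]
    for i, row in enumerate(w1):
        r1 = max(abs(w) for w in row)        # range of output channel i in layer 1
        r2 = max(abs(r[i]) for r in w2)      # range of input channel i in layer 2
        s = math.sqrt(r1 / r2)               # assumes r1 > 0 and r2 > 0
        w1_eq.append([w / s for w in row])
        b1_eq.append(b1[i] / s)
        for r in w2_eq:
            r[i] *= s
    return w1_eq, b1_eq, w2_eq


w1 = [[0.01, -0.02], [4.0, 2.0]]   # channel ranges differ by a factor of 200
b1 = [0.1, -0.5]
w2 = [[5.0, 0.05], [-3.0, 0.02]]
w1_eq, b1_eq, w2_eq = equalize(w1, b1, w2)
print(w1_eq, w2_eq)
```

Because the rescaling needs only the weights, no training data is involved, which is consistent with the "without any training data" claim above.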


AdaRound (Adaptive Rounding)

ADAS Object Detection

For this example ADAS object detection model, which was challenging to quantize to 8-bit precision, AdaRound can recover the accuracy to within 1% of the FP32 accuracy.

Configuration                                 mAP (Mean Average Precision)
FP32                                          82.20%
Nearest Rounding (INT8 weights, INT8 acts)    49.85%
AdaRound (INT8 weights, INT8 acts)            81.21%

DeepLabv3 Semantic Segmentation

For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.

Configuration                                 mIOU (Mean Intersection over Union)
FP32                                          72.94%
Nearest Rounding (INT4 weights, INT8 acts)    6.09%
AdaRound (INT4 weights, INT8 acts)            70.86%
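The gap between nearest rounding and AdaRound in the tables above comes from the fact that rounding each weight to its nearest grid point does not necessarily minimize the error in the layer's output. The toy sketch below shows this with a brute-force search over round-up/round-down choices. It is not the AdaRound algorithm, which learns the rounding through optimization rather than enumeration, and all function names here are hypothetical.

```python
import itertools
import math


def output_error(w_float, w_quant, inputs):
    """Sum of squared differences between float and quantized dot products."""
    total = 0.0
    for xs in inputs:
        y_float = sum(w * x for w, x in zip(w_float, xs))
        y_quant = sum(w * x for w, x in zip(w_quant, xs))
        total += (y_float - y_quant) ** 2
    return total


def nearest_rounding(weights, step):
    """Round every weight to the closest multiple of step."""
    return [round(w / step) * step for w in weights]


def best_rounding(weights, step, inputs):
    """Try every floor/ceil combination and keep the lowest output error."""
    choices = [(math.floor(w / step) * step, math.ceil(w / step) * step) for w in weights]
    return min(
        itertools.product(*choices),
        key=lambda candidate: output_error(weights, candidate, inputs),
    )


weights = [0.1, 0.1, 0.1]
step = 0.25                       # coarse grid, as with very low bit widths
calibration = [[1.0, 1.0, 1.0]]   # stand-in for a few calibration samples

nearest = nearest_rounding(weights, step)
best = best_rounding(weights, step, calibration)
print(output_error(weights, nearest, calibration))
print(output_error(weights, best, calibration))
```

Nearest rounding sends all three weights to zero, while rounding a single weight up keeps the layer output much closer to the floating-point value. Exhaustive search is only feasible for a handful of weights, which is why a learned rounding scheme is needed for real layers.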


Quantization for Recurrent Models

AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with a minimal drop in accuracy.

DeepSpeech2 (using bi-directional LSTMs)    Word Error Rate
FP32                                        9.92%
INT8                                        10.22%


Model Compression

AIMET can also significantly compress models. For popular models such as ResNet-50 and ResNet-18, compression with spatial SVD plus channel pruning achieves a 50% reduction in MACs (multiply-accumulate operations) while retaining accuracy within approximately 1% of the original uncompressed model.

Models              Uncompressed model    50% Compressed model
ResNet 18 (top1)    69.76%                68.56%
ResNet 50 (top1)    76.05%                75.75%
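As a rough illustration of where a MAC reduction can come from, the sketch below counts MACs for a single convolution before and after a spatial factorization into a vertical (h x 1) convolution followed by a horizontal (1 x w) convolution with an intermediate rank. The function names, the stride-1 assumption, and the example layer shape are illustrative assumptions, not AIMET's API. The 50% figure in the table refers to the whole model, not to one layer.

```python
def conv_macs(in_ch, out_ch, kernel_h, kernel_w, out_h, out_w):
    """Multiply-accumulates for one convolution layer."""
    return in_ch * out_ch * kernel_h * kernel_w * out_h * out_w


def spatial_svd_macs(in_ch, out_ch, kernel_h, kernel_w, out_h, out_w, rank):
    """MACs after splitting into (kernel_h x 1) and (1 x kernel_w) convolutions.

    Assumes stride 1 and padding chosen so both layers produce out_h x out_w maps.
    """
    first = conv_macs(in_ch, rank, kernel_h, 1, out_h, out_w)
    second = conv_macs(rank, out_ch, 1, kernel_w, out_h, out_w)
    return first + second


def rank_for_ratio(in_ch, out_ch, kernel_h, kernel_w, target_ratio):
    """Largest rank whose factorized MAC count stays within target_ratio of the original."""
    original = in_ch * out_ch * kernel_h * kernel_w   # per output pixel
    per_rank = in_ch * kernel_h + out_ch * kernel_w   # per output pixel, per unit of rank
    return int(target_ratio * original // per_rank)


# Hypothetical 3x3 layer with 64 input and 64 output channels on a 56x56 feature map.
shape = (64, 64, 3, 3, 56, 56)
rank = rank_for_ratio(64, 64, 3, 3, target_ratio=0.5)
print(rank, conv_macs(*shape), spatial_svd_macs(*shape, rank=rank))
```

Lower ranks save more compute but discard more of the original kernel, so the accuracy figures in the table depend on how the ranks are chosen across layers and on any fine-tuning that follows.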


Resources

Contributions

Thanks for your interest in contributing to AIMET! Please read our Contributions Page for more information on contributing features or bug fixes. We look forward to your participation!

Team

AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.

License

AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the LICENSE for more details.