```bibtex
@inproceedings{Wang2020APQ,
  title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy},
  author={Tianzhe Wang and Kuan Wang and Han Cai and Ji Lin and Zhijian Liu and Song Han},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}
```
We release the PyTorch code for APQ. [Paper | Video | Competition]

Requirements:
- PyTorch >= 1.0
- Python >= 3.6
- Progress >= 1.5
- An NVIDIA GPU is required to obtain new models.
The repository is organized as follows:

```
apq
├── dataset/                   # ImageNet data path
├── elastic_nn/                # super-network builder, w/ or w/o quantization
│   ├── modules/               # defines the layers, w/ or w/o quantization
│   ├── networks/              # defines the networks, w/ or w/o quantization
│   └── utils.py               # utility functions for the elastic_nn folder
├── models/                    # quantization-aware predictor and once-for-all network checkpoint path
├── imagenet_codebase/         # training codebase for ImageNet
├── lut/                       # latency lookup table path
├── methods/                   # methods to find the mixed-precision network
│   ├── evolution/             # evolutionary search code
│   ├── utils/                 # utility functions, including the converter
│   ├── accuracy_predictor.py  # constructs the accuracy predictor
│   ├── latency_predictor.py   # constructs the latency predictor
│   └── converter.py           # encodes a subnetwork into a one-hot vector
├── quant_aware.py             # code for quantization-aware training
├── main.py
└── README.md
```
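The converter and the accuracy predictor together are what make the joint search cheap: each subnetwork (an architecture plus its per-layer bitwidths) is flattened into a one-hot feature vector, and a small MLP maps that vector to a predicted accuracy, so candidates can be scored without any training. Below is a minimal sketch of that encode-and-predict pattern; the design-space choices, class name, and layer widths are illustrative assumptions, not the actual `converter.py`/`accuracy_predictor.py` interfaces.

```python
import random
import torch
import torch.nn as nn

# Illustrative design-space choices (assumed; not the repo's actual search space).
KERNEL_CHOICES = [3, 5, 7]   # kernel size per block
EXPAND_CHOICES = [3, 4, 6]   # expansion ratio per block
BIT_CHOICES = [4, 6, 8]      # weight/activation bitwidth per block
NUM_BLOCKS = 20

def encode_subnet(kernels, expands, bits):
    """Concatenate one-hot vectors for every per-block choice."""
    feats = []
    for k, e, b in zip(kernels, expands, bits):
        for choices, val in ((KERNEL_CHOICES, k), (EXPAND_CHOICES, e), (BIT_CHOICES, b)):
            one_hot = [0.0] * len(choices)
            one_hot[choices.index(val)] = 1.0
            feats.extend(one_hot)
    return torch.tensor(feats)

class AccuracyPredictor(nn.Module):
    """Small MLP mapping the one-hot encoding to a predicted accuracy."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 400), nn.ReLU(),
            nn.Linear(400, 400), nn.ReLU(),
            nn.Linear(400, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

# Score one random subnetwork (the predictor is untrained here, so the value is noise).
kernels = [random.choice(KERNEL_CHOICES) for _ in range(NUM_BLOCKS)]
expands = [random.choice(EXPAND_CHOICES) for _ in range(NUM_BLOCKS)]
bits = [random.choice(BIT_CHOICES) for _ in range(NUM_BLOCKS)]
x = encode_subnet(kernels, expands, bits)
predictor = AccuracyPredictor(in_dim=x.numel())
print(predictor(x.unsqueeze(0)))
```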
For instance, to test the model under the exps/test folder, run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py \
    --exp_dir=exps/test
```

This reports the exact latency/energy on the BitFusion platform together with the ImageNet Top-1 accuracy.
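The latency/energy numbers can be exact because they come from the per-layer lookup tables under lut/ (BitFusion costs tabulated offline) rather than from noisy on-device timing: a subnetwork's total cost is the sum of its layers' entries. A minimal sketch of that lookup-and-sum pattern, with made-up keys and values rather than the repository's actual table format:

```python
# Hypothetical per-layer lookup table: (layer index, bitwidth) -> latency in ms.
# The real lut/ files key on the full layer configuration; these values are fake.
LATENCY_LUT = {
    (0, 4): 0.21, (0, 8): 0.35,
    (1, 4): 0.42, (1, 8): 0.71,
    (2, 4): 0.18, (2, 8): 0.30,
}

def predict_latency(bits_per_layer):
    """Total latency is the sum of the per-layer table entries."""
    return sum(LATENCY_LUT[(i, b)] for i, b in enumerate(bits_per_layer))

print(predict_latency([4, 8, 4]))  # 0.21 + 0.71 + 0.18 = 1.10 ms
```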
For instance, to search for a model under a 12.80 ms latency constraint, run:

```bash
CUDA_VISIBLE_DEVICES=0 python search.py \
    --mode=evolution \
    --acc_predictor_dir=models \
    --exp_name=test \
    --constraint=12.80 \
    --type=latency
```

This produces a candidate that satisfies the resource constraint (latency or energy), stored in the exps/test folder.
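Under the hood, the evolution mode keeps a population of encoded subnetworks, ranks them with the accuracy predictor, and rejects any candidate whose predicted latency (or energy) exceeds the constraint. A minimal sketch of that loop, with placeholder sample/mutate/predictor callables standing in for the repository's actual implementations:

```python
import copy
import random

def evolution_search(sample_fn, mutate_fn, acc_fn, cost_fn, constraint,
                     population=100, generations=50, parent_ratio=0.25):
    """Constraint-aware evolutionary search over encoded subnetworks.

    sample_fn()  -> random candidate encoding
    mutate_fn(c) -> mutated copy of a candidate
    acc_fn(c)    -> predicted accuracy (fitness)
    cost_fn(c)   -> predicted latency or energy
    """
    def random_feasible():
        while True:  # resample until the candidate meets the constraint
            cand = sample_fn()
            if cost_fn(cand) <= constraint:
                return cand

    pop = [random_feasible() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=acc_fn, reverse=True)          # fittest first
        parents = pop[: int(population * parent_ratio)]
        children = []
        while len(children) < population - len(parents):
            child = mutate_fn(copy.deepcopy(random.choice(parents)))
            if cost_fn(child) <= constraint:        # drop infeasible mutants
                children.append(child)
        pop = parents + children
    return max(pop, key=acc_fn)
```

The best survivor of the final generation is the candidate written to the experiment folder.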
For instance, to run quantization-aware finetuning on the model under the exps/test folder, run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python quant_aware.py \
    --exp_name=test
```

This yields a mixed-precision model that meets the resource constraint (latency or energy) with competitive accuracy.
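Quantization-aware finetuning simulates low-bit arithmetic in the forward pass (fake quantization) while letting gradients flow through the rounding step unchanged via the straight-through estimator, so the full-precision weights adapt to the searched per-layer bitwidths. A minimal sketch of the idea, assuming a uniform quantizer; this is not the repository's actual quantization module:

```python
import torch

def fake_quantize(x, bits):
    """Round x onto a uniform `bits`-bit grid in the forward pass; the
    straight-through estimator keeps the backward pass an identity."""
    qmax = 2 ** bits - 1
    lo = x.min()
    scale = (x.max() - lo).clamp(min=1e-8) / qmax
    q = ((x - lo) / scale).round() * scale + lo
    # Forward value is q; gradient w.r.t. x is 1 because (q - x) is detached.
    return x + (q - x).detach()

w = torch.randn(8, requires_grad=True)
fake_quantize(w, bits=4).sum().backward()
print(w.grad)  # all ones: gradients pass straight through the quantizer
```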
We provide the checkpoints for the APQ models reported in the paper:

| Latency | Energy | BitOps | Accuracy | Model |
|---|---|---|---|---|
| 6.11ms | 9.14mJ | 12.7G | 72.8% | download |
| 8.45ms | 11.81mJ | 14.6G | 73.8% | download |
| 8.40ms | 12.18mJ | 16.5G | 74.1% | download |
| 12.17ms | 14.14mJ | 23.6G | 75.1% | download |

You can download the models and put them into the exps folder to test their performance. Note that the bold item in each row indicates the constraint under which that model was searched.
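Assuming the downloads follow the standard PyTorch checkpoint format (the filename below is hypothetical), you can inspect one before pointing test.py at it:

```python
import torch

# Hypothetical filename; use whatever the download link actually provides.
ckpt = torch.load("exps/test/checkpoint.pth", map_location="cpu")
print(type(ckpt))  # typically a dict holding a state_dict and the searched config
```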
- Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20, code)
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR'19)
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV'18)
- HAQ: Hardware-Aware Automated Quantization (CVPR'19, oral)
- Defensive Quantization: When Efficiency Meets Robustness (ICLR'19)