Hardware-Aware Automated Quantization (HAQ): a framework for quantization strategy search [PDF]
Contribution
Leverages reinforcement learning to automatically determine the quantization policy.
Takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, a hardware simulator generates direct feedback signals (latency and energy) for the RL agent.
Fully automated; can specialize the quantization policy for different neural network architectures and hardware architectures.
Large search space
With M different neural network models, each with N layers, on H different hardware platforms, there are in total O(H × M × 8^{2N}) possible solutions (assuming bitwidths of 1 to 8 for both weights and activations).
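A quick sketch of why this space is intractable to enumerate. The function below just evaluates the H × M × 8^{2N} count from above; the concrete values of H, M, and N in the example are hypothetical.

```python
# Sketch: size of HAQ's search space, assuming bitwidths 1..8 for both
# weights and activations (8 choices each -> 8^(2N) per N-layer model).
def search_space_size(H: int, M: int, N: int, bit_choices: int = 8) -> int:
    return H * M * bit_choices ** (2 * N)

# Even one 10-layer model on one platform gives 8^20 = 2^60 candidate
# policies (on the order of 1e18), far beyond exhaustive search.
print(search_space_size(H=1, M=1, N=10))
```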
Rule-based policies are suboptimal → use a learning-based method to search.
How to get the reward
After all layers are quantized, the quantized model is finetuned for one more epoch, and the validation accuracy after this short-term retraining is fed as the reward signal to the RL agent.
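The reward step above can be sketched as a small function. Note this is a toy illustration, not the paper's code: `quantize`, `finetune_one_epoch`, and `evaluate` are hypothetical callables supplied by the caller, and the usage example fakes accuracy with a lambda.

```python
# Hedged sketch of HAQ's reward: quantize every layer per the policy,
# retrain briefly, and return validation accuracy as the RL reward.
def get_reward(policy, quantize, finetune_one_epoch, evaluate):
    model = quantize(policy)           # apply per-layer bitwidths
    model = finetune_one_epoch(model)  # short-term retraining (one epoch)
    return evaluate(model)             # validation accuracy -> reward

# Toy usage: a fake pipeline where "accuracy" is the mean bitwidth / 8.
policy = [4, 6, 8]  # hypothetical per-layer bitwidths
reward = get_reward(
    policy,
    quantize=lambda p: {"bits": p},
    finetune_one_epoch=lambda m: m,
    evaluate=lambda m: sum(m["bits"]) / (8 * len(m["bits"])),
)
print(reward)
```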
Limitation
The optimization target is model accuracy; other hardware resources (latency, energy, and model size) are treated only as limited computation budgets.
No cost model: direct latency and energy feedback from the hardware accelerator serves as the resource constraints.
If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied.
CVPR 2019