Hardware-Aware Automated Quantization (HAQ): a framework for quantization strategy search [PDF]
Contribution
Leverages reinforcement learning to automatically determine the quantization policy.
Takes the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, a hardware simulator generates direct feedback signals (latency and energy) for the RL agent.
Fully automated; can specialize the quantization policy for different neural network architectures and hardware architectures.
Large search space
With M different neural network models, each with N layers, on H different hardware platforms, there are in total O(H × M × 8^{2N}) possible solutions (assuming bitwidths of 1 to 8 for both weights and activations).
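A quick sketch of why this space is intractable to enumerate. The function below just evaluates the H × M × 8^{2N} count from above; the concrete values of H, M, and N in the example are hypothetical.

```python
# Sketch: size of HAQ's search space, assuming bitwidths 1..8 for both
# weights and activations (8 choices each -> 8^(2N) per N-layer model).
def search_space_size(H: int, M: int, N: int, bit_choices: int = 8) -> int:
    return H * M * bit_choices ** (2 * N)

# Even one 10-layer model on one platform gives 8^20 = 2^60 candidate
# policies (on the order of 1e18), far beyond exhaustive search.
print(search_space_size(H=1, M=1, N=10))
```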
Rule-based policies are suboptimal → use a learning-based method to search.
How to get the reward
After all layers are quantized, the quantized model is finetuned for one more epoch, and the validation accuracy after this short-term retraining is fed as the reward signal to the RL agent.
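The reward step above can be sketched as a small function. Note this is a toy illustration, not the paper's code: `quantize`, `finetune_one_epoch`, and `evaluate` are hypothetical callables supplied by the caller, and the usage example fakes accuracy with a lambda.

```python
# Hedged sketch of HAQ's reward: quantize every layer per the policy,
# retrain briefly, and return validation accuracy as the RL reward.
def get_reward(policy, quantize, finetune_one_epoch, evaluate):
    model = quantize(policy)           # apply per-layer bitwidths
    model = finetune_one_epoch(model)  # short-term retraining (one epoch)
    return evaluate(model)             # validation accuracy -> reward

# Toy usage: a fake pipeline where "accuracy" is the mean bitwidth / 8.
policy = [4, 6, 8]  # hypothetical per-layer bitwidths
reward = get_reward(
    policy,
    quantize=lambda p: {"bits": p},
    finetune_one_epoch=lambda m: m,
    evaluate=lambda m: sum(m["bits"]) / (8 * len(m["bits"])),
)
print(reward)
```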
Limitation
The optimization target is model accuracy; other hardware resources (latency, energy, and model size) are treated only as limited computation budgets.
No cost model: direct latency and energy feedback from the hardware accelerator serves as the resource constraints.
If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied.
CVPR 2019