yanghr / BSQ

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization (ICLR 2021)
https://openreview.net/forum?id=TiXl51SCNw8
Apache License 2.0

Is this a complete model quantization process? #1

Open · Ironteen opened this issue 3 years ago

Ironteen commented 3 years ago

Hi hanrui, I am very interested in the ideas of this paper, but I have a question. In general, a complete model quantization pipeline includes the following steps:

  1. Prepare a pretrained model;
  2. Fuse the batch normalization into the weights of the current layer;
  3. Determine the bitwidth of the weights and activations for each layer and quantize them;
  4. Quantization-aware training;
  5. Activation calibration with a small dataset.

However, I do not find step 2 and step 5 in this repo, and both of them have a big impact on the final accuracy. Is skipping them a common practice in mixed-precision quantization, or could you provide a mixed-precision baseline program?
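For reference, step 2 is usually implemented with the standard Conv-BN folding formulas. The minimal PyTorch sketch below only illustrates what I mean by fusing and is not code from this repo; the function name and the assumption of a `Conv2d` followed by a `BatchNorm2d` are mine.

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics and affine parameters into the preceding Conv2d.

    Standard folding:
        W_fused = W * gamma / sqrt(running_var + eps)
        b_fused = (b - running_mean) * gamma / sqrt(running_var + eps) + beta
    """
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding,
        dilation=conv.dilation, groups=conv.groups, bias=True,
    )
    w = conv.weight.detach()
    b = conv.bias.detach() if conv.bias is not None else torch.zeros(conv.out_channels)
    inv_std = (bn.running_var + bn.eps).rsqrt()
    with torch.no_grad():
        # Scale each output channel of the conv kernel by gamma / sqrt(var + eps)
        fused.weight.copy_(w * (bn.weight * inv_std).reshape(-1, 1, 1, 1))
        fused.bias.copy_((b - bn.running_mean) * bn.weight * inv_std + bn.bias)
    return fused
```

After fusing, the original BN module can be replaced with `nn.Identity()` so the rest of the pipeline stays unchanged.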
yanghr commented 3 years ago

Thanks for your interest in this work! Your understanding is correct: currently this work only considers the quantization of the weights and activations in the CONV and FC layers, while keeping the BN layers at full precision.

The major contribution of BSQ lies in dynamically determining the bitwidth of each layer through the training process. In practice, the BN fusing can be done before the BSQ training process, and the fused model can then be trained with the same procedure. We will take a closer look at this issue as we extend and improve this paper into a journal submission in the near future. I agree that adding BN fusing would make this method more practical for producing a mixed-precision model that can be processed entirely on a fixed-point processor.
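To make the bitwidth-determination idea concrete, here is a rough sketch, not the actual BSQ implementation: a weight tensor kept as trainable bit planes with an L1-style penalty on each plane, so that entire planes can be driven to zero and dropped. The class name, initialization, and exact regularizer below are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BitPlaneWeight(nn.Module):
    """Rough illustration of a bit-level weight representation (not the repo code).

    The quantized magnitude of a weight tensor is stored as n_bits trainable
    bit planes; the effective weight is sign * scale * sum_b 2^b * bit_b.
    """

    def __init__(self, weight: torch.Tensor, n_bits: int = 8):
        super().__init__()
        self.n_bits = n_bits
        self.register_buffer("sign", torch.sign(weight))
        self.register_buffer("scale", weight.abs().max() / (2 ** n_bits - 1))
        q = torch.round(weight.abs() / self.scale).to(torch.int64)
        # One trainable tensor per bit plane, initialized from the quantized weight.
        self.bits = nn.ParameterList(
            [nn.Parameter(((q >> b) & 1).float()) for b in range(n_bits)]
        )

    def forward(self) -> torch.Tensor:
        magnitude = sum((2 ** b) * self.bits[b] for b in range(self.n_bits))
        return self.sign * self.scale * magnitude

    def bit_sparsity_penalty(self) -> torch.Tensor:
        # An L1-style penalty per bit plane; planes driven entirely to zero can
        # be removed, which is what lets a layer's effective bitwidth shrink.
        return sum(b.abs().mean() for b in self.bits)
```

Roughly, such a penalty would be added to the task loss during training, and removing the all-zero bit planes afterward would give each layer its own bitwidth.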