megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

How to convert to int8 model? #1

Closed · detectRecog closed this issue 2 years ago

detectRecog commented 2 years ago

Hello, very impressive work! How can I trace the model and save its weights as integers (e.g., int8)? I want to check the model size after quantization. Also, is the proposed method applicable to other, non-transformer models?

linyang-zhh commented 2 years ago

@detectRecog

Hi, thanks for your recognition of our work. We have not provided an interface for exporting and storing quantized models, for two reasons. First, model conversion and storage are closely tied to specific hardware, and our work on that front is still in progress. Second, to simplify the experiments, we use fake quantization everywhere except the PTS (LayerNorm) and LIS (Softmax) modules.
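To clarify what fake quantization means in practice, here is a minimal sketch (not the repo's actual API; the function name and symmetric per-tensor scheme are illustrative assumptions): weights are rounded to the int8 grid and then immediately dequantized, so the stored tensors remain floating point while the simulated precision is 8-bit.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Simulated (fake) quantization: quantize to n_bits, then dequantize."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax                              # symmetric per-tensor scale
    w_int = torch.clamp(torch.round(w / scale), qmin, qmax)   # integer grid values
    return w_int * scale                                      # back to float for the forward pass

w = torch.randn(768, 768)
w_fq = fake_quantize(w)
print((w - w_fq).abs().max())  # quantization error stays small
```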

That said, there are workarounds. If you want to inspect the quantized weight values, you can check them at this line and save them for the layers you care about. And if you want to estimate the int8 model size: since we quantize all weights to 8 bits, you can simply divide the floating-point (fp32) model size by 4 (i.e., 32 / 8).
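A rough sketch of both steps is below, assuming a symmetric 8-bit scheme; the helper names are hypothetical and not part of this repository.

```python
import torch

def estimate_int8_size_mb(model: torch.nn.Module) -> float:
    """Approximate int8 model size in MB: fp32 bytes divided by 4."""
    fp32_bytes = sum(p.numel() * 4 for p in model.parameters())
    return fp32_bytes / 4 / (1024 ** 2)

def dump_int8_weights(model: torch.nn.Module, path: str = "weights_int8.pt") -> None:
    """Round each weight tensor to int8 and save the integer tensors."""
    state = {}
    for name, module in model.named_modules():
        w = getattr(module, "weight", None)
        if w is None or w.numel() == 0:
            continue
        scale = w.abs().max() / 127  # symmetric per-tensor scale
        state[name] = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    torch.save(state, path)
```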

This paper mainly addresses the extreme distributions of LayerNorm and Softmax activations in vision transformers. We have not observed such extreme distributions in non-transformer architectures (such as CNNs), so the method may not be applicable to them. However, if you are interested in quantization for vision transformers, I believe our work will help you!