megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

How to convert to int8 model? #1

Closed · detectRecog closed this issue 2 years ago

detectRecog commented 2 years ago

Hello, very impressive work! How can I trace the model and save its weights as integers (e.g., int8)? I want to check the model size after quantization. Also, is the proposed method applicable to other, non-transformer models?

linyang-zhh commented 2 years ago

@detectRecog

Hi, thanks for your recognition of our work. We have not provided an interface for exporting and storing quantized models, for two reasons. First, model conversion and storage are closely tied to specific hardware, and our work on that front is still in progress. Second, to simplify the experiments, we use fake quantization everywhere except the PTS (LayerNorm) and LIS (Softmax) modules.
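To clarify what fake quantization means in practice, here is a minimal sketch (not the repo's actual API; the function name and symmetric per-tensor scheme are illustrative assumptions): weights are rounded to the int8 grid and then immediately dequantized, so the stored tensors remain floating point while the simulated precision is 8-bit.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Simulated (fake) quantization: quantize to n_bits, then dequantize."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax                              # symmetric per-tensor scale
    w_int = torch.clamp(torch.round(w / scale), qmin, qmax)   # integer grid values
    return w_int * scale                                      # back to float for the forward pass

w = torch.randn(768, 768)
w_fq = fake_quantize(w)
print((w - w_fq).abs().max())  # quantization error stays small
```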

That said, there are workarounds. If you want to inspect the quantized weight values, you can check them at this line and save them for the layers you care about. And if you want to estimate the int8 model size: since we quantize all weights to 8 bits, you can simply divide the floating-point (fp32) model size by 4 (i.e., 32 / 8).
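A rough sketch of both steps is below, assuming a symmetric 8-bit scheme; the helper names are hypothetical and not part of this repository.

```python
import torch

def estimate_int8_size_mb(model: torch.nn.Module) -> float:
    """Approximate int8 model size in MB: fp32 bytes divided by 4."""
    fp32_bytes = sum(p.numel() * 4 for p in model.parameters())
    return fp32_bytes / 4 / (1024 ** 2)

def dump_int8_weights(model: torch.nn.Module, path: str = "weights_int8.pt") -> None:
    """Round each weight tensor to int8 and save the integer tensors."""
    state = {}
    for name, module in model.named_modules():
        w = getattr(module, "weight", None)
        if w is None or w.numel() == 0:
            continue
        scale = w.abs().max() / 127  # symmetric per-tensor scale
        state[name] = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    torch.save(state, path)
```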

This paper mainly addresses the extreme distributions of LayerNorm and Softmax activations in vision transformers. We have not observed such extreme distributions in non-transformer architectures (such as CNNs), so the method may not be applicable to them. However, if you are interested in quantization for vision transformers, I believe our work will help you!