In the issue, you said that quant and dequant are necessary steps for quantization.
You recommended referencing Figure 6 in the paper.
What I don't understand is that, according to the figure above from the paper you recommended, fake quantization is actually FP32 matrix multiplication. However, in your paper, you call your model a Fully-Quantized Vision Transformer (FQ-ViT), which seems to contradict the fact that you actually use FP32 matrix multiplication everywhere except LayerNorm and Softmax.
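To make sure I am reading the figure correctly, here is my understanding of fake (simulated) quantization as a minimal sketch. The tensor is quantized to an integer grid and immediately dequantized back to FP32, so the matmul itself still runs in FP32. The names (`scale`, `zero_point`) and values are my own assumptions, not taken from your code:

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # quant: FP32 -> int8 grid (hypothetical per-tensor scheme, my assumption)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # dequant: int8 grid -> FP32 (this is the dequant step I am asking about)
    return (q - zero_point) * scale

x = torch.randn(4, 8)
w = torch.randn(8, 16)
# Both operands have been "quantized", yet the multiplication is still FP32:
y = fake_quantize(x, scale=0.05, zero_point=0) @ fake_quantize(w, scale=0.02, zero_point=0)
print(y.dtype)  # torch.float32
```

If this is what the figure means, then the actual arithmetic never leaves FP32, which is the source of my confusion about the "fully quantized" claim.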
Also, fake quantization is usually used during quantization-aware training. However, your method is post-training quantization, so I don't understand why dequant is a necessary step for post-training quantization. I looked at other papers' code, but there was no place where dequant was used during post-training quantization.
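To make my confusion concrete, this is how I currently understand post-training quantization: scales come from calibration data with no training, and the quant-dequant pair only simulates integer inference. Everything below (the calibration function, the symmetric per-tensor scheme) is my own assumption, not your pipeline:

```python
import torch

def calibrate_scale(x: torch.Tensor, n_bits: int = 8) -> float:
    # symmetric per-tensor scale from the calibration max (my assumption)
    return x.abs().max().item() / (2 ** (n_bits - 1) - 1)

calib_batch = torch.randn(32, 8)   # calibration data, no gradients involved
w = torch.randn(8, 16)
s_x = calibrate_scale(calib_batch)
s_w = calibrate_scale(w)

x = torch.randn(4, 8)              # test-time input
x_q = torch.round(x / s_x).clamp(-128, 127)
w_q = torch.round(w / s_w).clamp(-128, 127)
# Simulated (fake) quantized inference: dequant shows up here ...
y_sim = (x_q * s_x) @ (w_q * s_w)  # FP32 matmul after dequant
# ... whereas true integer inference would run an int8 matmul and rescale
# the accumulator by s_x * s_w only at the end:
y_int = (x_q @ w_q) * (s_x * s_w)
print(torch.allclose(y_sim, y_int, atol=1e-4))  # mathematically equivalent
```

If the two forms are equivalent like this, is the dequant in your PTQ code only there to simulate the integer kernel, or does it play some other role? That distinction is what I could not find in the other papers' code.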
I would appreciate it if you could answer these questions.