In the issue, you said that quant and dequant are necessary steps for quantization.
You recommended referencing Figure 6 in the paper.
What I don't understand is that, according to the figure above from the paper you recommended, fake quantization is actually FP32 matrix multiplication. However, in your paper, you call your model a Fully-Quantized Vision Transformer (FQ-ViT), which seems to contradict the fact that you actually use FP32 matrix multiplication everywhere except LayerNorm and Softmax.
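To make sure I am reading the figure correctly, here is my understanding of fake (simulated) quantization as a minimal sketch. The tensor is quantized to an integer grid and immediately dequantized back to FP32, so the matmul itself still runs in FP32. The names (`scale`, `zero_point`) and values are my own assumptions, not taken from your code:

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> torch.Tensor:
    # quant: FP32 -> int8 grid (hypothetical per-tensor scheme, my assumption)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # dequant: int8 grid -> FP32 (this is the dequant step I am asking about)
    return (q - zero_point) * scale

x = torch.randn(4, 8)
w = torch.randn(8, 16)
# Both operands have been "quantized", yet the multiplication is still FP32:
y = fake_quantize(x, scale=0.05, zero_point=0) @ fake_quantize(w, scale=0.02, zero_point=0)
print(y.dtype)  # torch.float32
```

If this is what the figure means, then the actual arithmetic never leaves FP32, which is the source of my confusion about the "fully quantized" claim.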
Also, fake quantization is usually used during quantization-aware training. However, your method is post-training quantization, so I don't understand why dequant is a necessary step for post-training quantization. I looked at other papers' code, but there was no place where dequant was used during post-training quantization.
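To make my confusion concrete, this is how I currently understand post-training quantization: scales come from calibration data with no training, and the quant-dequant pair only simulates integer inference. Everything below (the calibration function, the symmetric per-tensor scheme) is my own assumption, not your pipeline:

```python
import torch

def calibrate_scale(x: torch.Tensor, n_bits: int = 8) -> float:
    # symmetric per-tensor scale from the calibration max (my assumption)
    return x.abs().max().item() / (2 ** (n_bits - 1) - 1)

calib_batch = torch.randn(32, 8)   # calibration data, no gradients involved
w = torch.randn(8, 16)
s_x = calibrate_scale(calib_batch)
s_w = calibrate_scale(w)

x = torch.randn(4, 8)              # test-time input
x_q = torch.round(x / s_x).clamp(-128, 127)
w_q = torch.round(w / s_w).clamp(-128, 127)
# Simulated (fake) quantized inference: dequant shows up here ...
y_sim = (x_q * s_x) @ (w_q * s_w)  # FP32 matmul after dequant
# ... whereas true integer inference would run an int8 matmul and rescale
# the accumulator by s_x * s_w only at the end:
y_int = (x_q @ w_q) * (s_x * s_w)
print(torch.allclose(y_sim, y_int, atol=1e-4))  # mathematically equivalent
```

If the two forms are equivalent like this, is the dequant in your PTQ code only there to simulate the integer kernel, or does it play some other role? That distinction is what I could not find in the other papers' code.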
I would appreciate it if you could answer these questions.