megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

Is GELU operation quantized as well? #2

Closed: Kevinpsk closed this issue 2 years ago

Kevinpsk commented 2 years ago

Hi there,

Thanks a lot for sharing the code. I have a quick question: is the GELU layer fully quantized as well? You mention that the vision transformer is fully quantized, but I was not able to find where GELU is quantized in the code, nor any description of it in the paper. Did I miss something? Thanks a lot for your clarification.

linyang-zhh commented 2 years ago

@Kevinpsk Hi! Thanks for your interest in our work!

We quantize the output of GELU using QAct; the code is here.
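For illustration, here is a minimal sketch of that pattern: GELU stays in floating point and only its output is fake-quantized before the next linear layer. The `FakeQuantAct` class and the 8-bit per-tensor scheme below are assumptions for the example, not the repository's actual QAct implementation.

```python
import torch
import torch.nn as nn


class FakeQuantAct(nn.Module):
    """Uniform 8-bit fake quantization of an activation tensor (illustrative only)."""

    def __init__(self, n_bits: int = 8):
        super().__init__()
        self.n_bits = n_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.n_bits - 1) - 1
        qmin = -(2 ** (self.n_bits - 1))
        scale = x.abs().max().clamp(min=1e-8) / qmax
        x_q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return x_q * scale  # dequantized value consumed by the next (quantized) linear layer


class MlpBlock(nn.Module):
    """Transformer MLP block: the non-linearity is kept, its output is quantized."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()          # GELU itself stays full precision
        self.qact = FakeQuantAct(8)   # its output is quantized before fc2
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.qact(self.act(self.fc1(x))))
```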

This processing is consistent with real-world deployment. Since PTQ must guarantee that quantized inference matches the original full-precision inference, we must keep the non-linear GELU layers. Therefore, to minimize hardware cost in the subsequent operations (e.g., the next linear layer), the output of GELU is quantized, while the GELU function itself is computed on a CPU or implemented with a Look-Up Table (LUT).
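A hedged sketch of the LUT idea mentioned above: since an INT8 activation can only take 256 distinct integer codes, GELU can be precomputed once per input scale and applied as a table lookup at inference time. The function names and the example scale below are illustrative, not part of FQ-ViT.

```python
import torch
import torch.nn.functional as F


def build_gelu_lut(input_scale: float, n_bits: int = 8) -> torch.Tensor:
    """Precompute GELU for every possible signed n-bit integer code."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    codes = torch.arange(qmin, qmax + 1, dtype=torch.float32)
    return F.gelu(codes * input_scale)  # table with 2**n_bits entries


def gelu_via_lut(x_q: torch.Tensor, lut: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Apply GELU to integer codes x_q by indexing the precomputed table."""
    offset = 2 ** (n_bits - 1)  # shift signed codes into [0, 2**n_bits)
    return lut[x_q.long() + offset]


# usage sketch
scale = 0.05                                  # assumed activation scale
lut = build_gelu_lut(scale)
x_q = torch.randint(-128, 128, (4, 16))       # INT8 codes from the previous layer
y = gelu_via_lut(x_q, lut)                    # equals F.gelu(x_q * scale)
```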

If you are interested in a purely fixed-point implementation of GELU inference, we recommend following the work of I-BERT, and you are also welcome to try our optimizations for LayerNorm and Softmax!