megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

Is GELU operation quantized as well? #2

Closed: Kevinpsk closed this issue 2 years ago

Kevinpsk commented 2 years ago

Hi there,

Thanks a lot for sharing the code. I have a quick question: is the GELU layer fully quantized as well? You mention that the vision transformer is fully quantized, but I was not able to find where GELU is quantized in the code, nor any description of it in the paper. Did I miss something? Thanks a lot for your clarification.

linyang-zhh commented 2 years ago

@Kevinpsk Hi! Thanks for your interest in our work!

We quantize the output of GELU using QAct; the code is here.
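For illustration, here is a minimal sketch of that pattern: GELU stays in floating point and only its output is fake-quantized before the next linear layer. The `FakeQuantAct` class and the 8-bit per-tensor scheme below are assumptions for the example, not the repository's actual QAct implementation.

```python
import torch
import torch.nn as nn


class FakeQuantAct(nn.Module):
    """Uniform 8-bit fake quantization of an activation tensor (illustrative only)."""

    def __init__(self, n_bits: int = 8):
        super().__init__()
        self.n_bits = n_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.n_bits - 1) - 1
        qmin = -(2 ** (self.n_bits - 1))
        scale = x.abs().max().clamp(min=1e-8) / qmax
        x_q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return x_q * scale  # dequantized value consumed by the next (quantized) linear layer


class MlpBlock(nn.Module):
    """Transformer MLP block: the non-linearity is kept, its output is quantized."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()          # GELU itself stays full precision
        self.qact = FakeQuantAct(8)   # its output is quantized before fc2
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.qact(self.act(self.fc1(x))))
```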

This processing is consistent with real-world deployment. Since PTQ must guarantee that quantized inference matches the original full-precision inference, we must keep the non-linear GELU layers. Therefore, to minimize hardware cost in the subsequent operations (e.g., the next linear layer), the output of GELU is quantized, while the GELU function itself is computed on a CPU or implemented with a Look-Up Table (LUT).
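A hedged sketch of the LUT idea mentioned above: since an INT8 activation can only take 256 distinct integer codes, GELU can be precomputed once per input scale and applied as a table lookup at inference time. The function names and the example scale below are illustrative, not part of FQ-ViT.

```python
import torch
import torch.nn.functional as F


def build_gelu_lut(input_scale: float, n_bits: int = 8) -> torch.Tensor:
    """Precompute GELU for every possible signed n-bit integer code."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    codes = torch.arange(qmin, qmax + 1, dtype=torch.float32)
    return F.gelu(codes * input_scale)  # table with 2**n_bits entries


def gelu_via_lut(x_q: torch.Tensor, lut: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Apply GELU to integer codes x_q by indexing the precomputed table."""
    offset = 2 ** (n_bits - 1)  # shift signed codes into [0, 2**n_bits)
    return lut[x_q.long() + offset]


# usage sketch
scale = 0.05                                  # assumed activation scale
lut = build_gelu_lut(scale)
x_q = torch.randint(-128, 128, (4, 16))       # INT8 codes from the previous layer
y = gelu_via_lut(x_q, lut)                    # equals F.gelu(x_q * scale)
```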

If you are interested in a purely fixed-point implementation of GELU inference, we recommend following the work of I-BERT, and you are also welcome to try our optimizations for LayerNorm and Softmax!