megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Reproducing 8/8/8 for ViT-Base #16

Closed: nfrumkin closed this issue 2 years ago

nfrumkin commented 2 years ago

Can you explain how to reproduce the 8/8/8 result for ViT-Base? I assume that the following command is for 8/8/4:

python test_quant.py vit_base <YOUR_DATA_DIR> --quant --ptf --lis --quant-method minmax

Additionally, does Attn Bits refer to just the softmax quantization (i.e., the BIT_TYPE_S setting in config.py)?

Thanks so much!

linyang-zhh commented 2 years ago

Hi @nfrumkin, you are right, and you can just use

# LIS quantization for Attn, and the default bit=4
python test_quant.py vit_base <YOUR_DATA_DIR> --quant --ptf --lis --quant-method minmax

or,

# MinMax quantization for Attn, and the default bit=8
python test_quant.py vit_base <YOUR_DATA_DIR> --quant --ptf --quant-method minmax

And Attn Bits does refer to the softmax quantization; you can set self.BIT_TYPE_S = BIT_TYPE_DICT["uint8"] in config.py to reproduce 8/8/8 for LIS.
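For reference, a minimal sketch of that change (the import path and the surrounding class are assumptions based on this thread; check your local config.py):

```python
# config.py (sketch) -- only the relevant attribute is shown; the import path
# and class structure here are assumptions, not copied from the repo.
from models.ptq.bit_type import BIT_TYPE_DICT  # path may differ in your checkout

class Config:
    def __init__(self, ptf=True, lis=True):
        # Bit type of the softmax (attention) quantizer. With --lis the
        # default is 4-bit; switch to uint8 to reproduce 8/8/8.
        self.BIT_TYPE_S = BIT_TYPE_DICT['uint8']
```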

nfrumkin commented 2 years ago

Just circling back: I see a significant accuracy degradation when reproducing. Do you know where the ~0.8% drop could come from? I see similar behavior for the DeiT models.

| Model | W | Acts | Attn | Reproduced | Reported |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | 8 | 8 | 8 | 82.540 | 83.31 |
| ViT-Base | 8 | 8 | 4 | 82.608 | 82.68 |
| DeiT-Small | 8 | 8 | 8 | 78.370 | 79.17 |
| DeiT-Small | 8 | 8 | 4 | 78.492 | 78.40 |
linyang-zhh commented 2 years ago

Hi @nfrumkin, your reproduced results are right, and sorry that the experimental setup was not stated clearly.

  1. We reported the 8/8/8 results with Uniform quantization on Self-Attention, instead of Log2. This is to fairly compare the quantization schemes under Uniform inference.
  2. As for your reproduced results, we have also observed this. When using our Log2 quantization (LIS), the choice of the Attn bit-width does not affect the accuracy of the models. In other words, with LIS, 8/8/8 Acc. == 8/8/4 Acc., while with Uniform, 8/8/8 Acc. >> 8/8/4 Acc. Thus, all your reproduced models have similar performance (e.g. ViT-B: 82.5 at 8/8/8 vs. 82.6 at 8/8/4); see the sketch below.
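To make point 2 concrete, below is a small self-contained sketch. The two quantizers are simplified stand-ins for LIS (log2) and uniform softmax quantization, not the repo's code; the intuition is that 4 log2 bits already represent exponents down to 2^-15, smaller than any softmax value that matters, so extra bits change almost nothing, whereas uniform quantization gains real resolution from every extra bit.

```python
import numpy as np

def log2_quant(x, bits):
    # Simplified log2 quantizer for values in (0, 1]: store the rounded
    # exponent e = round(-log2(x)), clipped to [0, 2**bits - 1], and
    # dequantize as 2**-e. A stand-in for LIS, not FQ-ViT's implementation.
    max_e = 2 ** bits - 1
    e = np.clip(np.round(-np.log2(np.maximum(x, 2.0 ** -max_e))), 0, max_e)
    return 2.0 ** -e

def uniform_quant(x, bits):
    # Simplified uniform quantizer for [0, 1] with 2**bits - 1 equal steps.
    step = 1.0 / (2 ** bits - 1)
    return np.round(x / step) * step

rng = np.random.default_rng(0)
logits = rng.standard_normal(197)               # one attention row (197 tokens)
probs = np.exp(logits) / np.exp(logits).sum()   # softmax output: mostly tiny values

for bits in (4, 8):
    for name, quant in (("log2", log2_quant), ("uniform", uniform_quant)):
        err = np.abs(quant(probs, bits) - probs).mean()
        print(f"{name:7s} {bits}-bit  mean abs error = {err:.1e}")

# Expected pattern: the log2 error is essentially identical at 4 and 8 bits,
# while the uniform error shrinks sharply from 4 to 8 bits -- matching the
# observation that with LIS, 8/8/8 Acc. == 8/8/4 Acc.
```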
nfrumkin commented 2 years ago

Thanks @linyang-zhh. I was able to reproduce ViT 8/8/8 using the following command:

python test_quant.py vit_base <imagenet_path> --quant --ptf --quant-method minmax

Note to future readers: you will need to set BIT_TYPE_S to uint8 in config.py for 8/8/8.

| Model | W | Acts | Attn | Reproduced | Reported |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | 8 | 8 | 8 | 83.186 | 83.31 |
| DeiT-Small | 8 | 8 | 8 | 79.166 | 79.17 |

Note: I'm still not exactly matching the reported accuracy for this run:

| Model | W | Acts | Attn | Reproduced | Reported |
| --- | --- | --- | --- | --- | --- |
| DeiT-Tiny | 8 | 8 | 4 | 70.828 | 71.07 |
linyang-zhh commented 2 years ago

Great!

As for the reproduced accuracy of DeiT-Tiny: the accuracy of a quantized model is related to the quality of the calibration set, so maybe you can consider trying a different random seed, via --seed.
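For context, a toy illustration of why the seed matters (the helper name and sizes are hypothetical, not FQ-ViT's actual sampling code): different seeds select different calibration images, which shifts the activation statistics used to set the quantization ranges.

```python
import random

def pick_calibration_indices(dataset_size, num_calib, seed):
    # Hypothetical helper: choose which images are used to calibrate the
    # quantizers. Different seeds -> different subsets -> slightly different
    # activation ranges -> small swings in the final quantized accuracy.
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), num_calib)

# Two seeds give two different calibration subsets (sizes are illustrative):
print(pick_calibration_indices(1_281_167, 1000, seed=0)[:5])
print(pick_calibration_indices(1_281_167, 1000, seed=42)[:5])
```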