megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

Inference speed of quantized model is lower than normal model #32

Closed · jhss closed this issue 1 year ago

jhss commented 1 year ago

- Fully Quantized ViT

  [screenshots: inference timing of the fully quantized ViT]

- Normal ViT

  [screenshots: inference timing of the normal ViT]

I used an NVIDIA A10 to train FQ-ViT.

I expected the quantized model's inference to be much faster than the original's, but the result was the opposite.

Is there a reason why the quantized model's inference is so much slower than the normal one's?

Ther-nullptr commented 1 year ago

This project only simulates quantized inference: the numerical results match those of real quantized inference, but the computation is not actually performed with int8 data, and the operators are not optimized for the hardware. So it is normal that the speed does not improve.
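
For intuition, here is a minimal sketch of what simulated ("fake") quantization looks like. The quantizer below is a generic symmetric uniform one with illustrative names, not FQ-ViT's actual code:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize to int8 levels, then immediately dequantize back to float.
    The result carries the rounding error of int8, but every op still
    runs in fp32."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    scale = x.abs().max() / qmax             # per-tensor scale
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                         # back to float right away

x = torch.randn(4, 8)
w = torch.randn(8, 8)

# "Quantized" inference is still an fp32 matmul on fake-quantized values,
# so it cannot be faster than the fp32 baseline.
y = fake_quantize(x) @ fake_quantize(w)
```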

jhss commented 1 year ago

@Ther-nullptr

Thank you for answering.

I wonder what the purpose of simulated quantization is if there is no speed increase.

Ther-nullptr commented 1 year ago

The model will ultimately be deployed on edge hardware that supports only integer computation, using TensorRT or other toolkits. Integer computation is pointless in PyTorch itself, because PyTorch's operators don't provide accelerated int8 kernels.
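
For contrast, here is a sketch of the arithmetic a real integer engine performs, assuming symmetric per-tensor quantization (illustrative, not FQ-ViT's code): the matmul accumulates in a wide integer type and floats appear only in the final rescale. In PyTorch this runs through slow generic CPU kernels, so it demonstrates the math, not a speedup:

```python
import torch

def quantize(x: torch.Tensor, num_bits: int = 8):
    """Symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int64)
    return q, scale

x = torch.randn(4, 8)
w = torch.randn(8, 8)

qx, sx = quantize(x)
qw, sw = quantize(w)

# Integer matmul with wide accumulation; one rescale at the end.
y_int = qx @ qw                   # integer accumulation on CPU
y = y_int.float() * (sx * sw)     # dequantize with the combined scale

print((y - x @ w).abs().max())    # small quantization error vs fp32
```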

jhss commented 1 year ago

Then, if I convert the model in this repository to TensorRT, can the inference be accelerated?

Ther-nullptr commented 1 year ago

Yes, but you may need to write some custom operators to adapt it to your hardware.
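
In case it helps, a rough sketch of the usual path (file names and the trtexec flags are illustrative; FQ-ViT's simulated-quantization ops would first need to be folded out or replaced by TensorRT-supported equivalents, possibly via custom plugins):

```python
import torch
import torch.nn as nn

# Placeholder standing in for the quantization-folded network;
# in practice you would export the real FQ-ViT model here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10)).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX so a hardware toolkit can consume the graph.
torch.onnx.export(model, dummy, "fq_vit.onnx", opset_version=13)

# Then build an int8 TensorRT engine from the ONNX file, e.g.:
#   trtexec --onnx=fq_vit.onnx --int8 --saveEngine=fq_vit.plan
# Unsupported ops (e.g. quantized LayerNorm/Softmax) are where you would
# have to write TensorRT plugins, i.e. the custom operators mentioned above.
```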

jhss commented 1 year ago

Now I understand. Thank you very much!

Have a good day :)