megvii-research / FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Apache License 2.0

Why is the inference time of the floating-point model and the quantized model almost the same? #39

Closed liuxy1103 closed 1 year ago

liuxy1103 commented 1 year ago

I commented out `model.model_quant()` in `test_quant.py`, but the running time is the same as before. Why is that?

linyang-zhh commented 1 year ago

We use fake quantization in our implementation to simulate quantized inference in FP32.

So, the inference speed is not accelerated.
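To illustrate why fake quantization gives no speedup, here is a minimal NumPy sketch (not the repository's actual code): values are rounded onto an int8 grid and immediately dequantized, so all downstream arithmetic still runs in floating point. The function name and the simple max-abs scale below are illustrative assumptions.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    # Quantize onto a signed integer grid, then immediately dequantize.
    # The values are restricted to the quantized grid, but the tensor
    # stays in floating point, so no integer kernels are used and
    # there is no inference speedup.
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -(2 ** (num_bits - 1))
    scale = np.abs(x).max() / qmax            # simple max-abs scaling (illustrative)
    q = np.clip(np.round(x / scale), qmin, qmax)  # simulated integer values
    return q * scale                          # back to FP32

x = np.array([0.12, -0.5, 0.33, 1.0], dtype=np.float32)
y = fake_quantize(x)
print(y)  # values snapped to the int8 grid, still a float array
```

A real integer-only deployment would instead keep `q` and run integer matmul kernels; fake quantization only emulates the rounding error in FP32, which is why commenting out `model.model_quant()` changes accuracy but not speed.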