Closed jhss closed 1 year ago
Because this project only simulates the quantization process (the numerical results match those of quantized inference, but the computation is not actually performed on int8 data), and because the operators are not optimized for any particular hardware, it is normal that the speed does not improve.
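To make the point concrete, here is a minimal sketch of what "simulated" (fake) quantization looks like: values are rounded to an int8 grid and immediately dequantized, so the numerics match int8 inference, but the actual matmul still runs in float. This is an illustrative example, not code from this repository; the helper name `fake_quantize` and the symmetric per-tensor scaling scheme are assumptions.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate int8 quantization: snap values to the int8 grid, then
    dequantize back to float. All arithmetic stays in floating point,
    so this changes the numerics but gives no speedup."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    scale = np.abs(x).max() / qmax                      # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # "int8" codes, stored as float
    return q * scale                                    # dequantize

x = np.random.randn(4, 4).astype(np.float32)
xq = fake_quantize(x)
# xq takes at most 256 distinct values (as real int8 inference would),
# but this matmul is still an ordinary float32 matmul:
y = xq @ xq
```

A true int8 pipeline would instead keep `q` in an integer dtype and use hardware int8 kernels, which is what TensorRT-style deployment provides.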
@Ther-nullptr
Thank you for answering.
I wonder what the purpose of simulated quantization is if there is no speed increase.
The model will eventually be deployed on edge hardware, which only supports int computation through TensorRT or other toolkits. But int computation is pointless in PyTorch, because PyTorch operators don't support int calculation.
Then, if I convert the model in this repository to TensorRT, can the inference speed be accelerated?
Yes, but you may need to write some custom operators to adapt it to your hardware.
Now I understand. Thank you very much!
Have a good day :)
- Fully Quantized ViT
- Normal ViT
I used an Nvidia A10 to train the FQ-ViT.
I thought the inference speed of the quantized model would be much faster than the original one, but the result was the opposite.
Is there any reason why the inference speed of the quantized model is much slower than the normal one?