Why is integer-only finetuning much slower than FP32 finetuning?

kssteven418 / I-BERT

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

https://arxiv.org/abs/2101.01321

MIT License

226 stars, 32 forks

Why is integer-only finetuning much slower than FP32 finetuning? #14

Closed. renmada closed this issue 3 years ago

renmada commented 3 years ago

Compared with FP32 finetuning, integer-only finetuning takes about 10x longer to run inference on the dev data during training. How can I do INT8 inference and achieve the speedup described in the paper?