mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

Quantization #22

Closed zilunpeng closed 3 years ago

zilunpeng commented 3 years ago

Could you share some more information on how you quantized the model? Did you use any packages for quantization?

Michaelvll commented 3 years ago

Sorry for the late reply. We did not use any additional packages for quantization. For simplicity, we manually read the PyTorch checkpoint of the trained model and applied k-means quantization to the model weights: each floating-point weight is mapped to an 8-bit cluster index and then re-mapped back to its float centroid for inference (with precision loss).
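
For reference, a minimal sketch of this kind of post-hoc k-means weight quantization (not the authors' exact script) could look like the following. The checkpoint filename and the `"model"` state-dict key are assumptions based on the usual fairseq-style checkpoint layout, and `sklearn` is used here purely for convenience:

```python
# Minimal sketch of per-tensor k-means weight quantization (assumed workflow,
# not the authors' exact script). Each weight tensor is clustered into 2**8
# centroids; weights are replaced by their nearest centroid, simulating
# 8-bit quantization while keeping float storage for inference.
import torch
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(weight: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Map each weight value to its nearest of 2**n_bits k-means centroids."""
    flat = weight.detach().cpu().numpy().reshape(-1, 1)
    n_clusters = min(2 ** n_bits, flat.shape[0])
    km = KMeans(n_clusters=n_clusters, n_init=1, random_state=0).fit(flat)
    codes = km.predict(flat)                     # 8-bit cluster index per weight
    centroids = km.cluster_centers_.reshape(-1)  # float value per cluster
    dequant = centroids[codes].reshape(weight.shape)
    return torch.from_numpy(dequant).to(weight.dtype)

# "checkpoint_best.pt" and the "model" key are placeholders; adjust for your checkpoint.
ckpt = torch.load("checkpoint_best.pt", map_location="cpu")
state = ckpt["model"] if "model" in ckpt else ckpt
for name, param in state.items():
    # Quantize only floating-point matrices; skip biases and norm parameters.
    if param.dtype.is_floating_point and param.dim() >= 2:
        state[name] = kmeans_quantize(param, n_bits=8)
torch.save(ckpt, "checkpoint_quantized.pt")
```

The quantized checkpoint can then be loaded and evaluated exactly like the original one; the accuracy gap versus the full-precision model reflects the precision loss mentioned above.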