Hi, "amp" is for mixed precision which is similar to but might be slightly more expensive than pure fp16. PyTorch also has a quantization API: https://pytorch.org/docs/stable/quantization.html
Hi, "amp" is for mixed precision, which is similar to but might be slightly more expensive than pure fp16. PyTorch also has a quantization API: https://pytorch.org/docs/stable/quantization.html
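For context, here is a minimal sketch of what the three options can look like in plain PyTorch. The placeholder model, input shape, and variable names below are illustrative assumptions, not the repo's actual network or its `amp` plumbing:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the network -- the real model, its loading code,
# and how the config's `amp` flag is wired up are assumptions here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 8),
).cuda().eval()

frame = torch.rand(1, 3, 480, 854, device='cuda')  # dummy input frame

# 1) Mixed precision (what an `amp` flag typically toggles): weights stay fp32,
#    eligible ops run in fp16, so compute drops but weight memory does not.
with torch.inference_mode(), torch.cuda.amp.autocast():
    out_amp = model(frame)

# 2) Pure fp16: also halves weight memory, but more prone to overflow/underflow
#    since nothing is kept in fp32.
model.half()
with torch.inference_mode():
    out_fp16 = model(frame.half())

# 3) int8 dynamic quantization: runs on CPU and, out of the box, only covers a
#    few module types such as nn.Linear; conv-heavy networks usually need
#    static quantization with calibration instead (see the linked docs).
model_cpu = model.float().cpu()
model_int8 = torch.quantization.quantize_dynamic(
    model_cpu, {nn.Linear}, dtype=torch.qint8
)
with torch.inference_mode():
    out_int8 = model_int8(frame.float().cpu())
```

Whether pure fp16 is numerically safe depends on the model, so autocast is generally the safer default, which is presumably why the config exposes an amp flag rather than calling `.half()` globally.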
Thanks, it was more of a question than an issue. Closing it.