mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

Why do different models have the same size? #53

Open WelY1 opened 10 months ago

WelY1 commented 10 months ago

Thanks for your great work! I'm a bit puzzled, though. I ran `smoothquant_opt_demo.ipynb` and measured the sizes of `model_FP16`, `model_w8a8`, and `model_smoothquant_w8a8` with `print_model_size(model)` from `smoothquant_opt_real_int8_demo.ipynb`. I understand that `fake_quant.py` simulates INT8 inference in FP16, so those sizes should match `model_FP16`. But when I changed `w.div(scales).round().mul(scales)` to `w.div(scales).round()` in `fake_quant.py`, I still got the same model size. I'm confused about how `fake_quant.py` works and how to achieve real INT8. Looking forward to hearing from you soon. 🤩
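A minimal sketch of why the sizes come out equal (illustrative only, not the repo's code; it assumes per-tensor symmetric quantization like the quoted snippet, and uses float32 for portability, though the same argument applies to the demo's FP16 weights): fake quantization snaps the values to the INT8 grid but leaves the tensor in a floating-point dtype, so dropping `.mul(scales)` changes the numbers, not the bytes per element.

```python
import torch

# Hypothetical weight tensor (float32 here; the demo uses FP16).
w = torch.randn(4, 4)
scales = w.abs().max() / 127  # per-tensor symmetric scale

# Fake quantization: snap to the INT8 grid, then dequantize back.
w_fake = w.div(scales).round().mul(scales)

# Dropping .mul(scales) changes the VALUES but not the DTYPE,
# so the storage size per element is identical.
w_no_dequant = w.div(scales).round()
print(w_fake.dtype, w_no_dequant.dtype)  # same floating-point dtype

# Real INT8 storage requires actually casting to an integer dtype:
w_int8 = w.div(scales).round().clamp(-127, 127).to(torch.int8)
print(w_int8.element_size())  # 1 byte per element, vs 4 (or 2 for FP16)
```

So to see a smaller model, the weights must actually be stored as `torch.int8` (as in the real-INT8 demo), not merely rounded in floating point.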