Thanks for your great work! But I'm a bit puzzled.
I ran `smoothquant_opt_demo.ipynb` and printed the sizes of `model_FP16`, `model_w8a8`, and `model_smoothquant_w8a8` using `print_model_size(model)` from `smoothquant_opt_real_int8_demo.ipynb`.
I understand that you simulate INT8 inference in FP16 with `fake_quant.py`, so the model size should be the same as `model_FP16`.
But even when I changed `w.div(scales).round().mul(scales)` to `w.div(scales).round()` in `fake_quant.py`, I still got a similar model size.
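To make my experiment concrete, here is a minimal sketch of what I tried (not the exact code in `fake_quant.py`; the per-tensor symmetric scale below is my own assumption):

```python
import torch

# Toy FP16 weight, just to check dtypes and storage size.
w = torch.randn(4096, 4096).half()
scales = w.abs().max() / 127  # assumed per-tensor symmetric scale

w_fake = w.div(scales).round().mul(scales)  # original quantize-dequantize
w_int = w.div(scales).round()               # my modified version

# Both results keep the FP16 dtype, so each weight still takes 2 bytes,
# which would explain why the reported model size barely changes.
print(w_fake.dtype, w_int.dtype)  # torch.float16 torch.float16
```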
I'm confused about how `fake_quant.py` works and how to achieve real INT8.
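If I understand correctly, getting a real size reduction would require storing the weights as `torch.int8` plus a scale for dequantization at runtime, something like this sketch (the function name and per-tensor scale are my own, not from the repo):

```python
import torch

@torch.no_grad()
def real_int8_quantize(w: torch.Tensor):
    # Hypothetical sketch: keep an int8 weight tensor plus an FP16 scale,
    # which is what actually shrinks the weight memory.
    scales = w.abs().max() / 127
    w_q = w.div(scales).round().clamp(-128, 127).to(torch.int8)
    return w_q, scales.half()

w = torch.randn(4096, 4096).half()
w_q, s = real_int8_quantize(w)
print(w.element_size(), w_q.element_size())  # 2 bytes vs. 1 byte per weight
```

Is this the right way to think about the difference between the fake-quant demo and `smoothquant_opt_real_int8_demo.ipynb`?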
Looking forward to hearing from you soon. 🤩