mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

How to quantize Llama 3? #92

Open jpyo0803 opened 4 months ago

jpyo0803 commented 4 months ago

Hi,

I am wondering how to quantize Llama-3-8B with SmoothQuant.

Which dataset did you use to generate the activation scales?

Or do you plan to release the act_scales and model weights (to Hugging Face) and a quantized version of the source code (to GitHub) for Llama 3?
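
For context, here is a minimal sketch of what I assume the calibration-plus-smoothing flow would look like, adapted from this repo's `examples/generate_act_scales.py`. The Hugging Face checkpoint name, the Pile validation file path, and Llama-3 support inside `get_act_scales`/`smooth_lm` are all assumptions on my part, not something I have confirmed:

```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from smoothquant.calibration import get_act_scales
from smoothquant.smooth import smooth_lm

# Assumed HF checkpoint name for Llama-3-8B.
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Collect per-channel activation maxima on a small calibration set.
# The repo's examples use 512 random sentences from the Pile validation
# split (dataset/val.jsonl.zst); I am assuming the same works here.
act_scales = get_act_scales(
    model, tokenizer, "dataset/val.jsonl.zst", num_samples=512, seq_len=512
)
os.makedirs("act_scales", exist_ok=True)
torch.save(act_scales, "act_scales/llama-3-8b.pt")

# Migrate activation outliers into the weights before quantization,
# assuming smooth_lm recognizes the Llama-3 decoder layers.
smooth_lm(model, act_scales, alpha=0.5)
```

Does this match the intended workflow, or does Llama 3 need changes to the calibration/smoothing code?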

Thanks in advance!