Can I reproduce SmoothQuant on CPU only? I see that torch-int8 requires a GPU, and I am only interested in CPU inference.

mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

https://arxiv.org/abs/2211.10438

MIT License


Issue opened by WCSY-YG 9 months ago.
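For context, the arithmetic behind SmoothQuant does not depend on the device. The paper's smoothing step rescales each input channel j by s_j = max|X_j|^α / max|W_j|^(1-α), dividing activations by s and multiplying weights by s so that X·Wᵀ is unchanged, and then quantizes both operands to INT8. The sketch below emulates that on CPU with NumPy only. It assumes symmetric per-tensor INT8 quantization, α = 0.5, and synthetic data with one outlier activation channel. The helper names (`smoothing_factors`, `quantize_int8`, `int8_linear`) are hypothetical and are not the smoothquant repo's API. It illustrates the math and is not a replacement for the repo's scripts or for a real CPU INT8 kernel.

```python
import numpy as np

def smoothing_factors(act_absmax, weight, alpha=0.5, eps=1e-5):
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    w_absmax = np.abs(weight).max(axis=0)  # weight is (out_features, in_features)
    return np.maximum(act_absmax, eps) ** alpha / np.maximum(w_absmax, eps) ** (1 - alpha)

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization; returns (int8 array, float scale)."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, weight):
    """Y = X @ W^T with INT8 operands and INT32 accumulation, entirely on CPU."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(weight)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # integer GEMM
    return acc.astype(np.float32) * (sx * sw)          # dequantize the result

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16)).astype(np.float32)
X[:, 3] *= 100.0  # synthetic outlier activation channel
W = (0.1 * rng.standard_normal((32, 16))).astype(np.float32)
Y_ref = X @ W.T   # full-precision reference

# Calibration: per-channel activation maxima (taken from X itself in this toy example).
s = smoothing_factors(np.abs(X).max(axis=0), W, alpha=0.5)
X_s, W_s = X / s, W * s  # X_s @ W_s.T equals X @ W.T up to float rounding

err_naive = np.abs(int8_linear(X, W) - Y_ref).mean()
err_smooth = np.abs(int8_linear(X_s, W_s) - Y_ref).mean()
print(f"mean abs error  naive: {err_naive:.4f}  smoothed: {err_smooth:.4f}")
```

In the paper, the division by s is folded offline into the preceding layer (for example, the LayerNorm before a linear layer), so smoothing adds no runtime cost. What remains at inference time is an ordinary INT8 GEMM, which any CPU backend with INT8 support could in principle execute.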