[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Can I reproduce SmoothQuant on CPU only? torch-int8 requires a GPU, and I am only interested in CPU inference. #73
Open
WCSY-YG opened 9 months ago