mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

adjust activations #80

Open muzi0111 opened 3 months ago

muzi0111 commented 3 months ago

Hello, I would like to ask how exactly the activations are adjusted during inference, i.e., how the weights can be multiplied by a factor while the activations are divided by the same factor. As far as I know, it is only possible to obtain the output of a particular layer, but not to adjust the activation that is fed into a specific layer.
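
For reference, the paper resolves this by folding the division offline into the parameters of the preceding layer (e.g., a LayerNorm), so no activation is modified at runtime. Below is a minimal sketch of that idea, assuming a LayerNorm feeding a Linear layer; it is my own illustration, not the repository's exact code, and the names `smooth_ln_fc`, `act_scales`, and `alpha` are placeholders I chose:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def smooth_ln_fc(ln: nn.LayerNorm, fc: nn.Linear,
                 act_scales: torch.Tensor, alpha: float = 0.5):
    """Fold the SmoothQuant scaling into ln and fc offline.

    act_scales: per-channel max |activation| at the input of fc,
    collected beforehand on calibration data.
    """
    # Per-input-channel max |weight| of the following linear layer.
    weight_scales = fc.weight.abs().max(dim=0).values.clamp(min=1e-5)
    # Smoothing factor from the paper: s_j = max|X_j|^alpha / max|W_j|^(1-alpha).
    s = (act_scales.pow(alpha) / weight_scales.pow(1 - alpha)).clamp(min=1e-5)
    # Dividing the activation X by s is equivalent to dividing the
    # LayerNorm's affine parameters by s, since X = LN(h) * gamma + beta
    # is applied elementwise per channel.
    ln.weight.div_(s)
    ln.bias.div_(s)
    # Compensate by scaling the corresponding input columns of fc,
    # so fc(X / s) with the new weights equals the original fc(X).
    fc.weight.mul_(s.view(1, -1))
```

After this transformation the model is mathematically equivalent (up to floating-point error), but the activation outliers have been migrated into the weights, which is why no per-input adjustment is needed during inference.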