mit-han-lab / smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
https://arxiv.org/abs/2211.10438
MIT License

general question about SmoothQuant kv-cache quantization #69

Open brisker opened 6 months ago

brisker commented 6 months ago
  1. Is the kv-cache actually unused in all the LLM evaluation tasks? Those tasks usually take only a single attention pass over the full input, unlike language generation, which relies heavily on the kv-cache because tokens are produced one by one.

  2. If this is true, how should quantization performance be evaluated when the kv-cache needs to be quantized (since the kv-cache is not exercised in normal evaluation tasks)? A sketch of what I mean follows this list.
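
To make question 2 concrete, here is a minimal sketch of forcing token-by-token decoding during evaluation so the kv-cache path is actually exercised. It assumes a HuggingFace-style causal LM; the model name `facebook/opt-125m` is just a placeholder, not something prescribed by this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model, not specific to SmoothQuant.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m").eval()

text = "SmoothQuant makes W8A8 quantization of LLMs feasible."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Usual PPL-style evaluation: one forward pass over the whole sequence,
# so the kv-cache (past_key_values) is never reused across steps.
with torch.no_grad():
    single_pass = model(input_ids, labels=input_ids)
print("single-pass loss:", single_pass.loss.item())

# Cache-exercising evaluation: feed one token at a time and reuse the
# cache, so quantization error on K/V would actually affect the logits.
nll, past = 0.0, None
with torch.no_grad():
    for t in range(input_ids.size(1) - 1):
        out = model(input_ids[:, t : t + 1], past_key_values=past, use_cache=True)
        past = out.past_key_values
        logp = torch.log_softmax(out.logits[:, -1], dim=-1)
        nll -= logp[0, input_ids[0, t + 1]].item()
print("token-by-token loss:", nll / (input_ids.size(1) - 1))
```

Without quantization the two losses should match up to numerical noise; with a quantized kv-cache, only the token-by-token number would reflect the cache's quantization error. Is this the right way to think about it?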

Besides, how exactly is the kv-cache quantized in SmoothQuant?
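
For reference, my current mental model is something like a symmetric per-token INT8 fake-quant applied to K/V right after the k/v projections. This is my assumption about the mechanism, not necessarily what SmoothQuant actually does:

```python
import torch

def fake_quant_per_token(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Symmetric fake quantization, one scale per token (last dim = features)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

# e.g. applied to keys of shape (batch, heads, seq, head_dim):
k = torch.randn(1, 12, 16, 64)
k_q = fake_quant_per_token(k)
print("max abs error:", (k - k_q).abs().max().item())
```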

Hoping to discuss this with the authors of SmoothQuant!