Is the KV cache actually unused in most LLM evaluation tasks? Those tasks usually take only a single forward pass (one-step attention calculation), unlike autoregressive generation, which relies heavily on the KV cache because tokens are produced one by one.
If that is true, how should we evaluate quantization performance when the KV cache needs to be quantized (since the KV cache is not exercised in normal evaluation tasks)?
Besides, how is the KV cache quantized in SmoothQuant?
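To illustrate what I mean by "one-step attention vs. cached decoding": the two should compute identical outputs, the cache only avoids recomputing past keys/values. A toy NumPy sketch (my own illustration, not SmoothQuant code; `Q`, `K`, `V` stand in for the per-token projection outputs):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, T = 8, 5
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

# 1) "Evaluation-style": one forward pass, full causal attention in one shot
scores = Q @ K.T / np.sqrt(d)
scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # causal mask
full_out = softmax(scores) @ V

# 2) "Generation-style": incremental decoding with a KV cache
k_cache, v_cache, steps = [], [], []
for t in range(T):
    k_cache.append(K[t]); v_cache.append(V[t])   # append the new K/V to the cache
    Kc, Vc = np.stack(k_cache), np.stack(v_cache)
    s = Q[t] @ Kc.T / np.sqrt(d)                 # attend only over cached positions
    steps.append(softmax(s) @ Vc)
inc_out = np.stack(steps)

print(np.allclose(full_out, inc_out))  # True: the cache reproduces full attention
```

So mathematically nothing changes; my question is purely about where quantization error enters when the cached K/V are stored in low precision.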
Hoping to discuss this with the authors of SmoothQuant!
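For question 2, my current guess is that one can still measure KV-cache quantization in a single evaluation pass by fake-quantizing K and V (quantize, then immediately dequantize) before the attention computation, so perplexity/accuracy still reflect the quantization error even without autoregressive decoding. A toy sketch of symmetric per-tensor int8 fake quantization (again my own illustration, not SmoothQuant's actual code):

```python
import numpy as np

def fake_quant_int8(x):
    # symmetric per-tensor int8 fake quantization: round to the int8 grid,
    # then map back to float so the rest of the forward pass is unchanged
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

rng = np.random.default_rng(1)
K = rng.standard_normal((5, 8))   # stand-in for a key tensor
Kq = fake_quant_int8(K)

max_err = np.abs(K - Kq).max()    # error is bounded by ~scale/2 per element
print(max_err < np.abs(K).max() / 127.0)  # True
```

Is that roughly how the paper's evaluation treats the KV cache, or is there a generation-based benchmark I am missing?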