Open xiMenchuiXeu opened 2 weeks ago
The KV cache is the set of keys and values (hidden states) that are computed and stored for use in subsequent generation steps. In the prefill (prompt processing) stage, no keys or values have been computed yet, so the only thing we can compress is the hidden states. These compressed hidden states are then stored as the initial KV cache.
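To make the ordering concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names (`compress_hidden_states`, `prefill`, `decode_step`), the "keep the rows with the largest norm" rule, and the explicit `w_k` / `w_v` projections are all assumptions made for illustration. The only point it shows is that during prefill the cache does not exist yet, so compression acts on the prompt's hidden states and the result seeds the cache.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def compress_hidden_states(hidden, keep):
    """Hypothetical compression rule: keep the `keep` rows with the largest
    squared norm, preserving their original order in the prompt."""
    ranked = sorted(range(len(hidden)),
                    key=lambda i: sum(x * x for x in hidden[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])
    return [hidden[i] for i in kept]

def prefill(hidden, w_k, w_v, keep):
    """Prefill: no cache exists yet, so compress the hidden states first,
    then build the initial KV cache from what survives."""
    compressed = compress_hidden_states(hidden, keep)
    return {"k": matmul(compressed, w_k), "v": matmul(compressed, w_v)}

def decode_step(cache, new_hidden, w_k, w_v):
    """Decode: compute the key and value of the new token and append them."""
    cache["k"].extend(matmul([new_hidden], w_k))
    cache["v"].extend(matmul([new_hidden], w_v))
    return cache

rng = random.Random(0)
d_model, prompt_len, keep = 4, 8, 3  # toy sizes, chosen arbitrarily

hidden = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(prompt_len)]
w_k = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]
w_v = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]

cache = prefill(hidden, w_k, w_v, keep)            # 8 prompt tokens -> 3 cached entries
new_token = [rng.gauss(0, 1) for _ in range(d_model)]
cache = decode_step(cache, new_token, w_k, w_v)    # one generated token appended

print(len(cache["k"]), len(cache["v"]))  # 4 4
```

In this toy version the cache after prefill has `keep` entries instead of `prompt_len`, and every later decode step just appends to it. How the actual code compresses the hidden states may be quite different.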
When I read the code, I found that the KV cache is not compressed in the prefill stage; the hidden states are compressed instead. Why compress the hidden states rather than the KV cache?