mutonix / pyramidinfer


kv compress for prefill stage #5

Open xiMenchuiXeu opened 2 weeks ago

xiMenchuiXeu commented 2 weeks ago

While reading the code, I noticed that the KV cache is not compressed in the prefill stage; the hidden states are compressed instead. Why compress the hidden states rather than the KV cache?

mutonix commented 2 weeks ago

The KV cache is the set of keys and values (derived from the hidden states) that are computed and stored for subsequent generation steps. In the prefill (prompt processing) stage, no keys or values have been computed yet, so the only thing we can compress is the hidden states. What is later stored as the initial KV cache comes from these compressed hidden states.
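To make the ordering concrete, here is a minimal NumPy sketch of a single prefill layer. It is not the code from this repository: `prefill_layer`, the `scores` array, and the `keep` budget are hypothetical names for illustration, and the selection rule (keep the top-scoring tokens) is only a stand-in for whatever criterion the real implementation uses. It assumes a standard attention layer in which keys and values are linear projections of the hidden states.

```python
import numpy as np

def prefill_layer(hidden, w_k, w_v, scores, keep):
    """Hypothetical single-layer prefill step.

    hidden: (seq_len, d_model) hidden states of the prompt tokens
    scores: (seq_len,) assumed per-token importance (illustrative only)
    keep:   number of tokens to retain
    """
    # Indices of the `keep` highest-scoring tokens, restored to prompt order.
    kept = np.sort(np.argsort(scores)[-keep:])
    # Compression happens on the hidden states, because no K/V exist yet.
    compressed = hidden[kept]
    # Keys and values only come into existence after this projection.
    keys = compressed @ w_k
    values = compressed @ w_v
    # These tensors are what gets stored as the layer's initial KV cache.
    return kept, (keys, values)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 4
hidden = rng.standard_normal((seq_len, d_model))
w_k = rng.standard_normal((d_model, d_head))
w_v = rng.standard_normal((d_model, d_head))
scores = np.arange(seq_len, dtype=float)  # toy scores: later tokens rank higher

kept, (keys, values) = prefill_layer(hidden, w_k, w_v, scores, keep=3)
print(kept.tolist(), keys.shape, values.shape)
```

With this ordering, the cache for the prompt is already small when it is first written, so there is no separate "compress the KV cache" step during prefill.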