Question about Code details

sail-sg / SimLayerKV

The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".


Section A.2 of the paper states that SimLayerKV is applied in both the prefill and decode stages. However, the code appears to behave differently in two places:

During the prefill stage, bos_weight is set to 0, so the compression step at this line is skipped. Is compression only carried out in the decode stage, or is this a bug in the implementation?
In this line, the condition is true only when decoding the first token, which allows compression at that step. In later decoding steps, however, the second condition, torch.tensor(bos_weights).sum() != 0, makes the check false and prevents any further compression. Could you please clarify the intended behavior?

Thank you for your response.
