sail-sg / SimLayerKV

The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".

Question about Code details #3

Open · icoderzqliu opened this issue 5 days ago

icoderzqliu commented 5 days ago

Section A.2 of the paper states that SimLayerKV is applied in both the prefill and decoding stages. However, in the code I see the following:

  1. During the prefill stage, bos_weight is set to 0, so in the compression process at this line no compression is actually performed. Is compression intended to run only during the decoding stage, or is this an issue in the code implementation?

  2. In this line, the condition is true only when decoding the first token, which allows compression at that step. During later decoding steps, however, the second condition `torch.tensor(bos_weights).sum() != 0` makes the check false, so no further compression takes place (a rough schematic of my reading follows this list). Could you please clarify this behavior?
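
To make my reading concrete, here is a rough schematic of how I understand the gate (placeholder names such as `is_first_decode_step`; this is not the actual code from the repository):

```python
import torch

def compression_enabled(is_first_decode_step, bos_weights):
    # During prefill, bos_weight is set to 0, so the sum check is False and no
    # compression is performed (point 1). As I read it, the gate then only
    # passes at the first decode step; afterwards it is False again, so no
    # further compression happens (point 2).
    return is_first_decode_step and torch.tensor(bos_weights).sum() != 0
```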

Thank you for your response.

jadeCurl commented 3 days ago

Hi,

Thank you for your interest in our work.

  1. Thanks for your question regarding the identification process during the prefill stage. I have commented out the lines related to this and plan to update the documentation shortly.

  2. The unnecessary cache is discarded during the first step of decoding, so there is no need to discard anything further in subsequent steps (a rough sketch of such a one-time trim is given below).
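
To illustrate, here is a minimal sketch of what this one-time trim can look like, following the paper's idea of keeping only the initial tokens and a recent window for layers identified as lazy (placeholder names such as `trim_lazy_layers`, `lazy_layers`, `n_initial`, and `n_recent`; the actual repository code differs in its details):

```python
import torch

def trim_lazy_layers(past_key_values, lazy_layers, n_initial=4, n_recent=1024):
    """Keep only the initial and recent tokens for layers flagged as lazy."""
    trimmed = []
    for (key, value), is_lazy in zip(past_key_values, lazy_layers):
        # key/value shape: [batch, num_heads, seq_len, head_dim]
        if is_lazy and key.shape[2] > n_initial + n_recent:
            key = torch.cat([key[:, :, :n_initial], key[:, :, -n_recent:]], dim=2)
            value = torch.cat([value[:, :, :n_initial], value[:, :, -n_recent:]], dim=2)
        trimmed.append((key, value))
    return trimmed
```

Because this runs once, right after the lazy layers are identified at the first decode step, later steps simply append new tokens to the already reduced cache.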