microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
4.42k stars 241 forks

Some questions about parameters? #49

Open XiaoFengbing opened 7 months ago

XiaoFengbing commented 7 months ago

What GREAT work on PROMPT COMPRESSION! But I have some questions about parameters in the code and the paper.

1. Where is the granular control coefficient parameter 'k' from the LLMLingua paper in this code? I couldn't find it, and I guess 'context_budget' (default '+100') has the same meaning. Is that right?

2. By the way, I also couldn't find the pre-defined compression rates for the instruction and question from the LLMLingua paper in this code.

3. In this code, 'token_budget_ratio' is the 'Budget ratio for sentence-level Prompt Compression' (default 1.4). But I cannot find this parameter in the LLMLingua paper, neither the value 1.4 nor 'token_budget_ratio' itself.

Thank YOU!

XiaoFengbing commented 7 months ago

Another question: in the LLMLingua and LongLLMLingua code, the instruction and question seem not to be compressed.
In other words, LLMLingua and LongLLMLingua only compress the context/document/demonstration and do not compress the instruction and question. Is that right?

iofu728 commented 7 months ago

Hi @XiaoFengbing, thank you for your interest in LLMLingua. I'll briefly answer your question:

  1. You can consider the control coefficient parameter 'k' defined in the paper as equivalent to 'context_budget' = '*k'. However, 'token_budget_ratio' can also be viewed as a control coefficient parameter. Both aim to control the specific compression ratios during the coarse-to-fine-grained stages.
  2. In this implementation, for simplicity, we set the compression ratios for both instruction and question to 0, meaning no compression, as we assume these parts are usually more sensitive and shorter, and retaining them has a significant impact on performance. We will incorporate the logic set in the paper into our current library soon, allowing users to customize the predefined compression ratios.
  3. 'token_budget_ratio' is used to control the target compression ratio at the sentence level. It can be considered part of the control coefficient parameter.
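As a rough illustration of how these two budget knobs might combine, here is a minimal sketch, assuming 'context_budget' acts as an arithmetic suffix (such as '+100' or '*1.5') applied to the target token count, as the "= '*k'" notation above suggests. The helper names are hypothetical, not the library's API:

```python
# Hypothetical sketch: how context_budget and token_budget_ratio could
# adjust token budgets at the coarse-grained and sentence-level stages.
# These helpers are illustrative only, not LLMLingua's actual code.

def apply_context_budget(target_token: int, context_budget: str = "+100") -> int:
    """Coarse-grained stage: widen the budget by an arithmetic suffix."""
    op, value = context_budget[0], float(context_budget[1:])
    if op == "+":
        return int(target_token + value)
    if op == "*":
        return int(target_token * value)
    raise ValueError(f"unsupported context_budget: {context_budget!r}")

def sentence_budget(target_token: int, token_budget_ratio: float = 1.4) -> int:
    """Sentence-level stage: allow headroom before token-level pruning."""
    return int(target_token * token_budget_ratio)

coarse = apply_context_budget(200, "+100")  # 300 tokens kept after coarse stage
fine = sentence_budget(200, 1.4)            # 280-token sentence-level budget
```

The extra headroom at each stage is then trimmed back to the final target during token-level compression.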
XiaoFengbing commented 7 months ago

Thanks very much for your response! I'm looking forward to the code update for customizing the predefined compression ratios; I want to faithfully reproduce the LLMLingua and LongLLMLingua results according to the experimental settings in the original papers.

XiaoFengbing commented 7 months ago

@iofu728 Suppose I want to compress the instruction and question according to the predefined compression ratios in the LLMLingua and LongLLMLingua papers, without making any changes to the existing code.

Can I set the context to '[instruction text]+[context text]+[question text]' and compress the instruction (ratio set to 0.15), then set the context to the same concatenation and compress the question (ratio set to 0.1), and again to compress the context (ratio set to any value)? Finally, the compressed instruction, context, and question are obtained from these three steps.

I think: 1. this method can compress contexts with the instruction as the condition, as described by the ITPC algorithm; 2. it can match the true meaning of the ITPC algorithm without modifying the code, while compressing the instruction and question according to the predefined ratios.

Is this method right? I look forward to your response. Thank you very much.
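The three-pass scheme above could be sketched like this. It uses a stand-in word-truncation "compressor" (not LLMLingua's perplexity-based one) so the control flow runs without model weights; `compress_segment` and `three_pass` are hypothetical names, and only the 0.15 / 0.1 ratios come from the discussion above:

```python
# Sketch of the proposed three-pass workaround. `compress_segment` is a
# stand-in: real LLMLingua scores tokens of the target segment conditioned
# on the surrounding prompt and drops low-information ones; here we simply
# keep the leading fraction of words so the example is runnable.

def compress_segment(full_prompt: str, segment: str, ratio: float) -> str:
    """Keep roughly `ratio` of the segment's words (toy stand-in)."""
    words = segment.split()
    keep = max(1, int(len(words) * ratio))
    return " ".join(words[:keep])

def three_pass(instruction: str, context: str, question: str):
    # Each pass sees the same '[instruction]+[context]+[question]' string
    # as conditioning context, but targets a different segment and ratio.
    full = f"{instruction} {context} {question}"
    compressed_instruction = compress_segment(full, instruction, 0.15)
    compressed_question = compress_segment(full, question, 0.10)
    compressed_context = compress_segment(full, context, 0.50)
    return compressed_instruction, compressed_context, compressed_question
```

The key point of the scheme is that every pass conditions on the full concatenation while only one segment's compression ratio is in effect at a time.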

iofu728 commented 7 months ago

Hi @XiaoFengbing,

I believe it can be approximately achieved, although there might be some differences because the context-induced conditional distribution also affects the question. However, I think the impact will not be significant.