microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

LLMLingua and LongLLMLingua parameters question #76

Open XiaoFengbing opened 6 months ago

XiaoFengbing commented 6 months ago

I read issues #7, #12, and #49, and I guess the right parameters for LLMLingua are:

```python
prompt = compressor.compress_prompt(
    context=xxx,
    instruction=xxx,
    question=xxx,
    ratio=0.75,
    iterative_size=100,
    context_budget="*2",
)
```

and for LongLLMLingua:

```python
prompt = compressor.compress_prompt(
    context=xxx,
    instruction=xxx,
    question=xxx,
    ratio=0.75,
    iterative_size=200,
    condition_compare=True,
    condition_in_question='after_condition',
    rank_method='longllmlingua',
    reorder_context='sort',
    dynamic_context_compression_ratio=0.3,
    context_budget="*2",
)
```

I have some questions:

  1. In #7, you said context_budget should be *1.3 or +300 in LongLLMLingua, and in #12 you said it should be +200, so I am confused about the setting of context_budget. Meanwhile, in the LLMLingua and LongLLMLingua papers, context_budget seems to be *2 (per #49). So I want to know how to set context_budget in LLMLingua and LongLLMLingua.
  2. In #49, you said context_budget and token_budget_ratio can be considered part of the granular control coefficient k. Can I assume I only need to tune context_budget? In #7 and #12 you do not change token_budget_ratio.
  3. What does the dynamic_context_compression_ratio parameter correspond to in the LongLLMLingua paper? I could not find it in the implementation details.
  4. Most importantly, I want to know whether the parameters I described above are correct. I would like the actual LLMLingua and LongLLMLingua parameters you used in the papers, so that I can run my experiments with the true settings.

Please forgive the long list of questions; your LLMLingua and LongLLMLingua work is very interesting to me! Looking forward to your reply.

iofu728 commented 6 months ago

Hi @XiaoFengbing,

Thank you for your question. I've shared the parameters we used in LongLLMLingua in issues #7 and #12, which you can refer to.

  1. In fact, as I mentioned in issue #49, the "granular control coefficient" k in the paper equals “context_budget” plus “token_budget_ratio”. In LongLLMLingua, we mostly use “context_budget” = “+200”, whereas in LLMLingua, it’s “context_budget” = “*1.5”.
  2. You can leave “token_budget_ratio” unchanged for most tasks.
  3. The “dynamic_context_compression_ratio” corresponds to δτ in Equation (5) of the LongLLMLingua paper.
  4. If you have more questions, please feel free to ask.
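Collecting the settings discussed in this thread, the two configurations could be sketched as keyword-argument dicts (a sketch distilled from this exchange, not official reference configs; the `compress_prompt` signature is taken from the snippets above, and defaults may differ across llmlingua versions):

```python
# Parameter sets distilled from this thread (illustrative only;
# verify against the llmlingua version you actually install).

# LLMLingua: coarse-to-fine compression without question-aware ranking.
llmlingua_kwargs = {
    "ratio": 0.75,             # keep ~75% of the tokens
    "iterative_size": 100,
    "context_budget": "*1.5",  # part of the granular control coefficient k
}

# LongLLMLingua: adds question-aware ranking and dynamic per-context ratios.
longllmlingua_kwargs = {
    "ratio": 0.75,
    "iterative_size": 200,
    "condition_compare": True,
    "condition_in_question": "after_condition",
    "rank_method": "longllmlingua",
    "reorder_context": "sort",
    "dynamic_context_compression_ratio": 0.3,  # 0.25 in the paper; 0.3-0.4 for multi-doc QA
    "context_budget": "+200",
}

# An actual call would then look like the following (commented out here,
# since it requires the llmlingua package and a downloaded model):
# from llmlingua import PromptCompressor
# compressor = PromptCompressor()
# result = compressor.compress_prompt(
#     context=docs, instruction=instr, question=q, **longllmlingua_kwargs
# )
```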
XiaoFengbing commented 6 months ago

Hi, @iofu728, thanks for your response very much!

  1. I was confused about context_budget after reading #7 and #12, because the values differ between them. [screenshots from #7 and #12 omitted] So I will set context_budget to +200 in LongLLMLingua, thanks!

  2. I see that dynamic_context_compression_ratio seems to be 0.25 by default in the LongLLMLingua paper (B.2 Other Implementation Details). I am confused about why it is set to 0.3 or 0.4 in #7 and #12.

  3. Thanks for the LongLLMLingua parameters you shared in issues #7 and #12. I would like to know the LLMLingua parameters you used:

```python
prompt = compressor.compress_prompt(
    context=xxx,
    instruction=xxx,
    question=xxx,
    ratio=0.75,
    iterative_size=100,
    context_budget="*1.5",
)
```

Is that right?

iofu728 commented 5 months ago

Hi @XiaoFengbing, sorry for the late reply.

  1. Yes, we stated in the paper that it is 0.25. However, for tasks like multi-document QA, a setting of 0.3 or 0.4 tends to perform better.
  2. That matches the parameters I use.
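The per-task advice above (0.25 as the paper's default, 0.3-0.4 for multi-document QA) could be captured in a small helper; the function name and task labels below are hypothetical, not part of the llmlingua API:

```python
def pick_dynamic_ratio(task: str) -> float:
    """Choose dynamic_context_compression_ratio per task.

    0.25 is the default stated in the LongLLMLingua paper (Appendix B.2);
    the maintainer suggests 0.3-0.4 works better for multi-document QA.
    This helper is illustrative only, not part of the llmlingua API.
    """
    if task == "multidoc_qa":  # hypothetical task label
        return 0.3
    return 0.25
```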