microsoft / LLMLingua

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

What parameter settings are needed to reproduce LongLLMLingua? #7

Closed czwlines closed 1 year ago

czwlines commented 1 year ago

Because the parameter names used in the paper do not match the parameter names of the compress_prompt API, what settings are needed to reproduce LongLLMLingua?

Here is my guess at the parameter settings:

```python
# Assumed setup (not shown in the original snippet): instantiate the compressor first.
from llmlingua import PromptCompressor

compressor = PromptCompressor()

prompt = compressor.compress_prompt(
    context=documents,  # list of retrieved documents
    instruction=instruction,
    question=question,
    ratio=0.75,  # for 4x speedup
    iterative_size=200,
    condition_compare=True,
    condition_in_question='after',
    rank_method='longllmlingua',
    reorder_context='two_stage',
    dynamic_context_compression_ratio=0.25,
    context_budget="*2.0",
)
```
iofu728 commented 1 year ago

Hi @czwlines, most of the parameters are correct, but reorder_context should be 'sort', condition_in_question should be set to 'after_condition', dynamic_context_compression_ratio should be larger (e.g., 0.3 or 0.4), and context_budget should be smaller (e.g., "*1.3" or "+300"). We will open-source our parameters after the review process.
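
For reference, here is a sketch of the snippet above with those corrections applied. The specific values for dynamic_context_compression_ratio and context_budget are just the examples suggested in the reply, not an officially released configuration:

```python
# Corrected call per the maintainer's reply (a sketch, not the official config).
prompt = compressor.compress_prompt(
    context=documents,
    instruction=instruction,
    question=question,
    ratio=0.75,
    iterative_size=200,
    condition_compare=True,
    condition_in_question='after_condition',  # was 'after'
    rank_method='longllmlingua',
    reorder_context='sort',                   # was 'two_stage'
    dynamic_context_compression_ratio=0.3,    # 0.3 or 0.4 suggested
    context_budget="*1.3",                    # "*1.3" or "+300" suggested
)
```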