microsoft / LLMLingua

To speed up LLM inference and enhance the model's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Is the code for LongLLMLingua out? #12

Closed: darinkishore closed this 10 months ago

darinkishore commented 10 months ago

When I look through two of the examples, it seems like regular LLMLingua is being used. Is LongLLMLingua out yet? I only see the paper updates in the README.

Sorry if I've missed anything obvious!

iofu728 commented 10 months ago

Hi @darinkishore, thank you for your interest in our project.

Indeed, you can use LongLLMLingua by setting the rank_method to longllmlingua. Here's how you can do it:

from llmlingua import PromptCompressor

# documents, instruction, and question are assumed to be defined elsewhere.
compressor = PromptCompressor()

prompt = compressor.compress_prompt(
    context=documents,
    instruction=instruction,
    question=question,
    ratio=0.75,  # for 4x speedup
    iterative_size=200,
    condition_compare=True,
    condition_in_question='after_condition',
    rank_method='longllmlingua',  # enables LongLLMLingua
    reorder_context='sort',
    dynamic_context_compression_ratio=0.3,
    context_budget="+200",
)
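Note that compress_prompt returns a dict rather than a plain string; a minimal sketch of consuming the result, assuming the output keys shown in the LLMLingua README ('compressed_prompt', 'origin_tokens', 'compressed_tokens'):

# 'compressed_prompt' holds the shortened prompt text to send to the LLM.
print(prompt['compressed_prompt'])
# Token counts before and after compression, useful for verifying the speedup.
print(prompt['origin_tokens'], '->', prompt['compressed_tokens'])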

Most of our examples use LongLLMLingua, including those for RAG, Online Meetings, Code, and RAG using LlamaIndex.

darinkishore commented 10 months ago

Thank you so much! Highlighting this in the docs would probably prevent similar queries in the future. And yeah, your paper is really, really exciting!

iofu728 commented 10 months ago

Thanks for your suggestion; we'll polish the README soon.