microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: LongLLMLingua vs. LLMLingua2 for chatbot history compression #113

Closed · DomStan closed this issue 6 months ago

DomStan commented 6 months ago

Describe the issue

Hey guys, thanks a lot for your great work on prompt compression, really amazing results!

I have a question about a chatbot-history compression use case. Could you share your intuition on which method might work better for it:

  1. LongLLMLingua using the user's last query as the question, with re-ranking/ordering turned off, and treating each chat message as a separate document
  2. Just using LLMLingua-2 to compress the chat history, with the option of fine-tuning the embedding model on a chat-based dataset

Thank you!

iofu728 commented 6 months ago

Hi @DomStan, thanks for your interest in and support of LLMLingua. This depends on the overhead you can tolerate.

Generally speaking, for chat scenarios: if the topic is relatively fixed and low latency is a hard requirement, you might opt for Solution 2 (LLMLingua-2). If the topic varies significantly and you can accept the overhead of online, question-aware compression, then Solution 1 (LongLLMLingua) might be more suitable and potentially offers higher performance. Ultimately, though, this will depend on your specific scenario.
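For reference, the two setups might be wired up roughly like this. This is a sketch assuming the README-style `PromptCompressor` / `compress_prompt` API; the chat history, helper function, model choice, and parameter values are illustrative assumptions, not recommendations:

```python
# Sketch of the two options discussed above. The helper and demo values are
# hypothetical; the LLMLingua calls follow the README-style API.

def history_to_docs(history):
    """Turn (role, text) chat turns into per-message documents, plus the
    last message to use as the LongLLMLingua question (assumes the last
    turn is the user's query)."""
    docs = [f"{role}: {text}" for role, text in history]
    return docs, docs[-1]


def demo():  # requires `pip install llmlingua` plus model downloads
    from llmlingua import PromptCompressor

    history = [
        ("user", "How do I reset my password?"),
        ("assistant", "Go to Settings > Security and click 'Reset password'."),
        ("user", "What if I no longer have access to my email?"),
    ]
    docs, question = history_to_docs(history)

    # Option 1: LongLLMLingua -- question-aware compression, each chat
    # message treated as a separate document; reorder_context="original"
    # keeps the chat order instead of re-ranking documents.
    compressor = PromptCompressor()
    out1 = compressor.compress_prompt(
        docs,
        question=question,
        rank_method="longllmlingua",
        reorder_context="original",
        target_token=200,
    )

    # Option 2: LLMLingua-2 -- question-agnostic token classification,
    # applied to the whole history at a fixed compression rate.
    compressor2 = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
    )
    out2 = compressor2.compress_prompt("\n".join(docs), rate=0.33)

    return out1["compressed_prompt"], out2["compressed_prompt"]


if __name__ == "__main__":
    for prompt in demo():
        print(prompt, "\n---")
```

The compressor calls are kept inside `demo()` so the message-formatting helper can be used without the models loaded; in Option 1 the per-message document split is what lets the question-aware ranking drop whole irrelevant turns.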

Best wishes,