[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Thank you very much for your awesome work and for making the code so easy to use.
The provided example uses OpenAI's GPT-3.5 via the OpenAI API. Are there plans to provide an evaluation script that uses longchat-13b-16k to reproduce the LongLLMLingua results?
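
In the meantime, here is a minimal sketch of what such an evaluation loop might look like: compress the prompt with LLMLingua as in the provided example, then generate locally with longchat-13b-16k via Hugging Face transformers instead of calling the OpenAI API. This assumes the `lmsys/longchat-13b-16k` checkpoint on the Hugging Face Hub and the default `PromptCompressor` settings; the `context`, `question`, `target_token`, and `max_new_tokens` values are placeholders, not the settings from the paper.

```python
import torch
from llmlingua import PromptCompressor
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder inputs; replace with benchmark data (e.g., NaturalQuestions passages).
context = ["<retrieved document 1>", "<retrieved document 2>"]
question = "<the question to answer>"

# Compress the long prompt with a small LM (LLMLingua's default compressor).
llm_lingua = PromptCompressor()
result = llm_lingua.compress_prompt(
    context,
    question=question,
    target_token=500,  # illustrative budget, not the paper's setting
)

# Feed the compressed prompt to longchat-13b-16k locally instead of GPT-3.5.
tokenizer = AutoTokenizer.from_pretrained("lmsys/longchat-13b-16k")
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/longchat-13b-16k", device_map="auto", torch_dtype=torch.float16
)
inputs = tokenizer(result["compressed_prompt"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and print only the generated answer.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```

An official script with the exact prompt templates and generation settings used in the paper would still be very helpful for a faithful reproduction.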