To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Hello again, I noticed a strange behavior while developing with llmlingua: for small prompts such as "hello" or "who", the compress function throws an "index 0 out of range" error.
` """
Token compression using llmlingua that uses gpt-2 small llm.
"""
Here is the initialization that triggers the error. Thanks, and I'm looking forward to updates.
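A minimal sketch of the kind of setup I mean, following the `PromptCompressor` usage shown in the llmlingua README; the exact `model_name`, `device_map`, and `target_token` values here are illustrative assumptions, not the failing code verbatim:

```python
from llmlingua import PromptCompressor

# Assumed setup: GPT-2 small as the compression model (per the docstring
# above); device_map="cpu" is illustrative.
compressor = PromptCompressor(model_name="gpt2", device_map="cpu")

# Longer prompts compress fine, but a very short prompt such as "hello"
# raises: IndexError: index 0 out of range
result = compressor.compress_prompt("hello", target_token=20)
print(result["compressed_prompt"])
```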