To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
I am unable to compress a prompt in the following example.
These are the steps I have tried so far:
1. On CPU:
from llmlingua import PromptCompressor
# Load the compression model on the CPU.
llm_lingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="cpu")
prompt = "Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o"
# Request a compressed prompt with a target of 10 tokens.
compressed_prompt = llm_lingua.compress_prompt(prompt, target_token=10)
2. On MPS:
from llmlingua import PromptCompressor
# Same as attempt 1, but with the model loaded on the MPS device.
llm_lingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="mps")
prompt = "Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o"
compressed_prompt = llm_lingua.compress_prompt(prompt, target_token=10)
Output result:
Both attempts return the same output, with the prompt unchanged:
{'compressed_prompt': 'Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o', 'origin_tokens': 18, 'compressed_tokens': 18, 'ratio': '1.0x', 'rate': '100.0%', 'saving': ', Saving $0.0 in GPT-4.'}
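For reference, a minimal sketch of how the returned dictionary can be checked for actual compression. It assumes only the `origin_tokens` and `compressed_tokens` keys shown in the output above; the helper name `was_compressed` is hypothetical and not part of llmlingua.

```python
def was_compressed(result: dict) -> bool:
    """Return True if the compressed prompt has fewer tokens than the original."""
    return result["compressed_tokens"] < result["origin_tokens"]


# Token counts copied from the output above (other keys omitted for brevity).
result = {"origin_tokens": 18, "compressed_tokens": 18}
print(was_compressed(result))  # prints False: no tokens were removed
```

With the values reported here, the check confirms that the token count did not change even though `target_token=10` was requested.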
How can I get a result in which some tokens are actually compressed?