microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: How to compress a simple prompt on mac #138

Closed vanillacandy closed 5 months ago

vanillacandy commented 5 months ago

Describe the issue

I am unable to compress a prompt, as shown in the following example.

These are the steps I have tried so far.

1. On CPU:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="cpu")
prompt = "Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o"
compressed_prompt = llm_lingua.compress_prompt(prompt, target_token=10)
```

2. On MPS:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="mps")
prompt = "Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o"
compressed_prompt = llm_lingua.compress_prompt(prompt, target_token=10)
```

Output: both runs return the same result with no tokens removed:

```python
{'compressed_prompt': 'Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o',
 'origin_tokens': 18,
 'compressed_tokens': 18,
 'ratio': '1.0x',
 'rate': '100.0%',
 'saving': ', Saving $0.0 in GPT-4.'}
```

How can I get a result where some tokens are actually compressed?

vanillacandy commented 5 months ago

I found a model that works in my environment: microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank
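For anyone landing here with the same question, a minimal sketch of how that LLMLingua-2 model can be loaded, assuming the `use_llmlingua2=True` flag and the `rate` argument shown in the project README (exact arguments may vary by version):

```python
from llmlingua import PromptCompressor

# Assumption: LLMLingua-2 checkpoints are loaded with use_llmlingua2=True,
# as described in the LLMLingua README.
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,
    device_map="cpu",  # or "mps" on Apple Silicon
)

prompt = "Today is the anniversary of the publication of Robert Frost’s iconic poem “Stopping by Woods o"

# rate=0.5 asks the compressor to keep roughly half of the tokens.
compressed = llm_lingua.compress_prompt(prompt, rate=0.5)
print(compressed["compressed_prompt"])
```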