microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

PromptCompressor error - OpenAIGPTLMHeadModel.forward() got an unexpected keyword argument 'past_key_values' #48

Open manojsharmadcx opened 8 months ago

manojsharmadcx commented 8 months ago

I am trying to use the OpenAI GPT model ("openai-gpt") for prompt compression, but I get the error "OpenAIGPTLMHeadModel.forward() got an unexpected keyword argument 'past_key_values'". Has anyone faced a similar issue, and how can it be fixed? Thanks.

```python
!pip install llmlingua

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="openai-gpt",
    device_map="cpu",
)

prompt = """
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: I'd like to cancel my subscription.
AI: I'm sorry to hear that. What is your subscription number?
Human: 123456
AI: Thank you. Your subscription has been cancelled.
Human: Thank you. Goodbye!
AI: Goodbye!
"""

compressed_prompt = llm_lingua.compress_prompt(
    prompt, instruction="", question="", target_token=200
)
```

iofu728 commented 8 months ago

Hi @manojsharmadcx,

Thank you for your support. The issue arises because `OpenAIGPTLMHeadModel` does not accept a `past_key_values` (KV cache) input. You might consider using "gpt2" or "lgaalves/gpt2-dolly" instead, as in the sketch below.
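For anyone hitting the same error, a minimal sketch of that workaround (same arguments as the snippet above, only the model name changes):

```python
from llmlingua import PromptCompressor

# GPT-2 accepts past_key_values, so the compressor's KV-cache path works.
llm_lingua = PromptCompressor(
    model_name="gpt2",  # or "lgaalves/gpt2-dolly"
    device_map="cpu",
)

result = llm_lingua.compress_prompt(
    prompt,  # the conversation prompt from the snippet above
    instruction="",
    question="",
    target_token=200,
)

# compress_prompt returns a dict; the compressed text is under "compressed_prompt".
print(result["compressed_prompt"])
```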

manojsharmadcx commented 7 months ago

Hi @iofu728,

Thanks for your response. Currently the gpt2 model seems to be downloaded from Hugging Face onto the application's file system where it is used, which means the application (e.g., an Azure App Service) needs extra resources to run prompt compression. Is there a way to use a hosted Azure OpenAI model for compression instead?

Thanks, Manoj.

iofu728 commented 7 months ago

Hi @manojsharmadcx, yes, currently a local deployment of the corresponding small model is required to use this method. If an API model supports returning the log probabilities of the prompt tokens, then it would be possible to implement LLMLingua through API calls.
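As a rough illustration of what that would require: the legacy OpenAI Completions API can echo the prompt back together with per-token log probabilities, which is exactly the signal LLMLingua needs. A minimal sketch, assuming the `openai` v1 Python client and a base model that still allows `echo` with `logprobs` (the model name here is an assumption):

```python
from openai import OpenAI  # assumes the openai v1 Python client

client = OpenAI()

# Echo the prompt back with per-token log probabilities and generate nothing.
# Prompt-token logprobs like these are what LLMLingua uses to score token importance.
resp = client.completions.create(
    model="davinci-002",  # assumption: a base model that permits echo + logprobs
    prompt="The following is a conversation with an AI assistant.",
    max_tokens=0,  # generate no new tokens; we only want the prompt scored
    echo=True,     # return the prompt tokens in the response
    logprobs=0,    # include the log probability of each returned token
)

lp = resp.choices[0].logprobs
for token, logprob in zip(lp.tokens, lp.token_logprobs):
    # The first prompt token has no conditioning context, so its logprob is None.
    print(f"{token!r}: {logprob}")
```

Whether this is practical depends on the hosted model exposing prompt logprobs at all; chat-style endpoints generally do not, which is why a small local model remains the supported path.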