Open manojsharmadcx opened 10 months ago
Hi @manojsharmadcx,
Thank you for your support. The issue arises because `OpenAIGPTLMHeadModel` (link to code) does not support passing a KV cache (`past_key_values`) to its `forward()` method. You might consider using "gpt2" or "lgaalves/gpt2-dolly" instead.
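To illustrate the failure mode, here is a minimal, dependency-free sketch using toy stand-in functions (these are NOT the real `transformers` classes, only mock signatures): LLMLingua forwards `past_key_values` to the model's `forward()`, and `OpenAIGPTLMHeadModel.forward()` has no such parameter, so Python raises the `TypeError` shown in the issue, while `GPT2LMHeadModel.forward()` accepts it.

```python
def openai_gpt_forward(input_ids):
    # mimics OpenAIGPTLMHeadModel.forward: no KV-cache parameter in the signature
    return "logits"

def gpt2_forward(input_ids, past_key_values=None):
    # mimics GPT2LMHeadModel.forward: KV cache is accepted
    return "logits", past_key_values

try:
    openai_gpt_forward([1, 2, 3], past_key_values=None)
    error = None
except TypeError as exc:
    # same class of error as in the report:
    # "...got an unexpected keyword argument 'past_key_values'"
    error = str(exc)

print(error)
```

Passing an unsupported keyword argument fails at call time, which is why the error surfaces as soon as compression starts rather than at model load.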
Hi @iofu728,
Thanks for your response. Currently the GPT-2 model seems to be downloaded from Hugging Face onto the file system of the application where it is used, which means the application (e.g. Azure App Service) needs extra resources to run prompt compression. Is there a way to use a hosted Azure OpenAI model for compression instead?
Thanks, Manoj.
Hi @manojsharmadcx, yes, currently a local deployment of the corresponding small model is required to use this method. If an API model supports returning the log probabilities of the prompt tokens, then it would be possible to implement LLMLingua through API calls.
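For context on why log probabilities are the key requirement: LLMLingua scores prompt tokens by the log probability a small causal LM assigns them, so any hosted API would need to expose per-token log probabilities for the prompt itself, not just for the completion. A minimal, dependency-free sketch of the underlying computation for one position's logits (toy numbers, not a real model):

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax over a vocabulary's logits
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

# toy logits over a 4-token vocabulary at one prompt position
logits = [1.0, 2.0, 0.5, -1.0]
log_probs = log_softmax(logits)

# log probability the model assigns to the token that actually occurred
# (token id 2 in this toy example); low-probability tokens carry more
# information and are the ones compression tends to keep
token_log_prob = log_probs[2]
```

A local small model computes these values from its own logits; an API-based variant would need the provider to return them directly.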
I am trying to use the OpenAI GPT-2 model for prompt compression, but I am getting the error "OpenAIGPTLMHeadModel.forward() got an unexpected keyword argument 'past_key_values'". Has anyone faced a similar issue, and how can it be fixed? Thanks.
```python
!pip install llmlingua

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="openai-gpt",
    device_map="cpu",
)

prompt = """
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: I'd like to cancel my subscription.
AI: I'm sorry to hear that. What is your subscription number?
Human: 123456
AI: Thank you. Your subscription has been cancelled.
Human: Thank you. Goodbye!
AI: Goodbye!
"""

compressed_prompt = llm_lingua.compress_prompt(
    prompt, instruction="", question="", target_token=200
)
```