microsoft / LLMLingua

To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

llama instead of gpt #74

Open jwahnn opened 6 months ago

jwahnn commented 6 months ago

Just a few questions about using LLMLingua.

  1. How do I adjust the code so that I am using Llama instead of GPT?
  2. The reason I am using Llama instead of GPT is that I don't want my data sent to any other company's servers. If I use Llama, is my prompt or data sent to some server?
iofu728 commented 6 months ago

Hi @jwahnn,

Thank you for your support of LLMLingua. You can use the current code as-is to compress prompts and then feed them into LLaMA. Experiments in LongLLMLingua show that even open-source models such as LongChat-13B, used as the target LLM, can understand compressed prompts effectively.

Your concern makes sense. The compression process does not send anything to a server; it runs entirely locally.
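For reference, here is a minimal sketch of fully local usage, based on the `PromptCompressor` / `compress_prompt` API from the README (the model name, parameter values, and example strings are illustrative):

```python
from llmlingua import PromptCompressor

# The small compressor model runs locally; no prompt text leaves your machine.
llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # any local causal LM checkpoint
    device_map="cuda",                        # or "cpu" if no GPU is available
)

long_context = "... your long documents or few-shot examples ..."
result = llm_lingua.compress_prompt(
    long_context,
    instruction="Answer the question based on the context.",  # optional
    question="What are the key findings?",                    # optional
    target_token=200,  # rough token budget for the compressed prompt
)

# Feed the compressed prompt into your local LLaMA instead of GPT.
print(result["compressed_prompt"])
```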

jwahnn commented 6 months ago

Hi @iofu728, thanks for the input. Just a few more follow-up questions, though. I am slightly confused about the description on the main GitHub page (https://github.com/microsoft/LLMLingua#2-using-longllmlingua-for-prompt-compression). Do I just run the file that contains those lines of code? Also, your demo says "Using the LLaMA2-7B as a small language model would result in a significant performance improvement, especially at high compression ratios." Does the current version on GitHub use LLaMA2, then?

iofu728 commented 6 months ago

Hi @jwahnn,

The link at https://github.com/microsoft/LLMLingua#2-using-longllmlingua-for-prompt-compression is just a quick start guide on how to use our library. For more detailed information and examples, please refer to our documentation and the examples section.

Regarding your second question: yes, the default small model is LLaMA-2-7B, and you can use a different model in LLMLingua by specifying the `model_name` parameter, for example:

```python
from llmlingua import PromptCompressor

# Swap in a different small model for compression, e.g. phi-2:
llm_lingua = PromptCompressor(model_name="microsoft/phi-2")
```
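For completeness, a sketch of the downstream step (outside LLMLingua itself): passing the compressed prompt to a local LLaMA-2 via Hugging Face `transformers`, assuming `result` comes from a `compress_prompt` call as shown earlier in this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative: run generation locally so no data is sent to an external API.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(result["compressed_prompt"], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```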