microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

How to run in linux machine (CPU without GPU) #75

Open gayuoptisol opened 6 months ago

gayuoptisol commented 6 months ago

Is there any method to run LLMLingua on a Linux CPU machine (no GPU)? I am trying to load it using:

from llmlingua import PromptCompressor
llm_lingua = PromptCompressor(device_map="mps")

but it takes a very long time to load.

iofu728 commented 6 months ago

Hi @gayuoptisol,

You can set the device_map to "cpu" as follows:

llm_lingua = PromptCompressor(device_map="cpu")
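For context, a fuller CPU-only sketch using LLMLingua's compress_prompt entry point (the prompt text and target_token budget here are illustrative, not from this thread; expect slow loading on CPU since the default backing model is a 7B LLaMA variant):

```python
from llmlingua import PromptCompressor

# Load the compressor on CPU; no CUDA or MPS device is required.
# Loading the default 7B model on CPU is slow and memory-hungry.
llm_lingua = PromptCompressor(device_map="cpu")

long_prompt = "..."  # the long prompt you want to compress

# Compress toward a rough token budget (target_token is illustrative).
result = llm_lingua.compress_prompt(long_prompt, target_token=200)

# The result dict includes the compressed prompt plus token statistics.
print(result["compressed_prompt"])
```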
synergiator commented 5 months ago

Hi :-) Is there a rule of thumb for how much GPU memory is required, depending on the model and the content being processed, and how does this translate to the required number of CPUs and their memory?