microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

LLMLingua doesn't work on CPU as device_map #79

Open MrTBH opened 5 months ago

MrTBH commented 5 months ago

Hi. I've been trying to use LLMLingua on a CPU-only Linux machine. When I run my Python file, it just stops, as shown in the attached screenshot (Screenshot_1).

The code I'm using is as simple as the following:

    from llmlingua import PromptCompressor

    llm_lingua = PromptCompressor(device_map="cpu")
iofu728 commented 5 months ago

Hi @MrTBH, it looks like an out-of-memory issue. You might want to try using a smaller model, such as lgaalves/gpt2-dolly, or a quantized version of the model.
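A minimal sketch of the suggestion above: pass the smaller model's name to `PromptCompressor` along with `device_map="cpu"`. The `target_token` parameter and the `compressed_prompt` result key follow the LLMLingua README examples; treat the exact values here as illustrative assumptions rather than tested settings.

```python
from llmlingua import PromptCompressor

# Smaller model suggested by the maintainer; fits in far less RAM
# than the default 7B-parameter model, so it is less likely to OOM on CPU.
llm_lingua = PromptCompressor(
    model_name="lgaalves/gpt2-dolly",
    device_map="cpu",
)

# Illustrative call: compress a long prompt down to roughly 200 tokens.
result = llm_lingua.compress_prompt(
    "The quick brown fox jumps over the lazy dog. " * 50,
    target_token=200,
)
print(result["compressed_prompt"])
```

Note that even with a small model, the first run downloads the weights from the Hugging Face Hub, which can look like a hang on a slow connection.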