deltawi opened this issue 6 months ago
Hi @deltawi, if you use the GPTQ 7B model, you will need less than 8 GB of GPU memory.

Additionally, if you want to spread the model across multiple GPUs, you can use the following:

```python
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="balanced", model_config={"revision": "main"})
```
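For context, here is a minimal sketch of how that call fits into a full compression run. It assumes `llmlingua` and the GPTQ dependencies (e.g. `auto-gptq`/`optimum`) are installed, and it needs CUDA GPUs plus a model download, so it is illustrative rather than something you can run anywhere:

```python
# Sketch: load the GPTQ model balanced across all visible GPUs, then
# compress a prompt. Requires llmlingua + GPTQ support and CUDA hardware.
from llmlingua import PromptCompressor

# device_map="balanced" asks Accelerate to split the model's layers
# evenly across all visible GPUs instead of placing everything on cuda:0.
llm_lingua = PromptCompressor(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    device_map="balanced",
    model_config={"revision": "main"},
)

prompt = "..."  # the long prompt you want to compress
result = llm_lingua.compress_prompt(prompt, target_token=200)
print(result["compressed_prompt"])
```

You can also restrict which GPUs are visible with `CUDA_VISIBLE_DEVICES` before launching, if you only want the model spread over a subset of them.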
I have 4 RTX A5000 GPUs with 24 GB of memory each, but when I run the example code:
I get the error:
It does not seem to be able to run on multiple GPUs.