turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

MemoryError #386

Closed insanesac closed 3 months ago

insanesac commented 3 months ago

[Screenshot from 2024-03-27 15-10-40]

python3 test_inference.py -m /app/Llama2-7B-chat-exl2 -p "Once upon a time"

I get a MemoryError, as shown in the screenshot. The same happens when I run chat.py. I added logging to find out which file was causing the error: output.safetensors is the culprit.

I have an Ubuntu 22.04 machine with an NVIDIA T4 GPU. The machine also has 64 GB of RAM.

turboderp commented 3 months ago

Are you sure the .safetensors file isn't corrupt?
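One quick way to test for a corrupt download, before re-downloading multiple gigabytes, is to sanity-check the safetensors container itself: the format begins with an 8-byte little-endian header length, followed by a JSON header whose `data_offsets` imply the expected total file size. The sketch below is an illustration of that check, not exllamav2's actual loader; the function name is hypothetical.

```python
import json
import struct

def validate_safetensors_bytes(blob: bytes) -> bool:
    """Rough integrity check for a safetensors blob (hypothetical helper).

    Verifies that the 8-byte header-length prefix, the JSON header, and the
    tensor data offsets are all consistent with the actual file size.
    """
    if len(blob) < 8:
        return False
    # First 8 bytes: little-endian uint64 giving the JSON header's length.
    (header_len,) = struct.unpack("<Q", blob[:8])
    if len(blob) < 8 + header_len:
        return False
    try:
        header = json.loads(blob[8:8 + header_len])
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    # The data section should end exactly at the largest data_offsets end.
    data_end = 0
    for name, info in header.items():
        if name == "__metadata__":
            continue
        data_end = max(data_end, info["data_offsets"][1])
    return len(blob) == 8 + header_len + data_end

# Demo with a minimal synthetic file: one F32 tensor of 4 bytes.
hdr = json.dumps({"w": {"dtype": "F32", "shape": [1],
                        "data_offsets": [0, 4]}}).encode()
good = struct.pack("<Q", len(hdr)) + hdr + b"\x00" * 4
print(validate_safetensors_bytes(good))       # intact file
print(validate_safetensors_bytes(good[:-1]))  # truncated file
```

A truncated download (a common failure mode with interrupted git-lfs pulls) fails the size check immediately, while an intact file passes.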

insanesac commented 3 months ago

I thought that was the case too, so I switched branches - 8.0bpw, 6.0bpw and 4.0bpw. Same error every time. Let me download the files individually and see if that solves the issue.

insanesac commented 3 months ago

Looks like it was corrupted. Downloading the files individually fixed it. Thanks!