marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

Model not loading on GPU #177

Open AndreaLombax opened 10 months ago

AndreaLombax commented 10 months ago

Hi, I'm having trouble with Mistral: the model isn't loading on the GPU and runs only on the CPU.

This is the code:

from ctransformers import AutoModelForCausalLM, Config, hub

config = Config()  # default config; gpu_layers is passed below

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
                                           model_file="mistral-7b-instruct-v0.1.Q5_K_M.gguf",
                                           config=hub.AutoConfig(config),
                                           model_type="mistral", gpu_layers=200)

print(llm("trying blabla"))
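As context for the gpu_layers=200 value above: gpu_layers only caps how many transformer layers are offloaded, so any value at or above the model's layer count (Mistral-7B has 32) requests full offload. A tiny illustrative helper (hypothetical, not part of the ctransformers API) shows the clamping behavior:

```python
def effective_gpu_layers(requested: int, model_layers: int) -> int:
    """Clamp a requested gpu_layers value to the model's actual layer count.

    Hypothetical helper for illustration only: ctransformers does this
    internally, so passing 200 for a 32-layer model simply means
    "offload everything".
    """
    return max(0, min(requested, model_layers))

print(effective_gpu_layers(200, 32))  # -> 32: all 32 layers offloaded
```

So the symptom here is not an out-of-range gpu_layers value; a too-large value should still offload all layers.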

Versions:

CUDA: 12.2
libcudart12
nvidia drivers: 535.129.03
ctransformers: 0.2.27
transformers: 4.34.0
torch: 2.1.1
python: 3.10.13

I have two NVIDIA A16 16 GB GPUs, and the memory load is only 4 MB on each.

jameswilsongrant commented 9 months ago

I ran into this. I can recreate it on Windows with a 1660 Ti and a full Python 3.11 install from python.org by running the following:

# Create a clean venv
pip install ctransformers
pip install ctransformers[cuda]

The second installation brings in the NVIDIA dependencies, but when running a very similar code snippet, the model never actually loads into GPU memory according to Task Manager.
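To check whether the [cuda] extra actually pulled NVIDIA wheels into the venv (as opposed to the library failing later when loading them), a quick stdlib-only sketch; the nvidia-* prefix is an assumption based on how the CUDA runtime wheels are typically named:

```python
# Diagnostic sketch: list installed packages whose names start with "nvidia-",
# which is how the CUDA runtime/cuBLAS wheels pulled in by the [cuda] extra
# are usually named. An empty result suggests the extra never installed them.
from importlib.metadata import distributions

def installed_nvidia_packages():
    """Return sorted names of installed packages that look like NVIDIA wheels."""
    return sorted(
        d.metadata["Name"]
        for d in distributions()
        if d.metadata["Name"] and d.metadata["Name"].lower().startswith("nvidia-")
    )

if __name__ == "__main__":
    pkgs = installed_nvidia_packages()
    print(pkgs or "no nvidia-* wheels found -- the [cuda] extra may not have taken effect")
```

If the wheels are present but the GPU still shows ~4 MB of usage, the problem is more likely in which shared library ctransformers loads at runtime than in the pip install itself.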

It does not appear tied to specific models or the model installation method. I was scratching my head with the same model from the OP pulled manually, the dolphin mistral 2.1 GGUF model pulled manually, and several variations of Llama 2 pulled automatically using the Hugging Face pattern the author references in the README.md.