marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

Model not loading on GPU #177

Open AndreaLombax opened 10 months ago

AndreaLombax commented 10 months ago

Hi, I'm having trouble with Mistral: the model isn't loading on the GPU and runs only on the CPU.

This is the code:

from ctransformers import AutoModelForCausalLM, Config, hub

config = Config()  # default config; gpu_layers is passed below

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
                                           model_file="mistral-7b-instruct-v0.1.Q5_K_M.gguf",
                                           config=hub.AutoConfig(config),
                                           model_type="mistral", gpu_layers=200)

print(llm("trying blabla"))
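As context for the gpu_layers=200 value above: gpu_layers only caps how many transformer layers are offloaded, so any value at or above the model's layer count (Mistral-7B has 32) requests full offload. A tiny illustrative helper (hypothetical, not part of the ctransformers API) shows the clamping behavior:

```python
def effective_gpu_layers(requested: int, model_layers: int) -> int:
    """Clamp a requested gpu_layers value to the model's actual layer count.

    Hypothetical helper for illustration only: ctransformers does this
    internally, so passing 200 for a 32-layer model simply means
    "offload everything".
    """
    return max(0, min(requested, model_layers))

print(effective_gpu_layers(200, 32))  # -> 32: all 32 layers offloaded
```

So the symptom here is not an out-of-range gpu_layers value; a too-large value should still offload all layers.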

Versions:

CUDA: 12.2
libcudart12
nvidia drivers: 535.129.03
ctransformers: 0.2.27
transformers: 4.34.0
torch: 2.1.1
python: 3.10.13

I have two NVIDIA A16 16 GB GPUs, and the memory load is only 4 MB on each.

jameswilsongrant commented 9 months ago

I ran into this. I can recreate it on Windows with a 1660 Ti and a full Python 3.11 install from python.org by running the following:

# Create a clean venv
pip install ctransformers
pip install ctransformers[cuda]

The second installation brings in the NVIDIA dependencies, but when running a very similar code snippet, the model never actually loads into GPU memory according to Task Manager.
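To check whether the [cuda] extra actually pulled NVIDIA wheels into the venv (as opposed to the library failing later when loading them), a quick stdlib-only sketch; the nvidia-* prefix is an assumption based on how the CUDA runtime wheels are typically named:

```python
# Diagnostic sketch: list installed packages whose names start with "nvidia-",
# which is how the CUDA runtime/cuBLAS wheels pulled in by the [cuda] extra
# are usually named. An empty result suggests the extra never installed them.
from importlib.metadata import distributions

def installed_nvidia_packages():
    """Return sorted names of installed packages that look like NVIDIA wheels."""
    return sorted(
        d.metadata["Name"]
        for d in distributions()
        if d.metadata["Name"] and d.metadata["Name"].lower().startswith("nvidia-")
    )

if __name__ == "__main__":
    pkgs = installed_nvidia_packages()
    print(pkgs or "no nvidia-* wheels found -- the [cuda] extra may not have taken effect")
```

If the wheels are present but the GPU still shows ~4 MB of usage, the problem is more likely in which shared library ctransformers loads at runtime than in the pip install itself.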

It does not appear tied to specific models or the model installation method. I was scratching my head with the same model from the OP pulled manually, the dolphin mistral 2.1 GGUF model pulled manually, and several variations of Llama 2 pulled automatically using the Hugging Face pattern the author references in the README.md.