marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

GPU is not used even after specifying gpu_layers #163

Open YogeshTembe opened 1 year ago

YogeshTembe commented 1 year ago

I have installed ctransformers using:

pip install ctransformers[cuda]

I am trying the following piece of code:

from langchain.llms import CTransformers
config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature': 0, 'gpu_layers': 50}
llm = CTransformers(model="./codellama-7b.Q4_0.gguf", model_type="llama", gpu_layers=50, config=config)

Here the gpu_layers parameter is specified, yet the GPU is not being used and the entire load is on the CPU. Can someone please point out if there is any step missing?
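
For completeness, a minimal sketch that loads the same file with ctransformers directly (bypassing the LangChain wrapper) looks roughly like this; watching nvidia-smi in a separate terminal while it loads shows whether any layers actually land on the GPU. The path and gpu_layers value are simply copied from the snippet above:

from ctransformers import AutoModelForCausalLM

# Load the GGUF file directly with ctransformers; with the CUDA build active,
# gpu_layers > 0 should offload that many transformer layers to the GPU.
llm = AutoModelForCausalLM.from_pretrained(
    "./codellama-7b.Q4_0.gguf",
    model_type="llama",
    gpu_layers=50,
    context_length=8000,
)
print(llm("def fibonacci(n):"))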

RicardoDominguez commented 1 year ago

I am observing the same issue:

import torch
from ctransformers import AutoModelForCausalLM

local_model = 'Llama-2-7B-GGML'
llm = AutoModelForCausalLM.from_pretrained(local_model, model_file='llama-2-7b-chat.Q4_K_M.gguf', gpu_layers=50)
print("torch.cuda.memory_allocated: %fGB"%(torch.cuda.memory_allocated(0)/1024/1024/1024))
jamestwhedbee commented 1 year ago

I am seeing this too, using CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers.

peter65374 commented 1 year ago

Same here. Still digging into it...