Open YogeshTembe opened 1 year ago
I am observing the same issue:
import torch
from ctransformers import AutoModelForCausalLM

# Directory containing the GGUF model file
local_model = 'Llama-2-7B-GGML'
llm = AutoModelForCausalLM.from_pretrained(
    local_model,
    model_file='llama-2-7b-chat.Q4_K_M.gguf',
    gpu_layers=50,
)
print("torch.cuda.memory_allocated: %fGB" % (torch.cuda.memory_allocated(0) / 1024 / 1024 / 1024))
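One likely explanation for that printout showing ~0 (an assumption, not confirmed by the thread): `torch.cuda.memory_allocated` only counts memory allocated through PyTorch's caching allocator, while ctransformers offloads layers through its own ggml/cuBLAS backend, so the PyTorch counter can read 0 GB even when offload is working. A minimal diagnostic sketch; `gpu_memory_used_mib` is a hypothetical helper that shells out to `nvidia-smi` and so requires the NVIDIA driver tools on PATH:

```python
import subprocess

def bytes_to_gib(n_bytes: int) -> float:
    # torch.cuda.memory_allocated() returns bytes; 1 GiB = 1024**3 bytes.
    return n_bytes / 1024 ** 3

def gpu_memory_used_mib():
    # Hypothetical helper: ask the driver directly. Unlike
    # torch.cuda.memory_allocated(), this sees allocations made by
    # ctransformers' own CUDA backend, which bypass PyTorch's allocator.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]
    )
    return [int(x) for x in out.decode().split()]
```

Watching `nvidia-smi` in a second terminal while the model loads is the simplest way to see whether VRAM usage actually rises.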
I am seeing this too, using:
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
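Note that `CT_HIPBLAS` targets AMD GPUs via ROCm; for NVIDIA GPUs the corresponding source-build flag is `CT_CUBLAS`. A sketch of the build variants (flags as documented in the ctransformers README, assuming a working compiler toolchain and GPU SDK):

```shell
# NVIDIA (cuBLAS)
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
# AMD (hipBLAS / ROCm)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Apple Silicon (Metal)
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```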
Same here, still digging into it...
I have installed ctransformers using -
pip install ctransformers[cuda]
I am trying the following piece of code -
Here the gpu_layers parameter is specified, yet the GPU is not being used and the complete load stays on the CPU. Can someone please point out if there is a step missing?
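One quick sanity check worth running (a hypothetical diagnostic of my own, not part of ctransformers): confirm that the CUDA runtime library is loadable in your environment at all. If it is not, a CUDA-enabled build has nothing to offload to, which is one plausible cause of the load staying on the CPU:

```python
import ctypes

def cuda_runtime_visible() -> bool:
    # Hypothetical diagnostic: try to load the CUDA runtime shared library.
    # If none of these names load, CUDA offload cannot work in this process.
    for name in ("libcudart.so.12", "libcudart.so.11.0", "libcudart.so"):
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue
    return False

print("CUDA runtime visible:", cuda_runtime_visible())
```

If this prints False, check that the CUDA toolkit libraries are installed and on the loader path (e.g. via `LD_LIBRARY_PATH` on Linux) before debugging gpu_layers further.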