marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

FileNotFoundError: Could not find module '...ctransformers\lib\cuda\ctransformers.dll' (or one of its dependencies). #110

Open phoenixthinker opened 1 year ago

phoenixthinker commented 1 year ago

Hi, I could run the code below last week without problems, but since a few days ago (after upgrading the ctransformers lib) I get the error below. I am now unable to run ctransformers with any local LLM model. Can anyone help solve this? Thanks.

My PC: Win10, Python 3.10.6, ctransformers 0.2.24

My Python code:

```
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    r'H:\TheBloke_Llama-2-13B-chat-GGML\llama-2-13b-chat.ggmlv3.q4_1.bin',
    model_type='llama',
    stream=True,
    gpu_layers=50,
)

while True:
    print("\n--------------------------\n")
    user_input = input("Your Input:")
    for chunk in llm(user_input, stream=True):
        print(chunk, end='', flush=True)
```

Error after upgrading the ctransformers lib:

```
Traceback (most recent call last):
  File "H:\localLlama-2-13B-Chat_StreamOutput.py", line 2, in <module>
    llm = AutoModelForCausalLM.from_pretrained(r'H:\TheBloke_Llama-2-13B-chat-GGML\llama-2-13b-chat.ggmlv3.q4_1.bin', model_type='llama', stream=True, gpu_layers=50)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\ctransformers\hub.py", line 173, in from_pretrained
    return LLM(
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\ctransformers\llm.py", line 237, in __init__
    self._lib = load_library(lib, cuda=config.gpu_layers > 0)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\site-packages\ctransformers\llm.py", line 124, in load_library
    lib = CDLL(path)
  File "C:\Users\me\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\me\AppData\Local\Programs\Python\Python310\Lib\site-packages\ctransformers\lib\cuda\ctransformers.dll' (or one of its dependencies). Try using the full path with constructor syntax.
```

I verified the file exists:

`C:\Users\me\AppData\Local\Programs\Python\Python310\Lib\site-packages\ctransformers\lib\cuda\ctransformers.dll`
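On Windows, this error is typically raised even when the DLL itself exists, if one of the DLLs it depends on (e.g. the CUDA runtime or cuBLAS) cannot be resolved. A minimal sketch to confirm that it is a dependency, rather than the file itself, that fails to load (the path is the one from the traceback; nothing here is ctransformers-specific API):

```
import ctypes
import os

dll = r'C:\Users\me\AppData\Local\Programs\Python\Python310\Lib\site-packages\ctransformers\lib\cuda\ctransformers.dll'

# The file existing is not enough: loading also requires every DLL it
# links against to be resolvable on the DLL search path.
print("file exists:", os.path.exists(dll))

try:
    ctypes.CDLL(dll)
    print("loaded OK")
except OSError as e:
    # If this fails while the file exists, a dependent DLL is missing.
    print("load failed:", e)
```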

marella commented 1 year ago

Please run the following command and post the output:

```
pip show ctransformers nvidia-cuda-runtime-cu12 nvidia-cublas-cu12
```

Make sure you have installed the CUDA libraries using:

```
pip install ctransformers[cuda]
```

phoenixthinker commented 1 year ago

@marella Thank you for your hints. After re-installing these, it works fine:

```
pip install ctransformers[cuda]
pip install nvidia-cublas-cu11
pip install nvidia-cuda-runtime-cu11
```
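If reinstalling alone does not help, another workaround sometimes needed on Windows is to register the CUDA DLL directories before importing ctransformers, because Python 3.8+ no longer resolves dependent DLLs from PATH by default. A sketch, under the assumption that the nvidia-* wheels unpack their DLLs to `site-packages\nvidia\<component>\bin` (adjust the directory names if yours differ):

```
import os
import sysconfig

# Assumed wheel layout: site-packages\nvidia\cuda_runtime\bin, ...\cublas\bin
site_packages = sysconfig.get_paths()["purelib"]
for component in ("cuda_runtime", "cublas"):
    dll_dir = os.path.join(site_packages, "nvidia", component, "bin")
    if os.path.isdir(dll_dir):
        os.add_dll_directory(dll_dir)  # Windows-only API, Python 3.8+

from ctransformers import AutoModelForCausalLM  # import after registering paths
```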

yashpundir commented 1 year ago

I was having the same issue and the above solution worked. Thanks a lot, @marella. But the model isn't utilizing the GPU properly.

OS: Windows 11
RAM: 32 GB
CPU: Intel i7-8550U @ 1.80 GHz
GPU: GeForce MX150

I'm using nvitop to monitor GPU usage, and here is what it looks like when I run a simple query on the 8-bit quantized llama-2-7b-chat GGML model.

[nvitop screenshot: GPU memory and utilization during inference]

GPU memory reaches ~50% when I load the model, and during inference GPU MEM increases to ~85% while GPU UTL stays around ~5%, occasionally fluctuating up to 30%. That doesn't seem right, because I run this model with the same code on a different PC where GPU UTL stays consistently around ~55-65%.

Code:


```
model_name = "llama-2-7b-chat.ggmlv3.q8_0.bin"

llm = AutoModelForCausalLM.from_pretrained(
    f'../models/{model_name}',
    model_type='llama',
    gpu_layers=4,
    temperature=0.7,
    max_new_tokens=512,
    top_k=40,
    batch_size=8,
    repetition_penalty=1.2,
    top_p=0.70,
    local_files_only=True,
    context_length=2048,
)

system_message = "You are a respectful and helpful assistant. Understand the Instruction and respond appropriately"

instruction = "Write an acrostic poem in the style of Robert Frost about how humans are harming the earth."

prompt_template = f"""System: {system_message}
Instruction: {instruction}
Assistant: """

tokens = llm.tokenize(prompt_template)
generated_tokens = llm.generate(tokens)  # generate() yields token ids lazily
generated_text = llm.detokenize(list(generated_tokens))  # materialize before detokenizing
```
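A likely contributor to the low GPU utilization above (an assumption based on how layer offloading works, not a confirmed diagnosis for the MX150): `gpu_layers=4` offloads only 4 of the model's 32 transformer layers, so the bulk of per-token compute stays on the CPU. A minimal sketch of the change, assuming more layers fit in the available GPU memory:

```
from ctransformers import AutoModelForCausalLM

# Sketch: LLaMA-2 7B has 32 layers, so gpu_layers=4 leaves 28 on the CPU.
# Raise gpu_layers to offload more work; lower it again if loading runs
# out of GPU memory.
llm = AutoModelForCausalLM.from_pretrained(
    f'../models/{model_name}',  # model_name as defined above
    model_type='llama',
    gpu_layers=32,              # was 4
    context_length=2048,
)
```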