marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

CUDA error 35 #90

Open curname opened 1 year ago

curname commented 1 year ago

When I run ctransformers[cuda], I get the error: CUDA error 35 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:4236: CUDA driver version is insufficient for CUDA runtime version

However, the path "/home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu" does not exist on my machine (it is the path on the CI machine where the wheel was built). My CUDA info:

[screenshot: gpu_info]

And my package info:

[screenshot: package]

How can I fix it?

curname commented 1 year ago

Here is my code:

    llm = AutoModelForCausalLM.from_pretrained(
        "starcoder.ggmlv3.q8_0.bin",
        model_type="gpt_bigcode",
        top_p=0.95,
        temperature=0.2,
        max_new_tokens=512,
        threads=8,
        gpu_layers=50,
    )

marella commented 1 year ago

Please update your NVIDIA Drivers and try again.
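A quick way to confirm the mismatch is to read the driver-side and runtime-side CUDA versions directly from the CUDA libraries. A minimal sketch, assuming Linux shared-library names (the exact libcudart filename may differ, e.g. libcudart.so.11.0):

    import ctypes

    # Driver API: the highest CUDA version the installed NVIDIA driver supports.
    libcuda = ctypes.CDLL("libcuda.so")
    driver = ctypes.c_int(0)
    libcuda.cuDriverGetVersion(ctypes.byref(driver))

    # Runtime API: the CUDA version the binary was compiled against.
    libcudart = ctypes.CDLL("libcudart.so")  # adjust the name if needed
    runtime = ctypes.c_int(0)
    libcudart.cudaRuntimeGetVersion(ctypes.byref(runtime))

    # Versions are encoded as 1000*major + 10*minor, e.g. 11080 for CUDA 11.8.
    print("driver:", driver.value, "runtime:", runtime.value)
    # "CUDA error 35" means driver.value < runtime.value.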

sujeendran commented 11 months ago

Hi @marella - I'm facing a similar issue on the servers I am testing on. Upgrading the drivers may not be an option for me, since it is a shared system that several people use. Is it possible to manually build this library to run on CUDA 11.8 by making a few tweaks to the setup/cmake files?

sujeendran commented 11 months ago

Just an update: I managed to get it running on CUDA 11.8 😄! I knew it should work, since I was able to run a GGUF model using llama-cpp with the same CUDA version and drivers. Here is the fix if anyone wants to try it (a quick sanity check follows the steps):

  1. Clone the library: git clone https://github.com/marella/ctransformers.git
  2. Edit this line to use the older CUDA version: https://github.com/marella/ctransformers/blob/main/models/ggml/ggml-cuda.cu#L136, changing it to:
    #if CUDART_VERSION >= 11000
  3. In the root folder, execute:
    CT_CUBLAS=1 pip install .
  4. Remember to install the CUDA libraries if you don't have them yet:
    pip install nvidia-cuda-runtime-cu11 nvidia-cublas-cu11

@marella - Do you think I can start a PR to include the step 2 fix so this library is compatible with older versions too?
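After these steps, a quick sanity check that the locally built wheel actually offloads to the GPU; this is a sketch only, and the model file name is an example, not one used in this thread:

    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "zephyr-7b-beta.Q4_K_M.gguf",  # any local GGUF/GGML file
        model_type="mistral",
        gpu_layers=50,  # >0 exercises the CUDA path patched in step 2
    )
    print(llm("def fib(n):", max_new_tokens=32))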

gorkemgoknar commented 10 months ago

This should be integrated; CUDA 11.8 works fine (and 11.8 should be compatible with 11.x), and even the latest PyTorch version (2.1 as of today) still supports it. As for updating NVIDIA drivers, that will not be easy on a cloud provider node (or on something like an HF Space). Also, from my experience, updating NVIDIA drivers on older cards (a 2070 Turing, for example) just makes them slower, so I stick with the best-performing version.

sujeendran commented 10 months ago

@gorkemgoknar - I have created a pull request to get this included in the main repo. Until it is merged, anyone who doesn't want to make the manual changes (it's a simple one anyway) can clone and build directly from my fork: https://github.com/sujeendran/ctransformers

gorkemgoknar commented 10 months ago

> @gorkemgoknar - I have created a pull request to get this included in the main repo. Until it is merged, anyone who doesn't want to make the manual changes (it's a simple one anyway) can clone and build directly from my fork: https://github.com/sujeendran/ctransformers

Thank you @sujeendran. I actually built it with the fix, and I can confirm that with it, GGUF Zephyr and Mistral models run with nvidia-cuda-runtime-cu11==11.7.99.
Just a side note for GGUF: generation performance is nearly the same as with llama-cpp-python.
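For anyone who wants to eyeball that claim on their own machine, here is a rough sketch (not a rigorous benchmark; the model file name is hypothetical):

    import time
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "mistral-7b-v0.1.Q4_K_M.gguf",  # example local GGUF file
        model_type="mistral",
        gpu_layers=50,
    )

    start = time.time()
    pieces = 0
    # stream=True yields the generated text piece by piece (roughly one token each).
    for _ in llm("Once upon a time", max_new_tokens=128, stream=True):
        pieces += 1
    print(f"~{pieces / (time.time() - start):.1f} tokens/sec")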

AlexBlack2202 commented 10 months ago

> Thank you @sujeendran. I actually built it with the fix, and I can confirm that with it, GGUF Zephyr and Mistral models run with nvidia-cuda-runtime-cu11==11.7.99. Just a side note for GGUF: generation performance is nearly the same as with llama-cpp-python.

Can you run the GGUF format with a GPU?

gorkemgoknar commented 10 months ago

> Can you run the GGUF format with a GPU?

Yes, check the app.py here. GGUF works on both CPU and GPU, and by changing the number of layers placed on the GPU you can run some of the ops on the GPU even if your GPU does not have enough VRAM for the whole model.

https://huggingface.co/spaces/coqui/voice-chat-with-mistral
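The layer split mentioned above is controlled by the gpu_layers argument. A minimal sketch of a partial offload; the Hub repo and file names below are assumptions for illustration, not taken from this thread:

    from ctransformers import AutoModelForCausalLM

    # gpu_layers sets how many transformer layers are placed on the GPU;
    # the rest run on the CPU, so a model larger than VRAM can still load.
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Mistral-7B-v0.1-GGUF",           # Hugging Face Hub repo (assumed)
        model_file="mistral-7b-v0.1.Q4_K_M.gguf",  # quantized file in the repo (assumed)
        model_type="mistral",
        gpu_layers=20,  # e.g. 20 of ~32 layers on GPU when VRAM is tight
    )
    print(llm("Hello", max_new_tokens=16))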