marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.

CUDA error - the provided PTX was compiled with an unsupported toolchain #162

Closed: melindmi closed this issue 8 months ago

melindmi commented 8 months ago

Hi, I am trying to use the llama-2-70b-chat.Q5_K_M.gguf model with ctransformers on GPU, but I get this error:

```
CUDA error 222 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:6045: the provided PTX was compiled with an unsupported toolchain
```

My torch version is `2.1.0+cu121`, and my GPU driver is version 525.125.06, which supports CUDA up to version 12.0.

The code:

```python
llm = AutoModelForCausalLM.from_pretrained(
    "../llama",
    model_file="llama-2-13b-chat.q5_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
    temperature=1,
    context_length=4096,
)
```

Can anyone suggest what might be wrong here?
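For context, CUDA error 222 is `cudaErrorUnsupportedPtxVersion`: the PTX embedded in the library was produced by a CUDA toolchain newer than what the installed driver can load. A minimal diagnostic sketch (not from the original report; it assumes `torch` is installed and `nvidia-smi`/`nvcc` are on the PATH) to compare the relevant versions:

```python
# Hedged diagnostic sketch: print the CUDA version each component was
# built for / supports, to spot a toolchain-vs-driver mismatch.
import subprocess

import torch

# CUDA version the installed PyTorch wheel was built against, e.g. "12.1".
print("torch CUDA build:", torch.version.cuda)

# Highest CUDA version the installed driver supports; it appears as
# "CUDA Version: 12.0" in the nvidia-smi banner for driver 525.125.06.
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next(line.strip() for line in smi.splitlines() if "CUDA Version" in line))

# CUDA toolkit (nvcc) version that a source build would compile with.
nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
print(next(line for line in nvcc.splitlines() if "release" in line))
```

If the toolkit or wheel version printed is higher than the driver's supported CUDA version, this error is expected.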

melindmi commented 8 months ago

In case someone else encounters the same issue: the problem was caused by an nvcc version that was not compatible with the GPU driver version.

When installing with `pip install ctransformers[cuda]`, precompiled libs built for CUDA 12.2 are used, but in my case I needed CUDA 12.0. When building from source with

```sh
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```

the CUDA compiler path defaulted to `/usr/bin/`, which on my machine held an older version of nvcc. The solution was to install the right CUDA version in a different path and then point CMake at it when installing ctransformers:

```sh
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```
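To make the failure mode concrete: a source build only works at runtime if the nvcc that compiled the CUDA kernels is no newer than the CUDA version the driver supports. A small hedged pre-build check along those lines (the `/path_to_cuda` placeholder mirrors the command above and is not a real path; substitute your own):

```python
# Hedged pre-build check: fail early if nvcc is newer than the driver's
# supported CUDA version, which is what triggers CUDA error 222.
import re
import subprocess

NVCC = "/path_to_cuda/bin/nvcc"  # placeholder path from the fix above

def nvcc_release(nvcc_path):
    """Return (major, minor) parsed from `nvcc --version`."""
    out = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True).stdout
    return tuple(int(x) for x in re.search(r"release (\d+)\.(\d+)", out).groups())

def driver_cuda():
    """Return the (major, minor) CUDA version the driver supports, per `nvidia-smi`."""
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    return tuple(int(x) for x in re.search(r"CUDA Version:\s*(\d+)\.(\d+)", out).groups())

toolkit, driver = nvcc_release(NVCC), driver_cuda()
if toolkit > driver:
    raise SystemExit(f"nvcc {toolkit} > driver support {driver}: PTX will not load")
print(f"nvcc {toolkit} is compatible with driver support {driver}")
```

Running this before the `pip install` command above should confirm whether the selected nvcc will produce PTX your driver can actually load.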
When installing with pip install ctransformers[cuda] precompiled libs for CUDA 12.2 are used, but in my cases I needed CUDA version 12.0. If I used CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers by default the CUDA compiler path was /usr/bin/ which in my case had an older version of nvcc. The solution was to install the right CUDA version in a different path and then install ctransformers with: CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers