According to the ctransformers documentation, using the GPU requires calling AutoModelForCausalLM.from_pretrained with the gpu_layers=50 argument.
However, this leads to the following error:
streamlit_llama | WARNING: failed to allocate 0.09 MB of pinned memory: unknown error
streamlit_llama | CUDA error 999 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:5067: unknown error
This happens on Ubuntu with my 4090, driver version 530.41.03.
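For reference, here is a minimal sketch of the kind of call that triggers the error. The wrapper function load_model and the model name are hypothetical placeholders, not the exact code from my app; only AutoModelForCausalLM.from_pretrained and gpu_layers come from the ctransformers documentation.

```python
def load_model(model_path: str, gpu_layers: int = 50):
    """Load a GGML model with ctransformers, offloading layers to the GPU.

    load_model is a hypothetical helper used only for illustration.
    """
    # Imported lazily so this file can be imported even where
    # ctransformers is not installed.
    from ctransformers import AutoModelForCausalLM

    # gpu_layers is the number of model layers to offload to the GPU.
    return AutoModelForCausalLM.from_pretrained(model_path, gpu_layers=gpu_layers)


if __name__ == "__main__":
    # Placeholder model name (assumption); substitute the model actually in use.
    llm = load_model("TheBloke/Llama-2-7B-GGML")
    print(llm("Hello"))
```

In my case the CUDA error appears once the model is loaded with GPU offloading enabled like this.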
I have tried running this in Docker with a variety of images.
My fork cuda12.1-cudnn8-devel
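To rule out the container simply not seeing the GPU, a quick check along these lines can be run inside each image. This is only a sketch: it assumes nvidia-smi is available on the PATH inside the container, and gpu_visible is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi lists at least one GPU in this environment."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi is not installed or not on PATH (assumed to mean no GPU access).
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible inside this environment:", gpu_visible())
```

If this prints False inside the container but True on the host, the problem is more likely in how the container is started than in ctransformers itself.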