GPU run unavailability

According to the ctransformers documentation, to use the GPU I need to call AutoModelForCausalLM.from_pretrained with the gpu_layers=50 parameter.
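For reference, the call described above looks roughly like the sketch below. The helper names build_load_kwargs and load_model are hypothetical and only for illustration, and model_type="llama" is an assumption about the model being loaded. I am also assuming ctransformers was installed with CUDA support (the README suggests pip install ctransformers[cuda]).

```python
from typing import Any, Dict


def build_load_kwargs(gpu_layers: int = 50, model_type: str = "llama") -> Dict[str, Any]:
    """Collect the keyword arguments passed to from_pretrained.

    gpu_layers is the number of model layers to offload to the GPU;
    0 keeps everything on the CPU. model_type="llama" is an assumption.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be >= 0")
    return {"model_type": model_type, "gpu_layers": gpu_layers}


def load_model(model_path: str, gpu_layers: int = 50):
    """Load a GGML model with the requested number of layers on the GPU."""
    # Imported lazily so the helper above can be used without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path, **build_load_kwargs(gpu_layers)
    )
```

Calling load_model with gpu_layers=0 should keep the model on the CPU, which may help confirm whether the failure is specific to the CUDA path.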

However, this leads to the following error with my 4090 (driver version 530.41.03) on Ubuntu:

streamlit_llama | WARNING: failed to allocate 0.09 MB of pinned memory: unknown error
streamlit_llama | CUDA error 999 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:5067: unknown error
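Since "unknown error" does not say much, one thing that may be worth checking is whether the GPU is visible at all from inside the container. The sketch below is only a diagnostic idea, not a confirmed fix: probe_gpu is a hypothetical helper, and it assumes nvidia-smi would be on PATH when the container has GPU access.

```python
import shutil
import subprocess
from typing import Dict, Optional


def probe_gpu(timeout: float = 10.0) -> Dict[str, Optional[str]]:
    """Report whether nvidia-smi is reachable and what it says about the GPU."""
    result: Dict[str, Optional[str]] = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "gpus": None,
        "error": None,
    }
    path = result["nvidia_smi_path"]
    if path is None:
        result["error"] = "nvidia-smi not found on PATH"
        return result
    try:
        # Ask only for the GPU name and driver version, one CSV line per GPU.
        proc = subprocess.run(
            [path, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        result["error"] = str(exc)
        return result
    if proc.returncode != 0:
        result["error"] = proc.stderr.strip() or f"nvidia-smi exited with {proc.returncode}"
    else:
        result["gpus"] = proc.stdout.strip()
    return result


if __name__ == "__main__":
    print(probe_gpu())
```

If the "error" field is set when this runs inside the container, the problem is probably in how the container gets access to the GPU rather than in ctransformers itself.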

I have tried this in Docker with a variety of images.

My fork uses a cuda12.1-cudnn8-devel image.

talhaanwarch / streamlit-llama

GPU run unavailability #2