talhaanwarch / streamlit-llama

Streamlit chatbot with Llama-2-7B-chat
https://chatdemo.talhaanwar.com/

GPU run unavailability #2

Closed format37 closed 10 months ago

format37 commented 10 months ago

Following the ctransformers documentation, to utilize the GPU you need to call `AutoModelForCausalLM.from_pretrained` with the `gpu_layers=50` parameter.

However, this leads to an error:

```
streamlit_llama | WARNING: failed to allocate 0.09 MB of pinned memory: unknown error
streamlit_llama | CUDA error 999 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:5067: unknown error
```

This is with my RTX 4090 (Driver Version: 530.41.03) on Ubuntu.

I have tried running in Docker with a variety of images.

My fork uses cuda12.1-cudnn8-devel.

format37 commented 10 months ago

Sorry for the concern. The issue was on my side. Updating the driver to version 545 fixed it. The driver update process is described here: https://gist.github.com/format37/2d7bd6ffd92243d8578c284fc6b77e02
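After updating, the installed driver version can be confirmed with `nvidia-smi` (the `--query-gpu` flag prints just the requested field):

```shell
# Print only the driver version, one line per GPU
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

CUDA error 999 (`cudaErrorUnknown`) at initialization is commonly a driver/runtime mismatch, which is consistent with a driver update resolving it.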