devops35 closed this issue 1 year ago
I have the same issue, but with OpenCL (AMD ROCm / CLBlast). It happens after some time, with any model, when chatting via the frontend (not the API). I never had this issue before.
@oobabooga how can I solve this problem?
I got the same error using the text-generation-webui API with the exLlama loader in TavernAI. After restarting text-generation-webui, I got a blue screen with a video error during the first answer.
I think it happens when the chat history gets too big and CUDA then accesses an illegal memory location. It happened to me as well: first I got errors about my sequence length being too small, and after increasing it a few times, this error appeared instead. Resetting the history seems to work, but it's obviously no permanent solution.
EDIT: Just my theory, but I think what actually happens is that the GPU runs out of VRAM. I tried different settings, and the configurations that use less VRAM lived longer.
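If the VRAM theory is right, it should be easy to check by logging free memory while the server runs and the history grows. A minimal diagnostic sketch, assuming PyTorch is installed in the same environment; the polling interval and device index are arbitrary:

```python
# vram_watch.py - log free/total VRAM over time to test the
# "GPU runs out of VRAM before the crash" theory.
import time
import torch

def log_vram(device: int = 0, interval_s: float = 30.0) -> None:
    """Print free/total VRAM for `device` every `interval_s` seconds."""
    while True:
        free, total = torch.cuda.mem_get_info(device)  # values in bytes
        print(f"GPU {device}: {free / 2**30:.2f} GiB free "
              f"of {total / 2**30:.2f} GiB")
        time.sleep(interval_s)

if __name__ == "__main__":
    log_vram()
```

If free memory trends toward zero as the chat gets longer, that would support reading the illegal memory access as an out-of-memory symptom rather than a loader bug.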
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
I am making requests via the API. After a while, all requests start to return an error. I ran the server with the following commands; the same problem occurs with all of them.
python server.py --model TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ --wbits 4 --groupsize 128 --api --listen --auto-devices --model_type llama --xformers --no_use_cuda_fp16 --loader exllama
python server.py --model TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ --wbits 4 --groupsize 128 --api --listen --auto-devices --model_type llama --xformers --loader exllama
python server.py --model TheBloke_Wizard-Vicuna-13B-Uncensored-GPTQ --wbits 4 --groupsize 128 --api --listen --model_type llama --xformers --loader exllama
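For context, the requests look roughly like this. A minimal client sketch against the blocking API; the port (5000) and the `/api/v1/generate` payload and response shape are assumptions based on the webui's legacy API and may not match this setup exactly:

```python
# Minimal client for the text-generation-webui blocking API.
# Host, port, and payload fields are assumptions; adjust to your setup.
import requests

API_URL = "http://127.0.0.1:5000/api/v1/generate"  # assumed default

def generate(prompt: str, max_new_tokens: int = 200) -> str:
    payload = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]

if __name__ == "__main__":
    print(generate("Hello, how are you?"))
```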
Is there an existing issue for this?
Reproduction
It happens after a random amount of time, usually after 5-6 hours.
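Since the failure only shows up after hours of traffic, a soak loop that keeps hitting the endpoint and records when the first error appears can help narrow down the trigger. A sketch reusing the assumed endpoint from the client above; the pacing and prompt are arbitrary:

```python
# Soak test: call the API in a loop and log when requests start failing.
import time
import requests

API_URL = "http://127.0.0.1:5000/api/v1/generate"  # assumed default

start = time.time()
n = 0
while True:
    n += 1
    try:
        resp = requests.post(
            API_URL,
            json={"prompt": "Say hi.", "max_new_tokens": 32},
            timeout=120,
        )
        resp.raise_for_status()
    except Exception as exc:
        hours = (time.time() - start) / 3600
        print(f"request {n} failed after {hours:.1f}h: {exc}")
        break
    time.sleep(5)  # arbitrary pacing between requests
```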
Screenshot
No response
Logs
System Info