To be clear, koboldcpp uses its own implementation of llama.cpp that they have customized quite a bit. Looking up the error, it seems this has been a long-standing issue with llama.cpp's multi-GPU support.
I'm not entirely sure why it happens, but I'm fairly confident it has nothing to do with the wheels I have built. Even if it were specific to my builds, I'm not sure there is anything I could do about it; I'm not doing anything particularly unique in my build workflow.
You should build llama.cpp yourself and see if the error still happens. If it does, open an issue on the llama.cpp repo and, hopefully, the llama.cpp devs can give some insight and possibly fix it.
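Another way to isolate the problem, before or alongside a source build, is to drive llama-cpp-python directly, outside text-generation-webui. A minimal repro sketch (not from the original report; the model path and layer count are placeholders):

```python
# Repro sketch: call llama-cpp-python directly, outside
# text-generation-webui, to rule the webui out as a factor.
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-7b-instruct.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=99,  # large value = offload all layers to the GPU(s)
)

# The reported crash happens on the first generation, not at load
# time, so an inference call is needed to trigger it.
out = llm("Write a haiku about GPUs.", max_tokens=32)
print(out["choices"][0]["text"])
```

If this crashes the same way, the webui and the wheel packaging are out of the picture and the problem is in llama.cpp itself.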
@jllllll you're right, it isn't your wheel causing the issue; it's llama.cpp, and according to another thread they just fixed it in version 0.2.19. Would you mind reading through that thread and seeing whether an updated release referencing 0.2.19 instead of 0.2.18 is warranted for your wheels?
https://github.com/oobabooga/text-generation-webui/issues/4615
See jordanbtucker's comments in particular.
@dnalbach 0.2.19 wheels have been uploaded and text-generation-webui has updated to use them as well.
I had actually already uploaded the wheels for the latest version around 12 hours ago. I just forgot to add them to the package index. They should be added now.
Since this issue is fixed in 0.2.19, I'll go ahead and close it. Please let me know if it continues and I'll reopen this issue.
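For anyone checking whether they actually picked up the fixed build, the installed package reports its version. A quick sanity check:

```python
# Confirm the installed llama-cpp-python is the fixed release.
import llama_cpp
print(llama_cpp.__version__)  # expect 0.2.19 or newer
```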
After loading `TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q5_K_M.gguf`, I get an error and crash as soon as I send an input.

Using wheel: `rocm/llama_cpp_python_cuda-0.2.11+rocm5.6.1-cp310-cp310-manylinux_2_31_x86_64.whl`

The same thing happens with other models, such as `TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF`. Koboldcpp-rocm is able to load these models just fine.
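Since the suspected bug is in llama.cpp's GPU/multi-GPU path, one quick control test is to load the same GGUF with offload disabled. A sketch (placeholder path, not from the original report):

```python
# Control test: run the same model CPU-only. A clean run here,
# against a crash with n_gpu_layers > 0, points at the GPU offload
# path in llama.cpp rather than the wheel packaging.
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-7b-instruct.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=0,  # CPU-only: skip GPU offload entirely
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```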