jllllll / llama-cpp-python-cuBLAS-wheels

Wheels for llama-cpp-python compiled with cuBLAS support

ggml-cuda.cu:6700: invalid resource handle #18

Closed · shadow00 closed this issue 11 months ago

shadow00 commented 1 year ago

After loading TheBloke/CodeLlama-7B-Instruct-GGUF/codellama-7b-instruct.Q5_K_M.gguf, as soon as I send any input I get the following error and the process crashes:

ggml_init_cublas: found 2 ROCm devices:
  Device 0: AMD Radeon RX Vega, compute capability 9.0
  Device 1: AMD Radeon Vega Frontier Edition, compute capability 9.0

llama_new_context_with_model: total VRAM used: 6810.98 MB (model: 4474.98 MB, context: 2336.00 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
2023-10-19 01:06:52 INFO:Loaded the model in 20.07 seconds.

CUDA error 400 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:6700: invalid resource handle
current device: 0
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

Using wheel rocm/llama_cpp_python_cuda-0.2.11+rocm5.6.1-cp310-cp310-manylinux_2_31_x86_64.whl

The same thing happens with other models, such as TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF. Koboldcpp-rocm is able to load these models just fine.
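
For context, the load path is roughly the sketch below (illustrative paths and split values using the standard llama-cpp-python API, not my exact script):

# Rough sketch of the loading code (hypothetical values, not the exact settings used)
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b-instruct.Q5_K_M.gguf",  # from TheBloke/CodeLlama-7B-Instruct-GGUF
    n_gpu_layers=100,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],   # spread the weights across both Vega cards
    n_ctx=4096,
)

# Loading succeeds; the crash happens on the first generation call.
out = llm("Write a hello world in Python.", max_tokens=64)
print(out["choices"][0]["text"])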

jllllll commented 1 year ago

To be clear, koboldcpp uses its own implementation of llama.cpp that they have customized quite a bit. Looking up the error, it seems this has been a long-standing issue with llama.cpp's multi-GPU support.

I'm not entirely sure why it happens, but I'm fairly confident that it has nothing to do with the wheels I have built. Even if it were specific to my builds, I'm not sure there is anything I could do about it, since I'm not doing anything particularly unusual in my workflows.

You should build llama.cpp yourself and see if the error still happens. If it does, open an issue on the llama.cpp repo and, hopefully, the llama.cpp devs can give some insight and possibly fix it.
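
In the meantime, since the error points at the multi-GPU code path, one quick (untested) check is to restrict llama-cpp-python to a single ROCm device and see whether generation still crashes. A sketch, assuming the standard Llama parameters and ROCm's HIP_VISIBLE_DEVICES variable:

# Untested sketch: hide the second GPU so the multi-GPU path in ggml-cuda.cu is never taken.
import os
os.environ["HIP_VISIBLE_DEVICES"] = "0"  # ROCm's equivalent of CUDA_VISIBLE_DEVICES; set before importing llama_cpp

from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b-instruct.Q5_K_M.gguf",  # hypothetical path
    n_gpu_layers=100,  # still fully offloaded, but only onto device 0
    n_ctx=4096,
)
print(llm("Hello", max_tokens=8)["choices"][0]["text"])

# If this works while the two-GPU configuration crashes, that supports the
# multi-GPU theory and is useful detail for an upstream llama.cpp issue.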

dnalbach commented 11 months ago

@jllllll you're right, it isn't your wheel causing the issue, it's llama.cpp, and it was just fixed in the 0.2.19 release according to another thread. Would you mind reading through this thread and seeing whether an updated release referencing 0.2.19 instead of 0.2.18 is warranted for your wheels? https://github.com/oobabooga/text-generation-webui/issues/4615

See jordanbtucker's comments in particular.

jllllll commented 11 months ago

@dnalbach 0.2.19 wheels have been uploaded and text-generation-webui has updated to use them as well.

I had actually already uploaded the wheels for the latest version around 12 hours ago. I just forgot to add them to the package index. They should be added now.

Since this issue is fixed in 0.2.19, I'll go ahead and close it. Please let me know if it continues and I'll reopen this issue.