Open · seanlynch opened this issue 1 year ago
slundberg: Hi! Can you check whether the airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf file loads with a plain llama_cpp_python call? That way we can work out whether this is a llama.cpp compatibility issue or a guidance compatibility issue. Thanks!
Also, did you make the GGUF file yourself? I can't seem to find it online.
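For anyone following along, a standalone check along these lines should isolate the layer at fault. This is a minimal sketch; the local file path and the n_gpu_layers value are assumptions, not taken from the report:

```python
# Minimal sketch: load the GGUF directly with llama-cpp-python,
# bypassing guidance entirely.
from llama_cpp import Llama

llm = Llama(
    model_path="./airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf",  # assumed local path
    n_gpu_layers=-1,  # offload all layers so the CUDA code path is exercised
)
out = llm("Hello, my name is", max_tokens=16)
print(out["choices"][0]["text"])
```

If this crashes the same way, the problem is in llama.cpp / llama-cpp-python rather than in guidance.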
seanlynch: It works in ooba, which uses llama.cpp to load it. I'm hitting exactly the same problem with https://huggingface.co/TheBloke/Nethena-20B-GGUF/blob/main/nethena-20b.Q5_K_M.gguf, which also works fine in ooba and koboldcpp.
seanlynch: @slundberg Sorry, I'd missed the question about where the GGUF file came from. It's https://huggingface.co/Doctor-Shotgun/Misc-Models/blob/main/airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf.
seanlynch (original issue body):
The bug
Trying to use https://huggingface.co/Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b Q5_K_M, I get:
```
CUDA error 716 at /tmp/pip-install-azvh5g5w/llama-cpp-python_68eefa42c492416390b746bedd7ad475/vendor/llama.cpp/ggml-cuda.cu:6835: misaligned address
```
The model works fine with KoboldCpp and text-generation-webui. I am also able to load and use the unquantized version of the model with models.TransformersChat, using bitsandbytes to quantize it to 4 bits.
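That working transformers path probably looked something like the sketch below. This assumes guidance forwards extra keyword arguments to transformers' from_pretrained; the device_map and load_in_4bit settings are illustrative, not copied from the report:

```python
# Sketch of the working 4-bit load of the unquantized model via guidance.
from guidance import models

lm = models.TransformersChat(
    "Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b",
    device_map="auto",   # assumed: spread the model across available GPUs
    load_in_4bit=True,   # bitsandbytes 4-bit quantization
)
```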
To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.
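For anyone trying to reproduce, a minimal guidance load of the failing GGUF might look like this; the file path and parameters are assumptions, not from the report:

```python
# Hypothetical reproduction sketch: load the failing GGUF through guidance.
from guidance import models, gen

lm = models.LlamaCpp(
    "./airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf",  # assumed local path
    n_gpu_layers=-1,  # offload all layers; the reported crash is in CUDA code
)
lm += "Hello, my name is" + gen(max_tokens=16)
```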
System info (please complete the following information):