Open · seanlynch opened this issue 1 year ago
slundberg: Hi! Can you check whether the airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf file loads with a plain llama_cpp_python call? That way we can work out whether this is a llama.cpp compatibility issue or a guidance compatibility issue. Thanks!
Also, did you make the GGUF file yourself? I can't seem to find it online.
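For anyone following along, a standalone check along these lines should isolate the layer at fault. This is a minimal sketch; the local file path and the n_gpu_layers value are assumptions, not taken from the report:

```python
# Minimal sketch: load the GGUF directly with llama-cpp-python,
# bypassing guidance entirely.
from llama_cpp import Llama

llm = Llama(
    model_path="./airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf",  # assumed local path
    n_gpu_layers=-1,  # offload all layers so the CUDA code path is exercised
)
out = llm("Hello, my name is", max_tokens=16)
print(out["choices"][0]["text"])
```

If this crashes the same way, the problem is in llama.cpp / llama-cpp-python rather than in guidance.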
seanlynch: It works in ooba, which uses llama.cpp to load it. I'm hitting exactly the same problem with https://huggingface.co/TheBloke/Nethena-20B-GGUF/blob/main/nethena-20b.Q5_K_M.gguf, which also works fine in ooba and koboldcpp.
seanlynch: @slundberg Sorry, I'd missed the question about where the GGUF file came from. It's https://huggingface.co/Doctor-Shotgun/Misc-Models/blob/main/airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf.
seanlynch (original issue body):
The bug
Trying to use https://huggingface.co/Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b Q5_K_M, I get:
```
CUDA error 716 at /tmp/pip-install-azvh5g5w/llama-cpp-python_68eefa42c492416390b746bedd7ad475/vendor/llama.cpp/ggml-cuda.cu:6835: misaligned address
```
The model works fine with KoboldCpp and text-generation-webui. I am also able to load and use the unquantized version of the model with models.TransformersChat, using bitsandbytes to quantize it to 4 bits.
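That working transformers path probably looked something like the sketch below. This assumes guidance forwards extra keyword arguments to transformers' from_pretrained; the device_map and load_in_4bit settings are illustrative, not copied from the report:

```python
# Sketch of the working 4-bit load of the unquantized model via guidance.
from guidance import models

lm = models.TransformersChat(
    "Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b",
    device_map="auto",   # assumed: spread the model across available GPUs
    load_in_4bit=True,   # bitsandbytes 4-bit quantization
)
```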
To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.
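For anyone trying to reproduce, a minimal guidance load of the failing GGUF might look like this; the file path and parameters are assumptions, not from the report:

```python
# Hypothetical reproduction sketch: load the failing GGUF through guidance.
from guidance import models, gen

lm = models.LlamaCpp(
    "./airoboros-2.2.1-limarpv3-y34b.q4_K_S.gguf",  # assumed local path
    n_gpu_layers=-1,  # offload all layers; the reported crash is in CUDA code
)
lm += "Hello, my name is" + gen(max_tokens=16)
```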
System info (please complete the following information):