rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

WizardLM inference error: ggml-metal.m:773: false && "not implemented" #383

Open clarkmcc opened 11 months ago

clarkmcc commented 11 months ago

I'm getting the following error when trying to run the WizardLM-13B Q8 model. I'm running this library in a Tauri app; let me know if you need any more details or testing from me. I'm on an Apple M1 Max (64 GB).

ggml-sys-8f6d0ee10141006f/out/ggml-metal.m:773: false && "not implemented"
ggml_metal_graph_compute_block_invoke: encoding node 186, op = RMS_NORM
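
For reference, a minimal sketch of how a model can be loaded through the llm crate with GPU (Metal) offload enabled. The model path is a placeholder, and the use_gpu field and function names are taken from the crate's main-branch documentation at the time, so exact signatures may differ between versions; the real Tauri integration is omitted.

use std::path::Path;

fn load_with_metal() -> llm::models::Llama {
    // Enable GPU offload; on Apple Silicon this routes graph computation
    // through ggml-metal.m, which is where the assertion above fires.
    let params = llm::ModelParameters {
        use_gpu: true,
        ..Default::default()
    };

    llm::load::<llm::models::Llama>(
        // Placeholder path for the WizardLM-13B q8_0 GGML file.
        Path::new("models/wizardlm-13b.q8_0.bin"),
        llm::TokenizerSource::Embedded,
        params,
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("failed to load model: {err}"))
}
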
LLukas22 commented 11 months ago

Could you try another quantization format? Maybe q5_1 or one of the K-quants?

clarkmcc commented 11 months ago

Yeah, so Vicuna 7B 2-bit k-quant works. I can try others if you'd like; this just happens to be one that I have downloaded.

Edit: Vicuna 33B 2-bit k-quant also works.
Edit: WizardLM 13B 4-bit k-quant does not work.

LLukas22 commented 11 months ago

The error seems to be caused by this code block in the ggml Metal shader implementation. We probably have to pull the latest changes into our repo, or we have to check whether our way of embedding the shader code into the ggml-metal.m file creates some issues. Probably something for @philpax.

@clarkmcc Could you check whether these models run on the llama.cpp main branch? We use it as our current ggml source, and if the error is in the shader code we should open an issue there.

clarkmcc commented 11 months ago

@LLukas22 Running the following command seems to work just fine for me:

./main -m ~/models/wizardlm-13b-v1.1.ggmlv3.q4_K_M.bin -n 128 -ngl 1 -p "The meaning of life is "

philpax commented 11 months ago

We probably need to update our implementation of the LLaMA model. Not sure if I'll be able to get around to that soon.

LLukas22 commented 11 months ago

According to https://github.com/ggerganov/llama.cpp/issues/2508, some quantization formats are simply not implemented in Metal.
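
Until those kernels exist upstream, one possible workaround is to keep the affected formats on the CPU by toggling GPU offload per model file. A rough sketch: the use_gpu field comes from the crate's ModelParameters, while the helper and the format check below are purely illustrative, not an authoritative list of unsupported formats.

// Illustrative helper: disable Metal offload for quantization formats that
// lack Metal kernels (see ggerganov/llama.cpp#2508). The format check here
// is a guess for demonstration purposes only.
fn params_for_model(file_name: &str) -> llm::ModelParameters {
    let has_metal_kernels = !file_name.contains("q8_0");
    llm::ModelParameters {
        // When use_gpu is false, inference stays on the CPU and never
        // reaches the unimplemented path in ggml-metal.m.
        use_gpu: has_metal_kernels,
        ..Default::default()
    }
}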