ml-explore / mlx-examples

Examples in the MLX framework

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed #1043

Open · lhwong opened 1 month ago

lhwong commented 1 month ago

I got the following error when running a model imported from GGUF, where the GGUF was exported from a model fine-tuned with LoRA.

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

These are the commands I used:

mlx_lm.lora --train --model meta-llama/Llama-3.2-1B --data ~/Projects/AI/data --iters 1000

mlx_lm.generate --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --prompt "What is biomolecule?"

mlx_lm.fuse --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --export-gguf

Create a Modelfile containing:

FROM ./fused_model/ggml-model-f16.gguf

ollama create example -f Modelfile

ollama run example

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

awni commented 1 month ago

I would file an issue with the https://github.com/ollama/ollama folks. It's not clear to me this is an issue with MLX.

lhwong commented 1 month ago

@awni Could it be that the GGUF exported by mlx_lm is F16 and the command I used to create the model (ollama create example -f Modelfile) is wrong, or that a certain setting is required?

"Export the fused model to GGUF. Note GGUF support is limited to Mistral, Mixtral, and Llama style models in fp16 precision." Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md

hansvdam commented 1 month ago

I have the same issue. Making a GGUF after the fuse with llama.cpp does work when running it in ollama: https://github.com/ggerganov/llama.cpp

python convert_hf_to_gguf.py /fused_model --outfile output_file.gguf
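If llama.cpp is not set up yet, the conversion script lives in the repo root; a minimal sketch of the setup (paths are examples):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt   # dependencies for the conversion scripts
# then run the convert_hf_to_gguf.py command above from the repo root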

Then, in the ollama Modelfile, put (along with the parameters and template): FROM output_file.gguf
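For reference, a complete Modelfile along those lines might look like the sketch below. The Llama 3 style TEMPLATE and stop token are assumptions based on ollama's published Llama 3 models, so verify them against your own model:

FROM output_file.gguf

# hypothetical Llama 3 chat template; check your model's tokenizer config
TEMPLATE """<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.6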

hschaeufler commented 1 month ago


@lhwong @hansvdam You can fuse the model without the GGUF export and import it into ollama directly. Currently there is only a problem in ollama with the format, which is why you have to downgrade the transformers library first. See also: https://github.com/ollama/ollama/issues/7167#issuecomment-2442207590

pipenv install transformers==4.44.2 or pip install transformers==4.44.2 (depending on your package manager)
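You can verify the downgrade took effect with a quick check, e.g.:

python -c "import transformers; print(transformers.__version__)"   # should print 4.44.2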

Fuse the model without the GGUF export:

mlx_lm.fuse --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --adapter-path "results/llama3_1_8B_instruct_lora/tuning_11/adapters" \
    --save-path "results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/"

Modelfile:

FROM "/Volumes/Extreme SSD/dartgen/results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model"

PARAMETER temperature 0.6
PARAMETER top_p 0.9

And import it: ollama create hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 -f Modelfile
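After the import you can sanity-check the model directly; the prompt here is just an example:

ollama run hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 "What is a biomolecule?"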