ml-explore / mlx-examples

Examples in the MLX framework

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed #1043

Open · lhwong opened 1 month ago

lhwong commented 1 month ago

I got the following error when running a model imported from GGUF, where the GGUF was exported from a model fine-tuned with LoRA.

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

These are the commands I used:

mlx_lm.lora --train --model meta-llama/Llama-3.2-1B --data ~/Projects/AI/data --iters 1000

mlx_lm.generate --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --prompt "What is biomolecule?"

mlx_lm.fuse --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --export-gguf

Create a Modelfile containing:

FROM ./fused_model/ggml-model-f16.gguf

ollama create example -f Modelfile

ollama run example

Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed

awni commented 1 month ago

I would file an issue with the https://github.com/ollama/ollama folks. It's not clear to me this is an issue with MLX.

lhwong commented 1 month ago

@awni Could it be that the GGUF exported by mlx_lm is F16 and the command I used to create the model (ollama create example -f Modelfile) is wrong, or that a certain setting is required?

"Export the fused model to GGUF. Note GGUF support is limited to Mistral, Mixtral, and Llama style models in fp16 precision." Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md

hansvdam commented 1 month ago

I have the same issue. Making a GGUF after the fuse with llama.cpp does work when running it in ollama: https://github.com/ggerganov/llama.cpp

python convert_hf_to_gguf.py /fused_model --outfile output_file.gguf
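If llama.cpp is not set up yet, the conversion script lives in the repo root; a minimal sketch of the setup (paths are examples):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt   # dependencies for the conversion scripts
# then run the convert_hf_to_gguf.py command above from the repo root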

Then, in the ollama Modelfile, put (along with the parameters and template): FROM output_file.gguf
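For reference, a complete Modelfile along those lines might look like the sketch below. The Llama 3 style TEMPLATE and stop token are assumptions based on ollama's published Llama 3 models, so verify them against your own model:

FROM output_file.gguf

# hypothetical Llama 3 chat template; check your model's tokenizer config
TEMPLATE """<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.6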

hschaeufler commented 1 month ago


@lhwong @hansvdam You can fuse the model without the GGUF export and import it into ollama directly. Currently there is only a problem in ollama with the format, which is why you have to downgrade the transformers library first. See also: https://github.com/ollama/ollama/issues/7167#issuecomment-2442207590

pipenv install transformers==4.44.2 or pip install transformers==4.44.2 (depending on your package manager)
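You can verify the downgrade took effect with a quick check, e.g.:

python -c "import transformers; print(transformers.__version__)"   # should print 4.44.2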

Fuse the model without the GGUF export:

mlx_lm.fuse --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
    --adapter-path "results/llama3_1_8B_instruct_lora/tuning_11/adapters" \
    --save-path "results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/"

Modelfile:

FROM "/Volumes/Extreme SSD/dartgen/results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model"

PARAMETER temperature 0.6
PARAMETER top_p 0.9

And import it: ollama create hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 -f Modelfile
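After the import you can sanity-check the model directly; the prompt here is just an example:

ollama run hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 "What is a biomolecule?"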