Open lhwong opened 1 month ago
I would file an issue with the https://github.com/ollama/ollama folks. It's not clear to me that this is an issue with MLX.
@awni Could it be because the GGUF exported by mlx_lm is F16 and the command I used to create the model (ollama create example -f Modelfile) is wrong, or is some setting required?
"Export the fused model to GGUF. Note GGUF support is limited to Mistral, Mixtral, and Llama style models in fp16 precision." Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
I have the same issue. Making a GGUF after the fuse with llama.cpp does work when running it in Ollama: https://github.com/ggerganov/llama.cpp
python convert_hf_to_gguf.py ./fused_model --outfile output_file.gguf --outtype f16
Then, in the Ollama Modelfile, put (along with the parameters and template): FROM output_file.gguf
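Before running ollama create, it can help to sanity-check that the conversion actually produced a valid GGUF file. This is a minimal sketch of my own (the filename output_file.gguf is the one used above); the header layout (4-byte GGUF magic, uint32 version, uint64 tensor and key/value counts, little-endian) follows the published GGUF format:

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor count, KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version, = struct.unpack("<I", f.read(4))
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count

# Example: read_gguf_header("output_file.gguf")
```

If the magic check fails, the conversion step went wrong and there is no point importing the file into Ollama.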
I got the following error when running a model imported from a GGUF file that was generated from a model fine-tuned with LoRA.
Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
The following commands were used:
mlx_lm.lora --train --model meta-llama/Llama-3.2-1B --data ~/Projects/AI/data --iters 1000
mlx_lm.generate --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --prompt "What is biomolecule?"
mlx_lm.fuse --model meta-llama/Llama-3.2-1B --adapter-path ./adapters --export-gguf
Create a Modelfile:
FROM ./fused_model/ggml-model-f16.gguf
ollama create example -f Modelfile
ollama run example
Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
/Users/runner/work/ollama/ollama/llm/llama.cpp/ggml/src/ggml-metal.m:1080: GGML_ASSERT(src1t == GGML_TYPE_F32) failed
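The assert in that log means a Metal kernel received a second operand (src1) whose type is not F32. To see which dtypes actually ended up in the exported GGUF, one can scan the file's tensor-info section. This is a sketch of my own based on the published GGUF layout (header, then metadata key/value pairs, then tensor infos); the helper names are mine, and only the GGML type codes I am sure of (0 = F32, 1 = F16) are mapped:

```python
import struct

# GGUF scalar value-type codes (subset of the spec; skipping them is all we need).
_SCALAR_FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}

# GGML tensor dtype codes relevant to the F16 export / F32 assert above.
GGML_TYPES = {0: "F32", 1: "F16"}

def _read(f, fmt):
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]

def _read_string(f):
    n = _read(f, "<Q")
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    if vtype in _SCALAR_FMT:
        f.read(struct.calcsize(_SCALAR_FMT[vtype]))
    elif vtype == 8:                         # string
        _read_string(f)
    elif vtype == 9:                         # array: element type + count + elements
        etype = _read(f, "<I")
        count = _read(f, "<Q")
        for _ in range(count):
            _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def tensor_dtypes(path):
    """Return {tensor_name: dtype_name} from a GGUF file's tensor-info section."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _read(f, "<I")                       # version
        n_tensors = _read(f, "<Q")
        n_kv = _read(f, "<Q")
        for _ in range(n_kv):                # skip the metadata key/value pairs
            _read_string(f)
            _skip_value(f, _read(f, "<I"))
        out = {}
        for _ in range(n_tensors):
            name = _read_string(f)
            n_dims = _read(f, "<I")
            f.read(8 * n_dims)               # dims
            ggml_type = _read(f, "<I")
            _read(f, "<Q")                   # data offset
            out[name] = GGML_TYPES.get(ggml_type, f"type_{ggml_type}")
    return out
```

Running tensor_dtypes("./fused_model/ggml-model-f16.gguf") shows whether any tensor has an unexpected dtype, which is useful evidence to attach when filing the issue with the Ollama folks.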
@lhwong @hansvdam You can fuse the model without the GGUF export and import the fused model directly into Ollama. Ollama currently has a problem with the format, which is why you have to downgrade the transformers library first. See also: https://github.com/ollama/ollama/issues/7167#issuecomment-2442207590
pipenv install transformers==4.44.2
or pip install transformers==4.44.2
(depending on your package manager)
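To check whether your installed transformers is newer than the pinned workaround version before fusing, something like this works (a sketch; it only handles plain x.y.z version strings, not pre-release suffixes like 4.45.0.dev0):

```python
def version_tuple(v):
    # Turn "4.44.2" into (4, 44, 2) for a simple numeric comparison.
    return tuple(int(part) for part in v.split("."))

def needs_downgrade(installed, pinned="4.44.2"):
    # True when the installed version is newer than the pinned workaround version.
    return version_tuple(installed) > version_tuple(pinned)

# Example: check the locally installed transformers, if present.
if __name__ == "__main__":
    try:
        from importlib.metadata import version
        print("downgrade needed:", needs_downgrade(version("transformers")))
    except Exception as exc:
        print("could not determine transformers version:", exc)
```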
Fuse the model without GGUF export:
mlx_lm.fuse --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
--adapter-path "results/llama3_1_8B_instruct_lora/tuning_11/adapters" \
--save-path "results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/"
Modelfile:
FROM "/Volumes/Extreme SSD/dartgen/results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model"
PARAMETER temperature 0.6
PARAMETER top_p 0.9
And import it:
ollama create hschaeufler/dartgen-llama-3.1:8b-instruct-bf16-v11 -f Modelfile
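Before running ollama create on the fused directory, a quick structural check can save a failed import. The required file list here is an assumption of mine (mlx_lm.fuse writes config.json, tokenizer files, and safetensors shards); adjust it to what your fused model actually contains:

```python
from pathlib import Path

# Assumed minimal contents of an mlx_lm.fuse output directory; adjust as needed.
REQUIRED_FILES = ["config.json"]

def looks_like_fused_model(dir_path):
    """Heuristic check that a directory resembles a fused-model export."""
    p = Path(dir_path)
    has_weights = any(p.glob("*.safetensors"))
    missing = [name for name in REQUIRED_FILES if not (p / name).is_file()]
    return has_weights and not missing

# Example: looks_like_fused_model("results/llama3_1_8B_instruct_lora/tuning_11/lora_fused_model/")
```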