ml-explore / mlx-examples

Examples in the MLX framework
MIT License

[Feature] Export Lora Adapters as GGML #816

Open rmarnold opened 4 weeks ago

rmarnold commented 4 weeks ago

llama.cpp dropped support for converting LoRA adapters to GGML. It would be very useful if we could use adapters with llama.cpp directly instead of fusing or merging them into the fine-tuned model.

awni commented 3 weeks ago

Can you say more about what you are looking for?

Is it a separate GGUF file which contains the adapters, so that you can load the base model GGUF as well as the adapter GGUF in llama.cpp? Does llama.cpp support that?

rmarnold commented 3 weeks ago

@awni, yes, in llama.cpp you can specify `--lora ggml-adapter-model.bin`. The issue is that after training, mlx_lm outputs `adapter.safetensors`, which llama.cpp does not recognize. I know for certain that the supported adapter format is GGML, but I have read there has been work to support GGUF.
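For reference, loading an adapter with that flag looks roughly like this (file names and prompt are placeholders, not outputs of any real run):

```shell
# Run llama.cpp inference with a base GGUF model plus a LoRA adapter
# in the legacy GGML/GGLA binary format. Paths are illustrative only.
./main -m models/base-model.gguf \
       --lora ggml-adapter-model.bin \
       -p "Your prompt here"
```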

yonomitt commented 2 weeks ago

It looks like it's a file format called GGLA (possibly a simplified version of GGML? or GGUF?):

https://github.com/ggerganov/llama.cpp/blob/21be9cab94e0b5b53cb6edeeebf8c8c799baad03/examples/export-lora/export-lora.cpp#L225

If I'm reading this correctly, the format is something like this:

- HEADER
- TENSORS (one after another), each consisting of:
  - ONE TENSOR METADATA
  - ONE TENSOR DATA (aligned to 32 bytes)

(I'm not entirely certain of the format of the data here... if/when I figure more out, I'll add it)
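Based on my reading of the linked `export-lora.cpp`, the header seems to be four little-endian uint32 values: a magic number, a file version, the LoRA rank, and the LoRA alpha. The sketch below round-trips that assumed layout to make it concrete; the magic constant and field order are my inference from the llama.cpp sources, not a confirmed spec:

```python
import io
import struct

# Magic appears to be the ASCII bytes "ggla" read as a little-endian
# uint32 (inferred from llama.cpp; treat as an assumption).
GGLA_MAGIC = 0x67676C61

def write_ggla_header(buf, lora_r, lora_alpha, version=1):
    # magic, file version, LoRA rank, LoRA alpha: four little-endian uint32s
    buf.write(struct.pack("<4I", GGLA_MAGIC, version, lora_r, lora_alpha))

def read_ggla_header(buf):
    magic, version, lora_r, lora_alpha = struct.unpack("<4I", buf.read(16))
    if magic != GGLA_MAGIC:
        raise ValueError(f"not a GGLA file (magic = {magic:#x})")
    return {"version": version, "lora_r": lora_r, "lora_alpha": lora_alpha}

# Round-trip a synthetic header to sanity-check the assumed layout.
f = io.BytesIO()
write_ggla_header(f, lora_r=8, lora_alpha=16)
f.seek(0)
print(read_ggla_header(f))  # {'version': 1, 'lora_r': 8, 'lora_alpha': 16}
```

After the header come the per-tensor records (metadata, then data padded to a 32-byte boundary), which I haven't fully worked out yet.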