Open rmarnold opened 4 weeks ago
Can you say more about what you are looking for?
Is it a separate GGUF file which contains the adapters, so that you can load the base model GGUF as well as the adapter GGUF in llama.cpp? Does llama.cpp support that?
@awni Yes, in llama.cpp you can specify `--lora ggml-adapter-model.bin`. The issue is that after training, mlx_lm outputs adapter.safetensors, which llama.cpp does not recognize. I know for certain that the supported output format is GGML, but I have read there has been work to support GGUF.
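For reference, loading a base model together with a converted adapter would look something like this (the binary name and paths are placeholders; the `--lora` flag is the one mentioned above):

```shell
# Hypothetical paths -- substitute your own model and adapter files.
./main -m models/base-model.gguf \
  --lora lora/ggml-adapter-model.bin \
  -p "Hello"
```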
It looks like it's a file format called GGLA (possibly a simplified version of GGML? or GGUF?):
If I'm reading this correctly, the format is something like this:

```
magic     uint32          0x616C6767 (the bytes "ggla" read as little-endian)
version   uint32          1
then, for each tensor:
  n_dims   uint32
  namelen  uint32
  dims     uint32[n_dims]  - length for each tensor dimension
  name     char[namelen]   - tensor name
  data     (I'm not entirely certain of the format of the data here...
            if/when I figure more out, I'll add it)
```
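For anyone experimenting, here is a rough Python sketch of a writer for the layout above. The extra header fields (`lora_r`, `lora_alpha`), the per-tensor `ftype` field, and the 32-byte data alignment are assumptions based on llama.cpp's old `convert-lora-to-ggml.py`, not a verified spec:

```python
import struct

def write_ggla(path, tensors, lora_r=8, lora_alpha=16):
    """Write LoRA tensors in the GGLA layout sketched above.

    tensors: iterable of (name, dims, raw_float32_bytes) tuples.
    The lora_r/lora_alpha header fields, the ftype field, and the
    32-byte data alignment are assumptions, not a verified spec.
    """
    with open(path, "wb") as f:
        f.write(struct.pack("<I", 0x616C6767))  # magic: the bytes "ggla"
        f.write(struct.pack("<i", 1))           # file format version
        f.write(struct.pack("<i", lora_r))      # assumed: LoRA rank
        f.write(struct.pack("<i", lora_alpha))  # assumed: LoRA alpha
        for name, dims, data in tensors:
            sname = name.encode("utf-8")
            # n_dims, namelen, ftype (0 = float32; ftype is an assumption)
            f.write(struct.pack("<iii", len(dims), len(sname), 0))
            f.write(struct.pack("<%di" % len(dims), *reversed(dims)))
            f.write(sname)
            f.write(b"\x00" * (-f.tell() % 32))  # pad to 32-byte boundary
            f.write(data)
```

With this, a small float32 tensor ends up at a 32-byte-aligned offset after the 16-byte file header and its per-tensor header, which matches my reading of the layout above.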
llama.cpp dropped support for converting LoRA adapters to GGML, so it would be very useful if we could use adapters with llama.cpp directly instead of fusing or merging the fine-tuned model.