[Open] fakerybakery opened this issue 11 months ago
This would be a nice feature.
Count me as interested in this.
It would be great to convert the model files and adapter to a GGUF file.
Yeah. MLX is super nice, but it is missing the "deploy" part: what do you do once you like your end result and want other people to enjoy it too?
Merging is implemented here https://github.com/mzbac/mlx-lora but I haven't yet found how to convert to GGUF.
That's not yet supported. We have some ongoing work for GGUF support, see e.g. https://github.com/ml-explore/mlx/pull/350
A question from an ignorant person, but why is the MLX format different from GGUF? Is there any place I can read about that?
MLX has multiple "formats" that we save arrays in. The docs are a bit scattered, but you can find the save/load function docs, for example on the ops page.
We currently support the standard NumPy format (along with zip and compressed zip archives) and safetensors. GGUF is in the pipeline.
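For context, a minimal sketch of the save/load APIs, assuming the `mlx.core` function names below (check the ops docs for your version):

```python
import mlx.core as mx

a = mx.arange(8)

# Single array -> standard NumPy .npy file
mx.save("array.npy", a)

# Several arrays -> NumPy .npz archive (optionally compressed)
mx.savez("arrays.npz", weights=a)
mx.savez_compressed("arrays_compressed.npz", weights=a)

# Dict of arrays -> safetensors
mx.save_safetensors("arrays.safetensors", {"weights": a})

# mx.load picks the loader based on the file extension
loaded = mx.load("arrays.safetensors")
```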
Is there a way to serve MLX over a web socket, like LM Studio does? I'm curious whether I could serve my own model via MLX to other apps.
Thank you @awni. MLX fine-tuning works very well on Mistral. It's a pity we can't get a GGUF compatible with llama.cpp, or maybe reverse the quantization back to the HF format?
If the GGUF PR is merged, then MLX -> GGUF -> reverse the GGUF convert.py script to create an HF model? The convert.py script in llama.cpp seems quite complicated, but it looks possible.
Succeeded by using fuse.py:

1. `python fuse.py --model mlx_model --save-path ./fuse --adapter-file adapter.npz`
2. Then rename `weights.00.safetensors` to `model.safetensors`.
3. The convert.py from llama.cpp works fine afterward: `python convert.py ./fuse`
4. `./quantize ./fuse/ggml-model-f16.gguf ./fuse/modelq5.gguf q5_0`
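For anyone who wants to script this, here is a consolidated sketch of the steps above as a single Python script, run from mlx-examples/lora. The paths (`./fuse`, `../llama.cpp`) and the adapter file name are assumptions, so adjust them to your setup:

```python
import subprocess
from pathlib import Path

fused = Path("./fuse")            # output directory for the fused model (assumption)
llama_cpp = Path("../llama.cpp")  # cloned and built llama.cpp checkout (assumption)

# 1. Fuse the LoRA adapter into the base model with mlx-examples' fuse.py.
subprocess.run(
    ["python", "fuse.py", "--model", "mlx_model",
     "--save-path", str(fused), "--adapter-file", "adapter.npz"],
    check=True,
)

# 2. Rename the sharded safetensors file so llama.cpp's convert.py finds it.
(fused / "weights.00.safetensors").rename(fused / "model.safetensors")

# 3. Convert the fused model to an f16 GGUF file (written into ./fuse).
subprocess.run(["python", str(llama_cpp / "convert.py"), str(fused)], check=True)

# 4. Quantize to q5_0, as in the command above.
subprocess.run(
    [str(llama_cpp / "quantize"),
     str(fused / "ggml-model-f16.gguf"),
     str(fused / "modelq5.gguf"),
     "q5_0"],
    check=True,
)
```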
@l0d0v1c: I dropped "./fuse" from the `python fuse.py` step, reformatted the hyphens, and got that to work. That second part has nothing to do with MLX, correct? I have to get llama.cpp to do the GGUF conversion after renaming the weights.00.safetensors file?
Yes exactly
Can you outline the steps you took in detail? We can see which ones we can improve on our end. For example, we could easily change the naming convention to `model.safetensors`, which might make one step simpler. We could also provide a dequantize option in fuse.py.
@l0d0v1c I'm struggling with this (I'm a linguist with no computer/data science training). I've cloned the llama.cpp repo. If the fused/renamed model were in /Users/williammarcellino/mlx-examples/lora/lora_fused_model_GrKRoman_1640, how would I format a command to convert it to GGUF? Thanks in advance for any help :)
@USMCM1A1 You have to clone the llama.cpp repo; then `make` is enough on a Mac.

1. Rename `weights.00.safetensors` to `model.safetensors`.
2. `python convert.py thedirectoryofyourmodel` will produce a file `ggml-model-f16.gguf` in the same directory.
3. Then you can use `./quantize thedirectoryofyourmodel/ggml-model-f16.gguf Thefinal.gguf q4_0`.
In my experiments on an MLX fine-tuned model, q8_0 is necessary instead of q4_0.
@awni Changing the naming convention is a good idea. Another idea is to allow converting just the LoRA adapter to GGUF.
@USMCM1A1 My project is also linguistic (Ancient Greek). I'm not a computer scientist either, but I play with the buttons.
@l0d0v1c Awesome, that worked! I have a working gguf_q8 version up and running in LM Studio 😊 Thank you so much.
Also: my fine-tune happens to be on the classical world (Hellenic & Roman).
@USMCM1A1 I work on an AI able to deal with the philosophy of Diogenes and Antisthenes. The results are just incredible. Happy you succeeded. I sent you a LinkedIn invitation to share about our... unusual subject.
Hi, is it possible to convert a LoRA model trained with MLX back into the Hugging Face format to publish on the Hugging Face Hub, and preferably merge it with the main model? Thank you!