ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Enable distributed LoRA training #821


angeloskath commented 3 weeks ago

The updates to LORA.md are still missing, but TL;DR: we can now run

```sh
$ echo "m2-ultra-0 slots=1" >> hostfile
$ echo "m2-ultra-1 slots=1" >> hostfile
$ mpirun --hostfile hostfile -- python -m mlx_lm.lora --train --model mlx-community/Mistral-7B-v0.2-4bit --data /path/to/data --batch-size 16
```

to train across two nodes (or more; nothing else needs to change as nodes are added). The hostfile uses standard Open MPI syntax: `slots=1` tells mpirun to launch one process on each listed host.
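
For context on why the launch command is the only thing that changes, here is a minimal sketch of gradient averaging with MLX's distributed API. `mx.distributed.init`, `mx.distributed.all_sum`, and `mlx.utils.tree_map` are real MLX primitives; the `average_gradients` helper and the commented training-step lines are illustrative assumptions, not the exact code inside `mlx_lm.lora`:

```python
# Sketch: averaging gradients across ranks with MLX distributed.
# Assumption: this mirrors the general technique, not the exact
# implementation in mlx_lm.lora.
import mlx.core as mx
from mlx.utils import tree_map

# Initialize the distributed group; when launched via mpirun this is
# backed by MPI. With a single process it degrades to a group of size 1.
world = mx.distributed.init()

def average_gradients(grads):
    # Sum each gradient array across all ranks, then divide by the
    # number of ranks so every replica applies the same mean gradient.
    n = world.size()
    if n == 1:
        return grads
    return tree_map(lambda g: mx.distributed.all_sum(g) / n, grads)

# In the training step, after computing local gradients, e.g.:
#   loss, grads = mx.value_and_grad(loss_fn)(model, batch)
#   grads = average_gradients(grads)
#   optimizer.update(model, grads)
```

Assuming each node draws its own batches, the `all_sum` keeps every replica's parameters in step, so the rest of the training loop needs no changes.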