ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Enable distributed LoRA training #821


angeloskath commented 3 weeks ago

The updates to LORA.md are still missing, but TL;DR: we can now run

```sh
$ echo "m2-ultra-0 slots=1" >> hostfile
$ echo "m2-ultra-1 slots=1" >> hostfile
$ mpirun --hostfile hostfile -- python -m mlx_lm.lora --train --model mlx-community/Mistral-7B-v0.2-4bit --data /path/to/data --batch-size 16
```

to train across two nodes (or more; nothing else needs to change as nodes are added). The hostfile uses standard Open MPI syntax: `slots=1` tells mpirun to launch one process on each listed host.
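
For context on why the launch command is the only thing that changes, here is a minimal sketch of gradient averaging with MLX's distributed API. `mx.distributed.init`, `mx.distributed.all_sum`, and `mlx.utils.tree_map` are real MLX primitives; the `average_gradients` helper and the commented training-step lines are illustrative assumptions, not the exact code inside `mlx_lm.lora`:

```python
# Sketch: averaging gradients across ranks with MLX distributed.
# Assumption: this mirrors the general technique, not the exact
# implementation in mlx_lm.lora.
import mlx.core as mx
from mlx.utils import tree_map

# Initialize the distributed group; when launched via mpirun this is
# backed by MPI. With a single process it degrades to a group of size 1.
world = mx.distributed.init()

def average_gradients(grads):
    # Sum each gradient array across all ranks, then divide by the
    # number of ranks so every replica applies the same mean gradient.
    n = world.size()
    if n == 1:
        return grads
    return tree_map(lambda g: mx.distributed.all_sum(g) / n, grads)

# In the training step, after computing local gradients, e.g.:
#   loss, grads = mx.value_and_grad(loss_fn)(model, batch)
#   grads = average_gradients(grads)
#   optimizer.update(model, grads)
```

Assuming each node draws its own batches, the `all_sum` keeps every replica's parameters in step, so the rest of the training loop needs no changes.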