ml-explore / mlx-examples

Examples in the MLX framework
MIT License
6.29k stars, 897 forks

Wire models in MLX LM #1069

Closed awni closed 4 weeks ago

awni commented 1 month ago

Requires #1510

On an M2 Ultra (nothing else needed), the numbers for Llama 3.1 70B in 16-bit precision look like:

| | not wired | wired |
|---|---|---|
| prompt (16 toks) | 2.35 toks/sec | 27.8 toks/sec |
| generation (100 toks) | 0.23 toks/sec | 4.7 toks/sec |
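
For a sense of scale, the figures in the table above can be turned into speedup factors and rough wall-clock times. This is only a back-of-the-envelope sketch: the `RATES` dictionary and the helper names are made up for illustration, and the time estimates assume each reported rate is an average over the full token count.

```python
# Throughput figures copied from the table above (tokens per second).
# The dictionary layout and helper names are illustrative, not part of MLX LM.
RATES = {
    "prompt": {"tokens": 16, "not_wired": 2.35, "wired": 27.8},
    "generation": {"tokens": 100, "not_wired": 0.23, "wired": 4.7},
}


def speedup(not_wired: float, wired: float) -> float:
    """Ratio of wired to non-wired throughput."""
    return wired / not_wired


def seconds(tokens: int, rate: float) -> float:
    """Approximate time to process `tokens` at `rate` tokens per second."""
    return tokens / rate


if __name__ == "__main__":
    for phase, r in RATES.items():
        factor = speedup(r["not_wired"], r["wired"])
        before = seconds(r["tokens"], r["not_wired"])
        after = seconds(r["tokens"], r["wired"])
        print(f"{phase}: {factor:.1f}x faster, {before:.1f}s -> {after:.1f}s")
```

By this arithmetic, wiring gives roughly a 12x speedup on the prompt and roughly a 20x speedup on generation.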

Command run:

python -m mlx_lm.generate --model meta-llama/Meta-Llama-3-70B-Instruct --prompt "Write a story about Einstein" --max-tokens 100