Closed awni closed 4 weeks ago
Requires #1510
On an M2 Ultra (nothing else needed), the numbers for Llama 3.1 70B in 16-bit precision look like:
Command run:
python -m mlx_lm.generate --model meta-llama/Meta-Llama-3-70B-Instruct --prompt "Write a story about Einstein" --max-tokens 100