MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!).
This project is a fully native SwiftUI app that allows you to run local LLMs (e.g. Llama, Mistral) on Apple silicon in real-time using MLX.
Support for iOS is coming next week.
The default model is Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX.

| Model | Status |
|---|---|
| Mistral | Supported |
| Llama | Supported |
| Phi | Supported |
| Gemma | Supported (may have issues) |
Models are downloaded from Hugging Face. To add a new model, visit the MLX Community on Hugging Face, search for the model you want, and then add it via Manage Models → Add Model.
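Purely for illustration, the sketch below shows what such an entry boils down to: a display name plus an mlx-community repository id on Hugging Face. The `ModelEntry` type, its fields, and the repository id are assumptions made for this example, not the app's actual API.

```swift
// Hypothetical sketch only: ModelEntry and its fields are illustrative
// assumptions, not the app's actual types. "Adding a model" amounts to
// pointing the app at an MLX-converted repository on Hugging Face.
struct ModelEntry {
    let displayName: String
    let huggingFaceRepo: String // an mlx-community repository id on Hugging Face
}

let newModel = ModelEntry(
    displayName: "Mistral 7B Instruct (4-bit)",
    huggingFaceRepo: "mlx-community/Mistral-7B-Instruct-v0.2-4bit"
)
```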
> [!IMPORTANT]
> This project is still under active development, and some models may require additional implementation to run correctly.
No. This is not intended for production deployment.

No. Everything runs locally, on device.
Temperature: Controls randomness. Lowering the temperature results in less random completions; as the temperature approaches zero, the model becomes deterministic and repetitive.

Top K: Sorts the predicted tokens by probability and discards those below the k-th one. A top-k value of 1 is equivalent to greedy search (always select the most probable token).

Maximum length: The maximum number of tokens to generate. Requests can use up to 2,048 tokens, shared between the prompt and the completion; the exact limit varies by model. (One token is roughly four characters of normal English text.)
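To make the interaction between these settings concrete, here is a minimal, self-contained sketch in plain Swift (not the app's actual sampling code; the function name and signature are illustrative). Temperature rescales the logits, top-k keeps only the k most probable candidates, and a top-k of 1 collapses to greedy search.

```swift
import Foundation

// Illustrative sketch, not the app's implementation: pick the next token id
// from raw logits using temperature scaling followed by top-k sampling.
func sampleNextToken(logits: [Double], temperature: Double, topK: Int) -> Int {
    // A temperature of (near) zero degenerates to greedy search: always argmax.
    guard temperature > 1e-6 else {
        return logits.indices.max { logits[$0] < logits[$1] }!
    }

    // Lower temperatures sharpen the distribution, making output less random.
    let scaled = logits.map { $0 / temperature }

    // Keep only the k most probable tokens; discard everything below the k-th.
    let topIndices = scaled.indices.sorted { scaled[$0] > scaled[$1] }.prefix(max(topK, 1))

    // Softmax over the surviving tokens (subtract the max for numerical stability).
    let maxLogit = topIndices.map { scaled[$0] }.max()!
    let weights = topIndices.map { exp(scaled[$0] - maxLogit) }
    let total = weights.reduce(0, +)
    let probabilities = weights.map { $0 / total }

    // Draw one of the surviving token ids according to those probabilities.
    var r = Double.random(in: 0..<1)
    for (token, p) in zip(topIndices, probabilities) {
        r -= p
        if r <= 0 { return token }
    }
    return topIndices.last!
}

// With topK == 1 this is greedy search; higher temperatures and larger k add variety.
let next = sampleNextToken(logits: [1.2, 0.3, 2.5, -0.7], temperature: 0.8, topK: 2)
```

Maximum length is not shown above; in a real generation loop, sampling simply stops once the combined prompt-plus-completion token count reaches the limit.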
Special thanks to Awni Hannun and David Koski for early testing and feedback.
Much ❤️ to all the folks who made MLX (especially mlx-swift) possible!