MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!).
This project is a fully native SwiftUI app that allows you to run local LLMs (e.g. Llama, Mistral) on Apple silicon in real-time using MLX.
Support for iOS is coming next week.
The default model is Nous-Hermes-2-Mistral-7B-DPO-4bit-MLX.

| Model | Status |
|---|---|
| Mistral | Supported |
| Llama | Supported |
| Phi | Supported |
| Gemma | Supported (may have issues) |
Models are downloaded from Hugging Face. To add a new model, visit the MLX Community on Hugging Face, search for the model you want, and then add it via Manage Models → Add Model.
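Purely for illustration, the sketch below shows what such an entry boils down to: a display name plus an mlx-community repository id on Hugging Face. The `ModelEntry` type, its fields, and the repository id are assumptions made for this example, not the app's actual API.

```swift
// Hypothetical sketch only: ModelEntry and its fields are illustrative
// assumptions, not the app's actual types. "Adding a model" amounts to
// pointing the app at an MLX-converted repository on Hugging Face.
struct ModelEntry {
    let displayName: String
    let huggingFaceRepo: String // an mlx-community repository id on Hugging Face
}

let newModel = ModelEntry(
    displayName: "Mistral 7B Instruct (4-bit)",
    huggingFaceRepo: "mlx-community/Mistral-7B-Instruct-v0.2-4bit"
)
```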
> [!IMPORTANT]
> This project is still under active development, and some models may require additional implementation to run correctly.
No. This is not intended for production deployment.

No. Everything runs locally, on device.
Temperature: Controls randomness. Lowering the temperature results in less random completions; as the temperature approaches zero, the model becomes deterministic and repetitive.

Top K: Sorts the predicted tokens by probability and discards those below the k-th one. A top-k value of 1 is equivalent to greedy search (always select the most probable token).

Maximum length: The maximum number of tokens to generate. Requests can use up to 2,048 tokens, shared between the prompt and the completion; the exact limit varies by model. (One token is roughly four characters of normal English text.)
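To make the interaction between these settings concrete, here is a minimal, self-contained sketch in plain Swift (not the app's actual sampling code; the function name and signature are illustrative). Temperature rescales the logits, top-k keeps only the k most probable candidates, and a top-k of 1 collapses to greedy search.

```swift
import Foundation

// Illustrative sketch, not the app's implementation: pick the next token id
// from raw logits using temperature scaling followed by top-k sampling.
func sampleNextToken(logits: [Double], temperature: Double, topK: Int) -> Int {
    // A temperature of (near) zero degenerates to greedy search: always argmax.
    guard temperature > 1e-6 else {
        return logits.indices.max { logits[$0] < logits[$1] }!
    }

    // Lower temperatures sharpen the distribution, making output less random.
    let scaled = logits.map { $0 / temperature }

    // Keep only the k most probable tokens; discard everything below the k-th.
    let topIndices = scaled.indices.sorted { scaled[$0] > scaled[$1] }.prefix(max(topK, 1))

    // Softmax over the surviving tokens (subtract the max for numerical stability).
    let maxLogit = topIndices.map { scaled[$0] }.max()!
    let weights = topIndices.map { exp(scaled[$0] - maxLogit) }
    let total = weights.reduce(0, +)
    let probabilities = weights.map { $0 / total }

    // Draw one of the surviving token ids according to those probabilities.
    var r = Double.random(in: 0..<1)
    for (token, p) in zip(topIndices, probabilities) {
        r -= p
        if r <= 0 { return token }
    }
    return topIndices.last!
}

// With topK == 1 this is greedy search; higher temperatures and larger k add variety.
let next = sampleNextToken(logits: [1.2, 0.3, 2.5, -0.7], temperature: 0.8, topK: 2)
```

Maximum length is not shown above; in a real generation loop, sampling simply stops once the combined prompt-plus-completion token count reaches the limit.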
Special thanks to Awni Hannun and David Koski for early testing and feedback.
Much ❤️ to all the folks who made MLX (especially mlx-swift) possible!