ml-explore / mlx-swift-examples

Examples using MLX Swift

MIT License


Make mlx-vlm examples in swift #132

Open davidkoski opened 1 month ago

davidkoski commented 1 month ago

Consider porting some models from https://github.com/Blaizzy/mlx-vlm to Swift.

davidkoski commented 1 month ago

For example:

LLaVA: llava-hf/LLaVA-NeXT-Video-7B-hf
Qwen2 VL: Qwen/Qwen2-VL-2B-Instruct
Llama 3.2 Vision: meta-llama/Llama-3.2-11B-Vision-Instruct
Phi-3 Vision: microsoft/Phi-3-vision-128k-instruct
PaliGemma: google/paligemma-3b-mix-224

mzbac commented 1 month ago

I am currently porting the Llama 3.2 VLM to Swift. It would be great if the VLM code could live in a separate package, so that people can easily pull it in as a dependency and integrate it into their applications (for example, to add VLM support to ChatMLX).

DePasqualeOrg commented 2 weeks ago

If someone can put together the basic pipeline for one vision model, I can probably port the others to Swift fairly quickly.

davidkoski commented 2 weeks ago

I am working on it right now and have PaliGemma done (well, not debugged, but callable). I am also working out how to structure the code relative to the LLM library, since the two should share code where possible.

I will try to push a branch with what I have today. Next week will be busy, so it might be two weeks before it is really ready.

DePasqualeOrg commented 2 weeks ago

Fantastic, thank you! Once that's in place, I'll start working on some of the other models (and will post here first to avoid duplication of work).

davidkoski commented 2 weeks ago

OK, you can see what I have. There is more work to be done, but the eval loop is worked out.

#151

davidkoski commented 3 days ago

Work on this continues: most of the refactoring is done, and llm-tool has a hard-coded call to PaliGemma. I need to implement a second VLM (qwen2_vl) to make sure the APIs have the right shape.

As mentioned before, this will be a breaking API change (so I will do a major version bump), but it should be fairly easy to adopt: hopefully just a new import and a couple of renames. I will write a guide when it is ready.

DePasqualeOrg commented 3 days ago

Thanks @davidkoski, your work is much appreciated! Once the API is stable, I'll try to port some of the other VLMs.