Open davidkoski opened 1 month ago
e.g.
Currently, I am working on porting Llama 3.2 VLM to Swift. It would be great if we could make the VLM code a separate package so that people can easily pull it in as a dependency and integrate it into their applications, for example, to add VLM support to ChatMLX.
If someone can put together the basic pipeline for one vision model, I can probably port the others to Swift fairly quickly.
I am working on it right now and have paligemma done (well, not debugged but callable). I am working on how to structure the code with regard to the LLM library -- they should share code where possible.
I will try and put up the branch with what I have today. Next week will be busy so it might be two weeks from now before it is really ready.
Fantastic, thank you! Once that's in place, I'll start working on some of the other models (and will post here first to avoid duplication of work).
OK, you can see what I have -- more work to be done but the eval loop is worked out.
This continues -- I have most of the refactoring done and `llm-tool` has a hard coded call to `paligemma`. I need to implement a second VLM (`qwen2_vl`) so I can make sure I have the right shape for the APIs.
As mentioned before, this will be a breaking change in the API (so I will do a major version bump), but it should be pretty easy to adopt: hopefully just a new `import` and renaming a couple of things. I will produce a guide when it is ready.
Thanks @davidkoski, your work is much appreciated! Once the API is stable, I'll try to port some of the other VLMs.
Consider porting some models from https://github.com/Blaizzy/mlx-vlm to Swift.