Phi 3.5 vision (4B model)

CheeseAndMeat commented 1 month ago

Model description

LoRAX's list of officially supported models does not include any vision models. This is a big gap for a very successful product. Having LoRAX as a critical component in our tech stack without a clear option for image-based language models is a big risk on our end. Can the LoRAX team please prioritize onboarding Phi 3.5 Vision, a state-of-the-art SLM with vision? Appreciated.

https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

No response

tgaddair commented 1 month ago

Hi @CheeseAndMeat, thanks for raising this issue. There are two things here for us to do:

1. Add support for Phi 3.5 Vision, which we can certainly do.
2. Update our docs for VLMs, as we do now support both Llava Next and Llama 3.2 Vision models.

CheeseAndMeat commented 1 month ago

@tgaddair I really appreciate the prompt follow-up :) 1- Phi 3.5 Vision outperformed Llama 3.2 Vision in our testing... We are really impressed with it! 2- Same for Phi 3.5 MoE: it is much better than both Mixtral and Llama 3.2, and it would be great to have it on the roadmap as well. Thanks again!

predibase / lorax