vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: microsoft/Phi-3-vision-128k-instruct Vision support #4958

Closed · pseudotensor closed this issue 4 months ago

pseudotensor commented 5 months ago

🚀 The feature, motivation and pitch

https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

Alternatives

No response

Additional context

vLLM is somewhat behind in vision support: idefics2 is already supported by TGI, and LLaVA-NeXT has been out for months but is still not supported. There is a PR for this; is it close to landing?

Isotr0py commented 5 months ago

vLLM's multi-modality support is still being refactored.

So we need to wait for some of the necessary refactoring work (such as ImageProcessor support) to be finished before we can add new vision models.