vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Model]: Support for InternVL2 #6321

Closed · Weiyun1025 closed this 3 months ago

Weiyun1025 commented 3 months ago

🚀 The feature, motivation and pitch

InternVL2 is currently the most powerful open-source Multimodal Large Language Model (MLLM). The InternVL2 family ranges from a 2B model suitable for edge devices up to a far more capable 108B model. Built on larger-scale language models, InternVL2-Pro demonstrates outstanding multimodal understanding, matching the performance of commercial closed-source models across various benchmarks.

Given the significant potential of InternVL2, we believe that integrating it with vLLM would greatly benefit both the vLLM community and users of this model. We kindly request your assistance in enabling the deployment of InternVL2 using the vLLM framework.

We look forward to your positive response and are eager to collaborate on this exciting endeavor.

Alternatives

No response

Additional context

Blog: https://internvl.github.io/blog/2024-07-02-InternVL-2.0/
Model Family: https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e

ywang96 commented 3 months ago

Hey @Weiyun1025! Thank you for opening this issue. I took a brief look at the model repo (https://huggingface.co/OpenGVLab/InternVL2-40B/tree/main), and it seems to me that supporting this model should be fairly straightforward, similar to what we did for Phi-3-vision.
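
For context, once support lands, serving InternVL2 should look roughly like the multi-modal interface vLLM already exposes for models such as Phi-3-vision. The sketch below is illustrative only: the model ID, prompt template, and `<image>` placeholder are assumptions, not a confirmed API for this model.

```python
# Hypothetical usage sketch: assumes InternVL2 gets wired into the same
# multi-modal interface vLLM already exposes for models like Phi-3-vision.
# The model ID, prompt template, and "<image>" token are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="OpenGVLab/InternVL2-8B", trust_remote_code=True)

image = Image.open("example.jpg")
prompt = "<image>\nDescribe this image."  # exact template depends on the implementation

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```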

Are you planning to make a pull request for this? If so, feel free to take a look at the other vision language model implementations in vLLM, and let us know if you run into any issues. We're happy to help you get this model supported.
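
If you do go the implementation route, the high-level wiring is just mapping the architecture name from the HF config to a vLLM model class. Here is a minimal registration sketch, assuming you have written an `InternVLChatModel` implementation modeled on the existing vision language models under `vllm/model_executor/models/` (the `my_internvl_impl` module below is hypothetical):

```python
# Minimal registration sketch. `my_internvl_impl` is a hypothetical module
# containing an InternVLChatModel implementation written against vLLM's
# model interface (see the existing VLM implementations for reference).
from vllm import ModelRegistry
from my_internvl_impl import InternVLChatModel

# "InternVLChatModel" matches the `architectures` entry in the HF config,
# so the engine knows which implementation to load for these checkpoints.
ModelRegistry.register_model("InternVLChatModel", InternVLChatModel)
```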

If you cannot make a pull request, I will try to see if I have some bandwidth to make a PR on this. Feel free to check out #4194 for the full roadmap around multi-modality.

Thanks!