vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: Deploying multimodal retrieval models #7983

Open sky-2002 opened 2 weeks ago

sky-2002 commented 2 weeks ago

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I want to run inference with ColPali, but I don't know how to integrate it with vLLM. ColPali is based on PaliGemma, which is already supported in vLLM, but it also loads some adapters. Please let me know whether it can be used right away or whether changes are needed; I am happy to contribute.
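For reference, a minimal sketch of running the PaliGemma backbone alone (without ColPali's adapter weights) through vLLM's multimodal offline API; the checkpoint name, prompt, and image path here are placeholders following the usual vision-language example pattern:

```python
from PIL import Image
from vllm import LLM

# Base PaliGemma only -- ColPali's adapter weights are NOT applied here.
llm = LLM(model="google/paligemma-3b-mix-224")

image = Image.open("page.png")  # placeholder document image
outputs = llm.generate({
    "prompt": "caption en",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```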


DarkLight1337 commented 2 weeks ago

LoRA adapters are not supported yet for multi-modal models. See #7199.
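For context, this is roughly what the existing LoRA path in vLLM looks like for a text-only base model (a sketch; the base model and adapter path are placeholders). Per the comment above, the same pattern does not yet work when the base model is multi-modal:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Text-only base model with LoRA enabled; multi-modal bases
# (e.g. PaliGemma) are not yet supported on this path.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Explain multimodal retrieval in one sentence."],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),  # placeholder adapter
)
print(outputs[0].outputs[0].text)
```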

sky-2002 commented 2 weeks ago

Hey @DarkLight1337, if not the whole model, at least this part and the inference part can be done with vLLM, right?

DarkLight1337 commented 2 weeks ago

Currently, yes. You are welcome to extend vLLM with new embedding models, though!
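For anyone picking this up, registering an out-of-tree implementation looks roughly like this (a sketch; `ColPaliForRetrieval` and its module are hypothetical, and the interface an embedding model has to implement is described in the "Adding a New Model" section of the vLLM docs):

```python
from vllm import ModelRegistry

# Hypothetical ColPali implementation that follows vLLM's model interface.
from my_colpali_impl import ColPaliForRetrieval

ModelRegistry.register_model("ColPaliForRetrieval", ColPaliForRetrieval)
```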