John-Ge opened this issue 10 months ago
Hi, @John-Ge,
Thanks for your reply! I would like to know what the usual setup is for inference with batch size > 1. Should we deploy the model through something like vLLM or TGI? Do we need to wait for them to support LLaVA?
The authors of LLaVA have tried to create a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754
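For reference, here is a minimal sketch of what batched generation can look like; it assumes the Hugging Face `transformers` port of LLaVA (`llava-hf/llava-1.5-7b-hf`) rather than the original LLaVA codebase from the linked issue, and the prompts and image paths are placeholders.

```python
# Minimal sketch: batched LLaVA generation via the HF `transformers` port.
# Assumes `llava-hf/llava-1.5-7b-hf`; not the official haotian-liu/LLaVA pipeline.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompts = [
    "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "USER: <image>\nDescribe the scene briefly. ASSISTANT:",
]
images = [Image.open("example_1.jpg"), Image.open("example_2.jpg")]  # placeholder paths

# Left-pad so generation starts at the same position for every sample in the batch.
processor.tokenizer.padding_side = "left"
inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt").to("cuda")

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(processor.batch_decode(output_ids, skip_special_tokens=True))
```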
Hi, @darkpromise98 , we will try to include this feature into VLMEvalKit recently.
That's great!
https://github.com/haotian-liu/LLaVA/issues/754#issuecomment-1907970439 — this issue comment builds a fast inference method for LLaVA. Would you add this function for every benchmark in this repo?
BTW, I find that SGLang may not support LoRA adapters on top of a base model, and I train LLaVA with LoRA. If possible, I hope you could support loading the base model, merging the LoRA weights, and deploying the merged model for evaluation.
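If it helps, a generic way to get a LoRA-trained model into engines that only accept merged weights is to fold the adapter into the base model with PEFT and save the result. The sketch below uses placeholder paths and plain `AutoModelForCausalLM`; for LLaVA specifically, note that its LoRA checkpoints also carry non-LoRA weights (e.g. the multimodal projector), so a merge script from the LLaVA repo itself, if available, is the safer route.

```python
# Minimal sketch: merge LoRA adapter weights into a base model with PEFT,
# then save a standalone checkpoint that LoRA-unaware serving engines can load.
# All paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "path/to/base-model"
adapter_path = "path/to/lora-adapter"
output_path = "path/to/merged-model"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(output_path)
```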
Hi, @John-Ge @darkpromise98, I have reviewed the request. I'm sorry that I may not implement this feature on my own, for the following reasons:

1. LLaVA does not provide an official `batch_inference` interface, so adding it for LLaVA may lead to some major changes in the inference pipeline of VLMEvalKit.
2. With `batch_size=1`, llava-v1.5-13b can run at 3~4 fps on a single A100, so I think `batch_inference` for LLaVA may not be a critical feature for VLMEvalKit.

BTW, I'm willing to review and merge it into the VLMEvalKit main branch if someone is willing to create a PR (might be relatively heavy) about it.
Does VLMEvalKit support multi-card inference and batch size > 1?