John-Ge opened this issue 10 months ago
Hi, @John-Ge,
Thanks for your reply! I would like to know what the usual setup is for inference with batch size > 1. Should we deploy the model through something like vLLM or TGI? Do we need to wait for them to support LLaVA?
The authors of LLaVA have tried to create a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754
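For reference, here is a minimal sketch of what batched generation can look like; it assumes the Hugging Face `transformers` port of LLaVA (`llava-hf/llava-1.5-7b-hf`) rather than the original LLaVA codebase from the linked issue, and the prompts and image paths are placeholders.

```python
# Minimal sketch: batched LLaVA generation via the HF `transformers` port.
# Assumes `llava-hf/llava-1.5-7b-hf`; not the official haotian-liu/LLaVA pipeline.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompts = [
    "USER: <image>\nWhat is shown in this image? ASSISTANT:",
    "USER: <image>\nDescribe the scene briefly. ASSISTANT:",
]
images = [Image.open("example_1.jpg"), Image.open("example_2.jpg")]  # placeholder paths

# Left-pad so generation starts at the same position for every sample in the batch.
processor.tokenizer.padding_side = "left"
inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt").to("cuda")

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(processor.batch_decode(output_ids, skip_special_tokens=True))
```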
Hi, @darkpromise98 , we will try to include this feature into VLMEvalKit recently.
That's great!
https://github.com/haotian-liu/LLaVA/issues/754#issuecomment-1907970439 — this issue comment builds a fast inference method for LLaVA. Would you add this function for every benchmark in this repo?
BTW, I find that SGLang may not support LoRA adapters on top of a base model, and I train LLaVA with LoRA. If possible, I hope you could support loading the base model, merging the LoRA weights, and deploying the merged model for evaluation.
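If it helps, a generic way to get a LoRA-trained model into engines that only accept merged weights is to fold the adapter into the base model with PEFT and save the result. The sketch below uses placeholder paths and plain `AutoModelForCausalLM`; for LLaVA specifically, note that its LoRA checkpoints also carry non-LoRA weights (e.g. the multimodal projector), so a merge script from the LLaVA repo itself, if available, is the safer route.

```python
# Minimal sketch: merge LoRA adapter weights into a base model with PEFT,
# then save a standalone checkpoint that LoRA-unaware serving engines can load.
# All paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "path/to/base-model"
adapter_path = "path/to/lora-adapter"
output_path = "path/to/merged-model"

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(output_path)
```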
Hi, @John-Ge @darkpromise98, I have reviewed the request. I'm sorry that I may not implement this feature on my own, for the following reasons:

1. LLaVA does not provide an official `batch_inference` interface, so adding it for LLaVA may lead to some major changes in the inference pipeline of VLMEvalKit.
2. With `batch_size=1`, llava-v1.5-13b can run at 3~4 fps on a single A100, so I think `batch_inference` for LLaVA may not be a critical feature for VLMEvalKit.

BTW, I'm willing to review and merge it into the VLMEvalKit main branch if someone is willing to create a PR (might be relatively heavy) about it.
Does VLMEvalKit support multi-card inference and batch size > 1?