open-compass / VLMEvalKit

An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

[Bug] RuntimeError: world_size (8) is not equal to tensor_model_parallel_size (1) x pipeline_model_parallel_size (1) #561

Open · Wiselnn570 opened this issue 1 week ago

Wiselnn570 commented 1 week ago

I used the interface from the vllm repository (https://github.com/vllm-project/vllm) to load the model and ran

torchrun --nproc-per-node=8 run.py --data Video-MME --model Qwen2_VL-M-RoPE-80k

for evaluation, but I got the error

RuntimeError: world_size (8) is not equal to tensor_model_parallel_size (1) x pipeline_model_parallel_size (1). 

Could you please advise on how to resolve this?

Here is the interface I used:

from vllm import LLM

# Load Qwen2-VL-7B-Instruct with a 100k-token context and up to 10 videos per prompt.
llm = LLM(
    "/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    max_model_len=100000,
    limit_mm_per_prompt={"video": 10},
)
Wiselnn570 commented 1 week ago

https://github.com/vllm-project/vllm/blob/3ea2dc2ec49d1ddd7875045e2397ae76a8f50b38/vllm/distributed/parallel_state.py#L1025 It seems the error occurs at this assertion. How can I modify my VLMEvalKit setup to satisfy it? Thanks.
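
For context, the linked assertion requires world_size == tensor_model_parallel_size * pipeline_model_parallel_size. Launching run.py with torchrun --nproc-per-node=8 creates a distributed group of size 8, while LLM() defaults to tensor_parallel_size=1, so the product is 1 and the check fails. Below is a minimal sketch of one way to satisfy it, assuming the goal is to shard the model across all 8 GPUs and let vLLM manage its own workers (run.py started as a single process, not via torchrun); the path and sizes are taken from the snippet above and may need adjusting.

from vllm import LLM

# Assumes run.py is launched as a single process (plain `python run.py ...`),
# so vLLM spawns its own workers; tensor_parallel_size then makes
# tensor_model_parallel_size * pipeline_model_parallel_size match the GPU count.
llm = LLM(
    "/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    max_model_len=100000,
    limit_mm_per_prompt={"video": 10},
    tensor_parallel_size=8,  # 8 (tensor) * 1 (pipeline) == 8 GPUs
)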

kennymckormick commented 4 days ago

This looks like a problem on the vLLM side. I suggest first launching the VLM as an API service and then performing the evaluation via API calls. That is better practice, since it keeps the two problems decoupled.
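
A minimal sketch of that workflow, assuming vLLM's OpenAI-compatible server and an OpenAI-style client (the port, model path, and prompt below are illustrative; check the vLLM and VLMEvalKit docs for the exact flags and environment variables your versions expect):

# In a shell, serve the model with vLLM's OpenAI-compatible API server:
#   vllm serve /mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct \
#       --tensor-parallel-size 8 --max-model-len 100000
# VLMEvalKit's API-based model wrappers can then be pointed at this endpoint
# (typically via the OpenAI base URL / API key environment variables).
# The snippet below only sanity-checks that the server responds.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    messages=[{"role": "user", "content": "Hello, are you up?"}],
)
print(response.choices[0].message.content)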