open-compass / VLMEvalKit

An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

[Bug] RuntimeError: world_size (8) is not equal to tensor_model_parallel_size (1) x pipeline_model_parallel_size (1) #561

Open · Wiselnn570 opened this issue 1 week ago

Wiselnn570 commented 1 week ago

I used the interface from the vllm repository (https://github.com/vllm-project/vllm) to load the model and ran

torchrun --nproc-per-node=8 run.py --data Video-MME --model Qwen2_VL-M-RoPE-80k

for evaluation, but I got the error

RuntimeError: world_size (8) is not equal to tensor_model_parallel_size (1) x pipeline_model_parallel_size (1). 

Could you please advise on how to resolve this?

Here is the interface I used:

from vllm import LLM

# Load Qwen2-VL-7B-Instruct with a 100k-token context and up to 10 videos per prompt.
llm = LLM(
    "/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    max_model_len=100000,
    limit_mm_per_prompt={"video": 10},
)
Wiselnn570 commented 1 week ago

https://github.com/vllm-project/vllm/blob/3ea2dc2ec49d1ddd7875045e2397ae76a8f50b38/vllm/distributed/parallel_state.py#L1025 It seems the error occurs at this assertion. How can I modify my VLMEvalKit setup to satisfy it? Thanks.
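
For context, the linked assertion requires world_size == tensor_model_parallel_size * pipeline_model_parallel_size. Launching run.py with torchrun --nproc-per-node=8 creates a distributed group of size 8, while LLM() defaults to tensor_parallel_size=1, so the product is 1 and the check fails. Below is a minimal sketch of one way to satisfy it, assuming the goal is to shard the model across all 8 GPUs and let vLLM manage its own workers (run.py started as a single process, not via torchrun); the path and sizes are taken from the snippet above and may need adjusting.

from vllm import LLM

# Assumes run.py is launched as a single process (plain `python run.py ...`),
# so vLLM spawns its own workers; tensor_parallel_size then makes
# tensor_model_parallel_size * pipeline_model_parallel_size match the GPU count.
llm = LLM(
    "/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    max_model_len=100000,
    limit_mm_per_prompt={"video": 10},
    tensor_parallel_size=8,  # 8 (tensor) * 1 (pipeline) == 8 GPUs
)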

kennymckormick commented 4 days ago

This looks like a problem on the vLLM side. I suggest first launching the VLM as an API service and then performing the evaluation via API calls. That is better practice, since it keeps the two problems decoupled.
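
A minimal sketch of that workflow, assuming vLLM's OpenAI-compatible server and an OpenAI-style client (the port, model path, and prompt below are illustrative; check the vLLM and VLMEvalKit docs for the exact flags and environment variables your versions expect):

# In a shell, serve the model with vLLM's OpenAI-compatible API server:
#   vllm serve /mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct \
#       --tensor-parallel-size 8 --max-model-len 100000
# VLMEvalKit's API-based model wrappers can then be pointed at this endpoint
# (typically via the OpenAI base URL / API key environment variables).
# The snippet below only sanity-checks that the server responds.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="/mnt/hwfile/mllm/weixilin/cache/Qwen2-VL-7B-Instruct",
    messages=[{"role": "user", "content": "Hello, are you up?"}],
)
print(response.choices[0].message.content)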