Your current environment
🐛 Describe the bug
I dug into the implementation of `ray_gpu_executor.py` and found the following: https://github.com/vllm-project/vllm/blob/ee3eea0a1b2c690557455d97074d8829d5a98320/vllm/executor/ray_gpu_executor.py#L112-123

It seems this code creates `parallel_config.world_size` workers for model parallelism, but if a worker's IP equals the driver's IP, that worker is assigned to `driver_dummy_worker` and is never appended to the normal worker list. Since `driver_dummy_worker` only acts as the driver, not as a worker, one worker is never invoked in model parallelism. I guess that's not expected.

Can anyone clarify? I suspect I'm missing something. Thanks!
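To make the concern concrete, here is a minimal sketch of the assignment logic as I understand it (hypothetical code, not the actual vLLM implementation; `assign_workers`, `worker_ips`, and `driver_ip` are names I made up for illustration):

```python
# Hypothetical sketch of the worker-assignment logic described above.
# The first worker whose IP matches the driver's IP becomes the
# driver_dummy_worker; every other worker goes into the normal list.
def assign_workers(worker_ips, driver_ip):
    driver_dummy_worker = None
    workers = []
    for ip in worker_ips:
        if driver_dummy_worker is None and ip == driver_ip:
            # Held only as a driver-side handle, never used as a worker.
            driver_dummy_worker = ip
        else:
            workers.append(ip)
    return driver_dummy_worker, workers

dummy, workers = assign_workers(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"], driver_ip="10.0.0.1"
)
# With world_size == 3, only 2 workers end up in the normal worker list;
# the one colocated with the driver is parked as driver_dummy_worker.
```

If this reading is right, the normal worker list always holds `world_size - 1` entries whenever one worker is colocated with the driver, which is what prompted my question.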