vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

When starting the second vllm.entrypoints.api_server using tensor parallel in a single node, the second vllm api_server Stuck in " Started a local Ray instance." OR "Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory" #3367

Open durant1999 opened 8 months ago

durant1999 commented 8 months ago

[Screenshots: the second api_server's startup log, showing the hang at "Started a local Ray instance" and the Raylet worker-registration IOError]

As mentioned above, when I try to start two vllm api_servers, each using tensor parallelism (size 2), the SECOND api_server fails to start, while the first one runs fine. I also notice that the top command shows many ray:IDLE processes. [Screenshot: top output with numerous ray:IDLE entries]
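One thing worth checking is whether the two servers' Ray instances are colliding on the same node. A possible workaround, sketched below under the assumption that the node has at least four GPUs (the model path, GPU indices, and port numbers are placeholders, not values from this issue), is to pin each server to its own pair of GPUs and give each its own HTTP port:

```shell
# Hypothetical launch sketch: isolate each tensor-parallel group on
# disjoint GPUs so the two servers do not contend for the same devices.
# First server: GPUs 0-1, serving on port 8000.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --port 8000 &

# Second server: GPUs 2-3, serving on port 8001.
CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2 \
    --port 8001 &
```

If stale ray:IDLE workers from a previous failed start are still around, running `ray stop` before relaunching clears the leftover local Ray processes. Whether this resolves the Raylet registration error in this exact setup is not confirmed in the thread.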

durant1999 commented 8 months ago

SOS....

qianchen94 commented 6 months ago

Same issue here. Have you solved it?

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!