vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Ray does not work when tp>=2 #5495

Open Jimmy-Lu opened 3 months ago

Jimmy-Lu commented 3 months ago

Your current environment

The Ray version is 2.10.0 and the vLLM version is 0.5.0+cu117.

🐛 Describe the bug

Using tp=2 with the code listed below:

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="/cephfs/shared/model/llama-2-7b-hf/", tensor_parallel_size=2)

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Starting Ray does not work:

2024-06-13 16:11:50,396 INFO worker.py:1752 -- Started a local Ray instance.
[2024-06-13 16:11:51,588 E 13261 13261] core_worker.cc:228: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
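The "No such file or directory" IOError above usually means the worker could not find the raylet Unix socket it registers against. A minimal stdlib sketch to check for it (the helper name is hypothetical, and the default session path is an assumption since `ray start --temp-dir` can relocate it):

```python
import os

def raylet_socket_present(session_dir="/tmp/ray/session_latest"):
    # Hypothetical helper, not part of Ray: check whether the raylet
    # socket that core workers register against exists on disk.
    sock = os.path.join(session_dir, "sockets", "raylet")
    return os.path.exists(sock)

if not raylet_socket_present():
    print("raylet socket not found; workers cannot register")
```

If the socket is missing while `ray status` claims the cluster is up, the worker and the raylet likely disagree on the temp directory.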
youkaichao commented 3 months ago

I forwarded the issue to the Anyscale folks (the company behind Ray). Meanwhile, you can try the multiprocessing backend: https://docs.vllm.ai/en/latest/serving/distributed_serving.html .
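Concretely, switching to the multiprocessing backend should be a one-argument change to the repro script above. This is a sketch, not a verified fix: the `distributed_executor_backend` parameter exists in recent vLLM releases but may not be available in every version, so check the docs for the version you built.

```python
from vllm import LLM

# Sketch: ask vLLM to use its multiprocessing executor instead of Ray
# for tensor parallelism. Model path taken from the report above;
# distributed_executor_backend="mp" is assumed to exist in this build.
llm = LLM(
    model="/cephfs/shared/model/llama-2-7b-hf/",
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)
```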

richardliaw commented 3 months ago

Can you share a bit about how to reproduce this?

youkaichao commented 3 months ago

@Jimmy-Lu you can follow the issue template to report detailed environment configuration, so that they can help more.

rkooo567 commented 3 months ago

the error itself doesn't seem to be related to vllm.

Jimmy-Lu commented 3 months ago

the error itself doesn't seem to be related to vllm.

  • how did you deploy ray?
  • is it consistent? Or one time?
  • Is just using ray.init() in that cluster working?

I ran the offline_inference script above and Ray was deployed automatically. I also tried ray start. The error is consistent. ray.init() works.

Jimmy-Lu commented 3 months ago

I built vLLM from source and then ran the script above. After the error, I tried different Ray versions, but none of them worked.

rkooo567 commented 3 months ago

Do you have some time next week? I'd love to pair program to troubleshoot the issue.

Jimmy-Lu commented 3 months ago

do you have some time next week? I'd love to pair program to troubleshoot the issue

Yes, thank you.