vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Ray does not work when tp>=2 #5495

Open Jimmy-Lu opened 3 months ago

Jimmy-Lu commented 3 months ago

Your current environment

The Ray version is 2.10.0 and the vLLM version is 0.5.0+cu117.

🐛 Describe the bug

Using tp=2 with the code listed below:

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="/cephfs/shared/model/llama-2-7b-hf/", tensor_parallel_size=2)

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Starting Ray does not work:

2024-06-13 16:11:50,396 INFO worker.py:1752 -- Started a local Ray instance.
[2024-06-13 16:11:51,588 E 13261 13261] core_worker.cc:228: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory
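The "No such file or directory" IOError above usually means the worker could not find the raylet Unix socket it registers against. A minimal stdlib sketch to check for it (the helper name is hypothetical, and the default session path is an assumption since `ray start --temp-dir` can relocate it):

```python
import os

def raylet_socket_present(session_dir="/tmp/ray/session_latest"):
    # Hypothetical helper, not part of Ray: check whether the raylet
    # socket that core workers register against exists on disk.
    sock = os.path.join(session_dir, "sockets", "raylet")
    return os.path.exists(sock)

if not raylet_socket_present():
    print("raylet socket not found; workers cannot register")
```

If the socket is missing while `ray status` claims the cluster is up, the worker and the raylet likely disagree on the temp directory.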
youkaichao commented 3 months ago

I forwarded the issue to the Anyscale folks (the company behind Ray). Meanwhile, you can try the multiprocessing backend: https://docs.vllm.ai/en/latest/serving/distributed_serving.html .
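Concretely, switching to the multiprocessing backend should be a one-argument change to the repro script above. This is a sketch, not a verified fix: the `distributed_executor_backend` parameter exists in recent vLLM releases but may not be available in every version, so check the docs for the version you built.

```python
from vllm import LLM

# Sketch: ask vLLM to use its multiprocessing executor instead of Ray
# for tensor parallelism. Model path taken from the report above;
# distributed_executor_backend="mp" is assumed to exist in this build.
llm = LLM(
    model="/cephfs/shared/model/llama-2-7b-hf/",
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)
```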

richardliaw commented 3 months ago

Can you share a bit about how to reproduce this?

youkaichao commented 3 months ago

@Jimmy-Lu you can follow the issue template to report detailed environment configuration, so that they can help more.

rkooo567 commented 3 months ago

the error itself doesn't seem to be related to vllm.

Jimmy-Lu commented 3 months ago

the error itself doesn't seem to be related to vllm.

  • how did you deploy ray?
  • is it consistent? Or one time?
  • Is just using ray.init() in that cluster working?

I ran the offline_inference script above and Ray was deployed automatically. I also tried ray start. The error is consistent. ray.init() works.

Jimmy-Lu commented 3 months ago

I built vLLM from source and then ran the script above. After the error, I tried different Ray versions, but none of them worked.

rkooo567 commented 3 months ago

Do you have some time next week? I'd love to pair program to troubleshoot the issue.

Jimmy-Lu commented 3 months ago

do you have some time next week? I'd love to pair program to troubleshoot the issue

Yes, thank you.