vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: dag teardown error AttributeError: 'Worker' object has no attribute 'core_worker' #6887

Open youkaichao opened 3 months ago

youkaichao commented 3 months ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

command:

python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100 --num-prompts 100 --model facebook/opt-125m -tp 2 --distributed-executor-backend ray

error:

2024-07-28 22:30:36,078 INFO compiled_dag_node.py:1202 -- Tearing down compiled DAG
Exception ignored in: <function RayGPUExecutor.__del__ at 0x7ff2ee7048b0>
Traceback (most recent call last):
  File "/data/youkaichao/vllm/vllm/executor/ray_gpu_executor.py", line 396, in __del__
    self.forward_dag.teardown()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1402, in teardown
    monitor.teardown(wait=True)
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/dag/compiled_dag_node.py", line 1204, in teardown
    outer._dag_submitter.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/common.py", line 383, in close
    self._output_channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 629, in close
    channel.close()
  File "/data/youkaichao/miniconda/envs/vllm/lib/python3.9/site-packages/ray/experimental/channel/shared_memory_channel.py", line 512, in close
    self._worker.core_worker.experimental_channel_set_error(self._writer_ref)
AttributeError: 'Worker' object has no attribute 'core_worker'
[1]    3100846 segmentation fault (core dumped)  python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100  

cc @ruisearch42 @rkooo567 @stephanie-wang
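For readers following along: the "Exception ignored in" line in the log means the `AttributeError` was raised while `RayGPUExecutor.__del__` was running, with the call chain going through `forward_dag.teardown()`. Below is a minimal sketch of one general defensive pattern for this kind of finalizer: an idempotent `shutdown()` that tolerates a teardown failure. `FakeDag` and `ExecutorSketch` are hypothetical stand-ins written for illustration. They are not vLLM or Ray classes, and this is not the fix adopted by either project.

```python
class FakeDag:
    """Hypothetical stand-in for a compiled DAG (not a Ray class).

    Its teardown() fails with the same message as in the traceback above.
    """

    def __init__(self):
        self.teardown_calls = 0

    def teardown(self):
        self.teardown_calls += 1
        raise AttributeError("'Worker' object has no attribute 'core_worker'")


class ExecutorSketch:
    """Hypothetical executor (not vLLM's RayGPUExecutor)."""

    def __init__(self, dag):
        self.forward_dag = dag

    def shutdown(self):
        # Detach the DAG first so that a second call becomes a no-op.
        dag, self.forward_dag = self.forward_dag, None
        if dag is None:
            return
        try:
            dag.teardown()
        except AttributeError as exc:
            # Assumption of this sketch: a failed teardown at this point
            # is reported and skipped instead of propagating.
            print(f"ignoring teardown error: {exc}")

    def __del__(self):
        # The finalizer only delegates to the explicit shutdown path.
        self.shutdown()


executor = ExecutorSketch(FakeDag())
executor.shutdown()  # explicit teardown, error is reported and skipped
executor.shutdown()  # no-op, the DAG is already detached
```

Calling `shutdown()` explicitly before the process exits keeps the teardown out of the finalizer. The sketch only covers the Python-level exception; it says nothing about the segmentation fault that follows in the log.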

ruisearch42 commented 3 months ago

Hmm, these should only be called when using ADAG. Were these environment variables set? `DISTRIBUTED_EXECUTOR_BACKEND=ray VLLM_USE_RAY_SPMD_WORKER=1 VLLM_USE_RAY_COMPILED_DAG=1` @youkaichao
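A quick way to answer that question from inside the Python process is to inspect the environment directly. The sketch below uses the two `VLLM_*` variable names from the comment above. The helper name `adag_env_status` is hypothetical, and treating the value `"1"` as "enabled" is an assumption based on how the variables are written in that comment.

```python
import os

# Names taken from the comment above; "1" meaning "enabled" is an assumption.
ADAG_ENV_VARS = ("VLLM_USE_RAY_SPMD_WORKER", "VLLM_USE_RAY_COMPILED_DAG")


def adag_env_status(environ=None):
    """Return {name: bool}, True when the variable is set to "1"."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) == "1" for name in ADAG_ENV_VARS}


if __name__ == "__main__":
    for name, enabled in adag_env_status().items():
        print(f"{name}: {'set' if enabled else 'not set'}")
```

Running this in the same shell as the benchmark command shows whether the ADAG code path was opted into for that session.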

rkooo567 commented 3 months ago

Also, to be clear, we are planning to fix this soon (and it is supposed to happen only with the env vars that @ruisearch42 mentioned above). If it happens without the env vars, we will try to fix it immediately; otherwise it will take a few more days until we tackle it.

youkaichao commented 3 months ago

It happens when I use these env vars, so it is not user-facing for now.

I just came across it while using Ray DAG for testing. Fixing it later should be fine.

github-actions[bot] commented 1 day ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!