[Open] wooyeonlee0 opened this issue 1 month ago
cc @njhill
Thanks for reporting this, @wooyeonlee0; I'll look into it.
https://github.com/vllm-project/vllm/pull/5987 fixes part of this (the worker process remained stuck in its broadcast loop), but we still need to get to the bottom of the resource leak messages.
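The "worker proc remained in broadcast loop" failure mode can be sketched with a minimal `multiprocessing` example (hypothetical names and sentinel protocol; this is a simplified illustration, not vLLM's actual worker code): a child that blocks forever on its receive loop hangs the parent's `join()` at shutdown unless it is explicitly told to exit.

```python
import multiprocessing as mp

SHUTDOWN = None  # hypothetical sentinel message; not vLLM's real protocol


def worker(q: "mp.Queue") -> None:
    # The worker sits in its receive loop (analogous to the broadcast
    # loop) until it sees the shutdown sentinel. If the parent never
    # sends one, p.join() in the parent blocks indefinitely -- the
    # hang described in this issue.
    while True:
        msg = q.get()
        if msg is SHUTDOWN:
            break
        # ... process msg ...


if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    q.put("work")
    q.put(SHUTDOWN)  # without this line, the join below would hang
    p.join(timeout=5)
    print("worker exited:", not p.is_alive())
```

The linked PR addresses the analogous problem in vLLM: making sure the worker actually leaves its loop so shutdown does not stall until a timeout or kill.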
Your current environment
🐛 Describe the bug
Shutdown gets stuck while the multiproc workers are being torn down; after a while, the process does exit on its own. Reproduced with:
```shell
python3 benchmark_latency.py --max-model-len 2048 --use-v2-block-manager --model facebook/opt-30b --batch-size 8 -tp 2
python3 benchmark_latency.py --max-model-len 2048 --use-v2-block-manager --model facebook/opt-30b --batch-size 8 -tp 4
```