Open oandreeva-nv opened 4 months ago
I also observed a similar thing... My current workaround is to run pkill -f pt_main_thread
after terminating the vLLM server.
pkill -f pt_main_thread after terminating vLLM server.
Unfortunately, this is not a viable solution for me
Same issue here. pkill -f does not work in my case either.
pkill -f pt_main_thread after terminating vLLM server.
This did not help in my case. I had to do:
top -b -n 1 | grep pt_main_thread | awk '{print $1}' | xargs kill -9
Your current environment
🐛 Describe the bug
I am trying to understand vLLM's workflow for distributed serving via multiprocessing. The original setup deploys a model with tensor parallel size = 2 through Triton Inference Server with distributed_executor_backend: mp. Inference works fine, but when the server shuts down, 2 pt_main_thread processes are not killed, and their status is State: S (sleeping). The closest reproducer outside of Triton is this:
And the workflow is the following:
As before, the 2 processes above are in the sleeping state according to
cat /proc/_PID_/status
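To make that check easier to repeat, here is a small Python sketch that reads the same file and pulls out a few fields. It assumes Linux and the usual "Key:<tab>value" layout of /proc/<pid>/status. The function names parse_status and read_status are hypothetical helpers, not vLLM APIs. State: S means interruptible sleep, so the process is waiting on something rather than running. The PPid field may also be worth a look: a value of 1 would suggest the original parent has exited and the worker was reparented, although that is only a guess about what is happening here.

```python
def parse_status(text):
    """Parse the contents of /proc/<pid>/status into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line that has no "key: value" separator
            fields[key.strip()] = value.strip()
    return fields


def read_status(pid, proc_root="/proc"):
    """Read and parse the status file of a single PID (Linux only)."""
    with open(f"{proc_root}/{pid}/status") as f:
        return parse_status(f.read())
```

For example, read_status(pid)["State"] and read_status(pid)["PPid"] return the state string and the parent PID for one of the leftover processes.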
Any insights into vLLM's distributed serving with multiprocessing are greatly appreciated.