tonyaw opened 2 months ago
Also ulimit and lsof info:
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace# lsof | grep pt_main_t | wc -l
26295
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace# ulimit -n
1048576
root@8x7b-open-deployment-9fb777c9d-mwq8b:/vllm-workspace#
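For context, here is a minimal sketch (not from the thread) for watching how a process's open-fd count grows toward that ulimit; it assumes psutil is installed, and the PID and polling interval are placeholders:

```python
# Hedged sketch: poll a vLLM process's open file descriptor count
# against its soft NOFILE limit (the value `ulimit -n` reports).
# Assumes psutil is installed (pip install psutil); PID is a placeholder.
import time

import psutil

VLLM_PID = 12345   # placeholder: PID of the pt_main_thread / vLLM process
POLL_SECONDS = 5   # placeholder polling interval

proc = psutil.Process(VLLM_PID)
soft_limit, _hard = proc.rlimit(psutil.RLIMIT_NOFILE)

while True:
    num_fds = proc.num_fds()  # open file descriptors (Unix only)
    print(f"open fds: {num_fds} / soft limit: {soft_limit}")
    time.sleep(POLL_SECONDS)
```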
cc @robertgshaw2-neuralmagic
@tonyaw if you want a quick solution, you can try adding --disable-frontend-multiprocessing
What's the side effect of adding the parameter --disable-frontend-multiprocessing? It isn't caused by OMP_NUM_THREADS=2, right? I have two A100s, so OMP_NUM_THREADS should be 2, right?
Thanks in advance!
--disable-frontend-multiprocessing will be slower. Usually people don't need to set OMP_NUM_THREADS for vLLM.
Thanks, I will analyze how many Unix sockets are being opened and see whether there is anything we can do to reduce the count, since we currently open a new socket for each generate request.
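One possible way to do that count from outside the server, sketched here under the assumption that psutil is available (the PID is a placeholder):

```python
# Hedged sketch: count Unix-domain sockets held by a process, to check
# whether per-request sockets are accumulating or being released.
# Assumes psutil is installed; the PID is a placeholder.
import psutil

VLLM_PID = 12345  # placeholder: vLLM server process PID

proc = psutil.Process(VLLM_PID)
# connections(kind="unix") is Linux-only; newer psutil also
# offers net_connections() with the same signature.
unix_socks = proc.connections(kind="unix")
print(f"unix sockets held: {len(unix_socks)}")
```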
@youkaichao @robertgshaw2-neuralmagic I have set the param --disable-frontend-multiprocessing, but I still get the error as follows:
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/data/tangjiakai/anaconda3/envs/agentscope/lib/python3.11/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'detail': ''}
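The repeated _retry_request/_request frames are the openai client's built-in retries recursing before the 500 is finally raised. A minimal sketch (not from the thread) to disable those retries so the first failure surfaces directly; the base_url, api_key, and model path are placeholders:

```python
# Hedged sketch: disable the openai client's automatic retries so the
# first 500 is reported immediately instead of after several retries.
# base_url, api_key, and model path are placeholders.
import openai
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder vLLM endpoint
    api_key="EMPTY",  # placeholder; only checked if --api-key is set
    max_retries=0,    # surface the first failure, no retry loop
)

try:
    resp = client.chat.completions.create(
        model="/data/pretrain_dir/Meta-Llama-3-8B-Instruct",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)
except openai.InternalServerError as err:
    print("server returned 500:", err)
```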
My vLLM version is the latest, 0.5.5, and the command is:
python -m vllm.entrypoints.openai.api_server \
--model /data/pretrain_dir/Meta-Llama-3-8B-Instruct \
--trust-remote-code \
--port $port \
--dtype auto \
--pipeline-parallel-size 1 \
--enforce-eager \
--enable-prefix-caching \
--enable-lora \
--disable-frontend-multiprocessing
The interesting thing is that even when I send only one prompt at a time (to ensure the LLM isn't overloaded), generation sometimes succeeds and sometimes fails within the same test period. When it fails, the error is still "Error code: 500 - {'detail': ''}".
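To quantify that intermittency, a short sketch that sends single prompts sequentially and tallies the 500s, reusing the same placeholder client settings as above (trial count is also a placeholder):

```python
# Hedged sketch: send one prompt at a time and count intermittent 500s.
import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1",  # placeholders
                api_key="EMPTY", max_retries=0)

TRIALS = 50  # placeholder number of sequential requests
failures = 0
for i in range(TRIALS):
    try:
        client.chat.completions.create(
            model="/data/pretrain_dir/Meta-Llama-3-8B-Instruct",
            messages=[{"role": "user", "content": f"test {i}"}],
        )
    except openai.InternalServerError:
        failures += 1

print(f"{failures}/{TRIALS} requests failed with 500")
```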
@TangJiakai this looks like a client side error. do you have the server side error trace?
Yes, you are right! It happened on the client side.
Your current environment
🐛 Describe the bug
After I upgraded to v0.5.4, I got "500 Internal Server Error". My manifest snippet to start vLLM:
Backtrace log: