WangxuP closed this issue 3 months ago.
Why do you use VLLM_TRACE_FUNCTION=1 in a production environment? It is only useful for debugging.
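For context, my understanding is that this flag enables Python-level function tracing (a sys.settrace-style hook) so every call can be logged, which is why it slows startup so much. Below is a simplified, hypothetical sketch of that kind of mechanism, not vLLM's actual code, just to show the cost of tracing every call:

```python
# Toy demonstration of the overhead of a sys.settrace-style call tracer.
# This is NOT vLLM's implementation; it only illustrates why tracing
# every function call is far too slow for anything but debugging.
import sys
import time


def trace_calls(frame, event, arg):
    # The interpreter invokes this hook for call/line/return events in
    # every Python frame; a real tracer would write this to a log file.
    if event == "call":
        code = frame.f_code
        _ = f"{code.co_filename}:{code.co_firstlineno} {code.co_name}"
    return trace_calls


def helper(i):
    return i * 2


def workload():
    total = 0
    for i in range(200_000):
        total += helper(i)
    return total


start = time.perf_counter()
workload()
print(f"without tracing: {time.perf_counter() - start:.3f}s")

sys.settrace(trace_calls)
start = time.perf_counter()
workload()
sys.settrace(None)
print(f"with tracing:    {time.perf_counter() - start:.3f}s")
```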
I am not using the VLLM_TRACE_FUNCTION=1 environment variable in production; this is in the development environment. I added it to make it easier to locate the problem.
Is the 1 hour before or after setting VLLM_TRACE_FUNCTION=1?
No. Even without this environment variable it already took about an hour to start; I only added it to get easy access to the trace log.
From what I can see:
- You are on vllm 0.4.3; you can try the latest vllm version.
- Your container requests are cpu: 16, memory: 40Gi; you can try to increase the CPU memory for the container (see the quick check below).
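As a quick sanity check (a sketch only, assuming the usual cgroup v1/v2 file locations), you can confirm from inside the container which vllm version is installed and what memory limit the container actually has:

```python
# Check the installed vllm version and the container's memory limit.
# Sketch only: cgroup file locations differ between cgroup v1 and v2.
from importlib.metadata import version
from pathlib import Path

print("vllm version:", version("vllm"))

# cgroup v2 exposes the limit in memory.max; cgroup v1 uses
# memory/memory.limit_in_bytes instead.
for limit_file in ("/sys/fs/cgroup/memory.max",
                   "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
    path = Path(limit_file)
    if path.exists():
        print(f"{limit_file}: {path.read_text().strip()}")
        break
```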
That's great! I have updated to vllm==0.5.1 without increasing the CPU memory for the container, and it now starts in about 10 minutes. Waiting 10 minutes is within an acceptable range.
But I am still not clear about the underlying reason for this problem. What optimizations have you made? Where can I find the relevant source code or PR? Thanks!
It's difficult to tell; we are always optimizing everything we can. If the newest version works, then you should just use the newest one.
If you want to dive deeper, you can profile the startup time and find the root cause; that contribution is welcome if you'd like to do it.
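For example, a minimal profiling sketch (assuming the offline `LLM` entry point; the model path and engine arguments below are placeholders, and cProfile only captures the driver process, not tensor-parallel workers):

```python
# Profile where vLLM spends its time during engine construction.
# Sketch only: replace the model path and engine arguments with your own.
import cProfile
import pstats

from vllm import LLM

profiler = cProfile.Profile()
profiler.enable()
llm = LLM(model="/path/to/your/model", tensor_parallel_size=4)
profiler.disable()

# Print the 30 call chains with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
```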
Okay! I have been researching this code recently, and if I find any issues I will raise a PR.
Ohhh! In k8s, pay special attention to the volumes path. If you are mounting an NFS service, the network communication between the NFS server and the current machine will also affect the startup speed of the model, since it involves transferring large model files.
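If it helps, here is a rough way to check read throughput from the mounted model directory (a sketch; MODEL_DIR is a placeholder and the glob assumes safetensors weights). Very low numbers here would point at the NFS link rather than vLLM itself:

```python
# Measure how fast model weight files can be read from the mounted volume.
# Note: a second run may hit the OS page cache and look unrealistically fast.
import time
from pathlib import Path

MODEL_DIR = Path("/path/to/mounted/model")  # the NFS-backed volume mount
CHUNK = 64 * 1024 * 1024  # read in 64 MiB chunks

total_bytes = 0
start = time.perf_counter()
for weight_file in sorted(MODEL_DIR.glob("*.safetensors")):
    with weight_file.open("rb") as fh:
        while chunk := fh.read(CHUNK):
            total_bytes += len(chunk)
elapsed = time.perf_counter() - start

if total_bytes:
    print(f"read {total_bytes / 1e9:.1f} GB in {elapsed:.1f} s "
          f"({total_bytes / 1e9 / elapsed:.2f} GB/s)")
else:
    print("no *.safetensors files found under", MODEL_DIR)
```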
Your current environment
- Python: 3.10.14
- GPU: V100 * 4
- nvidia-smi
- pkgs
- k8s
- docker
🐛 Describe the bug
I start vLLM in a pod. When I checked the log, I found that it kept looping around the following positions, and it took about an hour to start successfully. The log:
The vLLM main log is: