thundergolfer closed this 4 months ago.
Oh, I think I got lucky with knob twiddling. Switching to `VLLM_WORKER_MULTIPROC_METHOD=fork` seems to have fixed it?
```dockerfile
FROM python:3.11-slim-bookworm
RUN apt-get update && apt-get install --yes python3 python3-distutils clang wget vim
RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip install clang~=10.0.1 # must match version of `clang` installed above.
RUN python3 -m pip install --ignore-installed "vllm==0.4.1" \
    "hf-transfer==0.1.6" \
    "huggingface_hub==0.22.2" \
    "fastapi" \
    "httpx"
COPY <<EOF repro.py
import os
EOF
ENV HF_HUB_ENABLE_HF_TRANSFER=1
ENV VLLM_TRACE_FUNCTION=0
ENV VLLM_WORKER_MULTIPROC_METHOD=fork
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", "--model", "meta-llama/Meta-Llama-3-8B-Instruct", "--tensor-parallel-size", "2"]
```
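In vLLM versions that read it, `VLLM_WORKER_MULTIPROC_METHOD` selects the start method for worker processes. As a rough illustration of what the two values mean, here is a minimal sketch using the standard `multiprocessing` module. The helper `worker_context` and its `"spawn"` default are hypothetical and are not vLLM's own code. With `fork`, the child is a copy of the parent and starts almost instantly. With `spawn`, the child is a fresh interpreter that re-imports its modules, which is slower but avoids inheriting state such as an already-initialized CUDA context, held locks, or threads.

```python
import multiprocessing as mp
import os


def worker_context(default: str = "spawn") -> mp.context.BaseContext:
    """Return a multiprocessing context chosen by VLLM_WORKER_MULTIPROC_METHOD.

    Hypothetical helper for illustration only; vLLM reads this variable itself.
    The "spawn" default here is an arbitrary choice, not vLLM's default.
    """
    method = os.environ.get("VLLM_WORKER_MULTIPROC_METHOD", default)
    # "fork" is POSIX-only; "spawn" is available on every platform.
    if method not in mp.get_all_start_methods():
        raise ValueError(f"unsupported start method: {method!r}")
    return mp.get_context(method)


if __name__ == "__main__":
    ctx = worker_context()
    # fork:  child is a copy-on-write clone of this process (fast, inherits state).
    # spawn: child is a fresh interpreter that re-imports modules (slower, clean).
    print("workers would start via:", ctx.get_start_method())
```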
It may also be that disabling `VLLM_TRACE_FUNCTION` sped up startup 100x and thus "unstuck" things.
`VLLM_TRACE_FUNCTION` is only meant for debugging stuck issues. Why did you turn it on?
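For context on why function tracing can slow startup so badly: a per-call tracer runs a Python callback on every Python-level function call, and vLLM's tracer, as far as we understand it, also writes a log line for each call. The sketch below is an illustration, not vLLM's actual tracer. It installs such a hook with `sys.settrace` and times a recursive function with and without it. `make_call_logger`, `fib`, and `timed` are hypothetical names. A startup path that imports `torch`, `sympy`, and their dependencies makes a very large number of calls, so even a small per-call cost adds up.

```python
import sys
import time


def make_call_logger(log: list):
    """Return a sys.settrace hook that records every Python function call."""
    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            log.append(f"{code.co_filename}:{frame.f_lineno} {code.co_name}")
        return None  # no per-line tracing inside the called frame
    return tracer


def fib(n: int) -> int:
    # Deliberately call-heavy recursion to make tracing overhead visible.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def timed(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    baseline = timed(fib, 22)
    calls: list = []
    sys.settrace(make_call_logger(calls))  # affects the current thread only
    try:
        traced = timed(fib, 22)
    finally:
        sys.settrace(None)
    print(f"untraced {baseline:.3f}s, traced {traced:.3f}s, {len(calls)} calls logged")
```

A real tracer that writes to a file on every call pays for I/O as well, so its slowdown is larger than what this in-memory version shows.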
I turned it on only to debug the stuck issue. But it then became a confounder: it slowed startup down so much that it was hard to tell whether vLLM was stuck or merely slow.
We have added documentation for this situation in #5430. Please take a look.
Your current environment
🐛 Describe the bug
vLLM is getting stuck on startup, and according to `nvidia-smi` it happens before anything is written to the GPU. I have uploaded the trace file, which records up to around 2024-05-22 09:11:22. At that point the trace shows it apparently stuck in `sympy` code. I tailed the file 10 minutes later and it appeared to be stuck in `torch/_dynamo/allowed_functions.py:322`.
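One lower-overhead way to tell "stuck" from "slow" is to take periodic stack snapshots instead of logging every call. If consecutive snapshots show the same frames, the process is probably stuck there; if the frames move, it is just slow. The sketch below uses the standard-library `faulthandler.dump_traceback_later`, which is implemented with a watchdog thread and writes every thread's stack to a file each interval. The helper names and the file path are hypothetical. It only covers the process that calls it, so separate worker processes would each need their own call.

```python
import faulthandler


def start_stack_dumps(interval_s: float, path: str):
    """Write every thread's stack to `path` every `interval_s` seconds.

    Hypothetical helper. faulthandler writes through the file descriptor from a
    watchdog thread, so dumps keep arriving even if the main thread never returns.
    """
    out = open(path, "w")
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=out)
    return out  # the file must stay open while dumps are armed


def stop_stack_dumps(out) -> None:
    """Cancel the periodic dumps and close the file."""
    faulthandler.cancel_dump_traceback_later()
    out.close()


if __name__ == "__main__":
    dumps = start_stack_dumps(60.0, "startup_stacks.txt")  # hypothetical path
    try:
        pass  # run the slow startup step here
    finally:
        stop_stack_dumps(dumps)
```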
Logs from `docker`
Reproduction:
I'm just hoping for guidance on what could be going wrong here. I'm not familiar with the code and have no idea what could cause startup to get stuck.
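When diagnosing a startup like this, it can help to put an explicit deadline on "the server is up" rather than eyeballing logs. Below is a minimal sketch that assumes the OpenAI-compatible server answers HTTP 200 on a `/health` route at `localhost:8000`; verify the route and port for your vLLM version and flags. It polls with the standard library's `urllib` and raises `TimeoutError` once the deadline passes. `wait_for_server` is a hypothetical helper name.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/health",
                    timeout_s: float = 600.0, poll_s: float = 5.0) -> float:
    """Poll `url` until it answers HTTP 200 and return the seconds waited.

    Raises TimeoutError if the server is not ready within `timeout_s`.
    The default URL is an assumption about the server's health route.
    """
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except OSError:
            # URLError/HTTPError, refused connections and socket timeouts are
            # all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() - start > timeout_s:
            raise TimeoutError(f"{url} not ready after {timeout_s:.0f}s")
        time.sleep(poll_s)
```

Pairing a generous deadline with periodic stack dumps separates the two cases: a run that times out while its stacks stop changing is stuck, not just slow.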