Closed: sfc-gh-zhwang closed this issue 6 months ago
How did you install vLLM? What command did you run, and as which user?
The message you posted in the Slack channel says:
vLLM is using nccl==2.20.5
However, this should normally be 2.18.1, since:
[pip3] vllm-nccl-cu12==2.18.1.0.4.0
Also, please post the full log output to help identify the problem.
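For reference, a quick way to check which NCCL builds are present in an environment (the commands below are illustrative; the version vLLM actually loads is what it prints in the startup log quoted above):
python -c "import torch; print(torch.cuda.nccl.version())"   # NCCL version that PyTorch reports
pip list 2>/dev/null | grep -i nccl                          # vllm-nccl-cu12 / nvidia-nccl-cu12 packages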
Sorry, I have two environments and the log got a bit mixed up. I have now updated it with the correct version; the full log and error message are uploaded as a gist.
We installed vLLM through Docker, but before that we installed AWS's own NCCL plugin (aws-ofi-nccl); I guess this might be causing the issue? Our Dockerfile:
FROM nvcr.io/nvidia/pytorch:24.03-py3
USER root
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends \
git \
git-lfs \
htop \
libaio-dev \
libhwloc-dev
RUN git lfs install
# Install EFA
RUN curl -O https://efa-installer.amazonaws.com/aws-efa-installer-1.29.0.tar.gz
RUN tar -xf aws-efa-installer-1.29.0.tar.gz && \
cd aws-efa-installer && \
./efa_installer.sh -y -g -d --skip-kmod --skip-limit-conf --no-verify
# Install custom aws-ofi-nccl plugin in order to avoid GPU memory fragmentation
RUN wget https://github.com/aws/aws-ofi-nccl/releases/download/v1.7.4-aws/aws-ofi-nccl-1.7.4-aws.tar.gz
RUN tar -xf aws-ofi-nccl-1.7.4-aws.tar.gz && \
cd aws-ofi-nccl-1.7.4-aws && \
./configure --prefix=/opt/aws-ofi-nccl \
--with-mpi=/opt/amazon/openmpi \
--with-libfabric=/opt/amazon/efa \
--with-cuda=/usr/local/cuda \
--enable-platform-aws && \
make && make install
ENV LD_LIBRARY_PATH="/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH"
ENV PATH="/opt/aws-ofi-nccl/bin:/opt/amazon/efa:/opt/amazon/openmpi/bin/:$PATH"
# Create a non-root user (uid/gid 1000) to run vLLM
RUN addgroup --gid 1000 corvo
RUN adduser --disabled-password --gecos "" --uid 1000 --gid 1000 corvo
USER corvo
WORKDIR /home/corvo
ENV VLLM_INSTALL_PUNICA_KERNELS=1
RUN pip install -e vllm
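For completeness, a rough sketch of how an image like this gets built and started (the tag, shared-memory size, and shell entry are placeholders, not our exact commands):
docker build -t vllm-efa .
docker run --rm -it --gpus all --shm-size=16g vllm-efa bash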
It is a known issue that NCCL >= 2.19 has memory issues when used with CUDA graphs: https://github.com/NVIDIA/nccl/issues/1234
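The NGC PyTorch base image ships its own, newer NCCL (likely where the 2.20.5 above comes from), so it is worth checking which libnccl copies are visible inside the container; a quick sketch:
ldconfig -p | grep libnccl                     # NCCL libraries known to the dynamic linker
find / -name "libnccl.so*" 2>/dev/null         # every libnccl copy in the filesystem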
If you are using a different (non-root) user inside Docker, be sure to check out the docs https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html :
vLLM docker image is currently designed to be run under the root user (contribution welcomed for changing this!). It will try to load library at runtime under the root user’s home directory, e.g. /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1. If you are running the container under a different user, you may need to change the permissions of the library (and all the parent directories) to allow the user to access it. Then run vLLM with environment variable VLLM_NCCL_SO_PATH=/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1.
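A minimal sketch of those two steps, assuming the default download path from the quote above (run as root inside the container, or in the Dockerfile, after the library has been downloaded):
# Relax permissions so a non-root user can reach the pre-downloaded NCCL
chmod -R a+rX /root/.config/vllm
chmod a+x /root /root/.config
# Then point vLLM at the library explicitly when running as that user
export VLLM_NCCL_SO_PATH=/root/.config/vllm/nccl/cu12/libnccl.so.2.18.1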
@youkaichao Thanks a lot for the information. I took a look at our config and, combining it with your info, I think I know what happened: we somehow mount a directory at /root/.config/vllm, so the original NCCL .so was removed (shadowed by the mount).
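A quick way to confirm that the mount is what hides the library, with a placeholder host path and image tag:
# Check whether the pre-downloaded library is still visible once the volume is mounted
docker run --rm -v /path/on/host:/root/.config/vllm vllm-efa ls -l /root/.config/vllm/nccl/cu12/
# If it is gone, either mount the host directory somewhere other than /root/.config/vllm,
# or keep a copy of libnccl.so.2.18.1 outside the mount and point VLLM_NCCL_SO_PATH at it.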
Your current environment
🐛 Describe the bug
I am running the llama2-70b-chat model on a node with 8x A100 40GB GPUs.
With disable_custom_all_reduce=False, enforce_eager=False, it works fine.
With disable_custom_all_reduce=True, enforce_eager=False, it fails with CUDA OOM.
With disable_custom_all_reduce=True, enforce_eager=True, it works fine.
The error is linked here: https://gist.github.com/sfc-gh-zhwang/5e4cd04d87a1823a316d983289dfbd21
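For reference, roughly how the three configurations map onto vLLM's CLI flags (the entrypoint and model name are illustrative; the actual runs may use the Python API instead):
# 1) works fine (defaults: custom all-reduce on, CUDA graphs on)
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-70b-chat-hf --tensor-parallel-size 8
# 2) fails with CUDA OOM (custom all-reduce off, CUDA graphs still on)
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-70b-chat-hf --tensor-parallel-size 8 --disable-custom-all-reduce
# 3) works fine (custom all-reduce off, CUDA graphs off)
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-70b-chat-hf --tensor-parallel-size 8 --disable-custom-all-reduce --enforce-eager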