[Installation]: Reduce Image size when installing wheel with cuda 11.8

Your current environment

Hello, I installed the Python wheel following your documentation: https://docs.vllm.ai/en/latest/getting_started/installation.html#install-with-pip

The resulting Docker image adds up to 10 GB, which is too large for some container registries. Is there any way to reduce the image size to less than 5 GB?

I ask because the image you provide on the Docker registry is much smaller.

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
vllmtest5    latest    d86273b9420d   2 minutes ago    7.9GB
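
To narrow down where the size comes from, `docker history vllmtest5:latest` lists the size of each layer. For the pip layer itself, a small helper like the sketch below can rank the installed packages by disk usage. The function name `largest_entries` is hypothetical, and the site-packages path in the comment is an assumption about where pip installs on this base image; adjust it if your layout differs.

```shell
#!/bin/sh
# Print the N largest entries directly under a directory, biggest first.
# Sizes are in KiB, as reported by `du -sk`.
# Example (path is an assumption for Ubuntu 22.04 with system pip):
#   largest_entries /usr/local/lib/python3.10/dist-packages 15
largest_entries() {
    dir="$1"
    count="${2:-10}"   # default to the top 10 entries
    # du -sk prints "<size-in-KiB><TAB><path>" for each argument;
    # sort -rn orders numerically by size, largest first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Running this inside a container started from the image should show whether the bulk of the space is taken by a handful of large dependencies or spread across many smaller packages.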

How you are installing vllm

Dockerfile

FROM nvidia/cuda:11.8.0-base-ubuntu22.04 AS vllm-base

ARG VLLM_VERSION=0.4.2
ARG VLLM_PYTHON_VERSION=310

WORKDIR /vllm-workspace

# /bin/sh (the default RUN shell) does not expand {a,b} braces,
# so remove the apt package lists explicitly.
RUN apt-get update -y \
    && apt-get install -y python3-pip \
    && apt-get clean && apt-get autoremove --yes \
    && rm -rf /tmp/* /var/lib/apt/lists/*

# Workaround for https://github.com/openai/triton/issues/2507 and
# https://github.com/pytorch/pytorch/issues/107960 -- hopefully
# this won't be needed for future versions of this docker image
# or future versions of triton.
RUN ldconfig /usr/local/cuda-11.8/compat/

# install vllm wheel first, so that torch etc will be installed
RUN python3 -m pip install --upgrade pip

RUN pip install -vvv https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${VLLM_PYTHON_VERSION}-cp${VLLM_PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118 \
 && rm -Rf /root/.cache/pip \
 && python3 -m pip cache purge  \
 && rm -rf /tmp/*

#################### OPENAI API SERVER ####################
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

vllm-project / vllm

[Installation]: Reduce Image size when installing wheel with cuda 11.8 #4950
