sasha0552 / vllm-ci

CI scripts designed to build a Pascal-compatible version of vLLM.
MIT License

vllm-ci

CI scripts designed to build a Pascal-compatible version of vLLM and Triton.

Installation

vllm

Note: this repository holds "nightly" builds of vLLM. Two releases in this repository may carry the same vLLM version number while being built from different source commits. Despite being "nightly" builds, they are generally stable.

Note: kernels for all GPU architectures except Pascal have been excluded to reduce build time and wheel size. You can still use newer GPUs alongside Pascal via tensor parallelism with Ray, by running two vLLM instances, one of which uses upstream vLLM. Open an issue if this disrupts your workflow.

To install the patched vLLM (the patched triton will be installed automatically):

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install vLLM
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ vllm

# Launch vLLM
vllm serve --help

To update the patched vLLM between builds of the same vLLM release version (e.g. 0.5.0 (commit 000000) -> 0.5.0 (commit ffffff)):

# Activate virtual environment
source venv/bin/activate

# Update vLLM
pip3 install --force-reinstall --extra-index-url https://sasha0552.github.io/vllm-ci/ --no-cache-dir --no-deps --upgrade vllm
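The `--force-reinstall` flag is needed because pip's resolver compares version strings only: two nightly wheels built from different commits but sharing the same version look identical to a plain `pip install --upgrade`. A minimal illustration of that comparison (the commit hashes are the hypothetical ones from the example above):

```python
# Two hypothetical nightly builds of vLLM 0.5.0, built from different commits.
builds = {
    "000000": "0.5.0",  # older nightly
    "ffffff": "0.5.0",  # newer nightly
}

# pip keys on the version string alone, so these builds are
# indistinguishable to a plain `pip install --upgrade vllm` ...
assert builds["000000"] == builds["ffffff"]
# ... which is why --force-reinstall (and --no-cache-dir, to avoid reusing
# the previously downloaded wheel) is required to pick up the newer build.
print("same version string -> pip sees no upgrade available")
```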

aphrodite-engine

To install aphrodite-engine with the patched triton:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install aphrodite-engine
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --extra-index-url https://downloads.pygmalion.chat/whl aphrodite-engine

# Launch aphrodite-engine
aphrodite --help

In other words, add --extra-index-url https://sasha0552.github.io/vllm-ci/ to the original installation command.

triton

To install the patched triton separately, for use in other applications (for example, Stable Diffusion WebUIs):

Note that this will install triton==2.3.0 (matching torch==2.3.0)! If you need other versions of triton, check out my other repository, triton-ci. I plan to publish it on PyPI once the file size limit increase request is approved.
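If you manage dependencies with a requirements file, the same pinning can be expressed there. A sketch of such a fragment (the explicit torch pin is an assumption, based on the triton==2.3.0 / torch==2.3.0 pairing noted above):

```
# requirements.txt (illustrative fragment)
--extra-index-url https://sasha0552.github.io/vllm-ci/
torch==2.3.0
triton==2.3.0
```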

To install an application that is published on PyPI and depends on triton:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ <PACKAGE NAME>

To install triton before installing an application:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ triton

If the application is already installed:

# Install triton
pip3 install --index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall --no-deps triton

Don't forget to activate the virtual environment (if you are using one) before running these commands!