valer1435 opened this issue 1 week ago
To install vLLM with CPU support, you can follow these steps:
Quick Start Using Dockerfile:
$ docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .
$ docker run -it \
--rm \
--network=host \
--cpuset-cpus=<cpu-id-list, optional> \
--cpuset-mems=<memory-node, optional> \
vllm-cpu-env
Build from Source:
First, install a recent compiler; gcc/g++ >= 12.3.0 is recommended to avoid potential problems. For example, on Ubuntu 22.04:
$ sudo apt-get update -y
$ sudo apt-get install -y gcc-12 g++-12 libnuma-dev
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
Second, install the Python packages needed to build the vLLM CPU backend:
$ pip install --upgrade pip
$ pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy
$ pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
Third, build and install oneDNN from source:
$ git clone -b rls-v3.5 https://github.com/oneapi-src/oneDNN.git
$ cmake -B ./oneDNN/build -S ./oneDNN -G Ninja -DONEDNN_LIBRARY_TYPE=STATIC \
-DONEDNN_BUILD_DOC=OFF \
-DONEDNN_BUILD_EXAMPLES=OFF \
-DONEDNN_BUILD_TESTS=OFF \
-DONEDNN_BUILD_GRAPH=OFF \
-DONEDNN_ENABLE_WORKLOAD=INFERENCE \
-DONEDNN_ENABLE_PRIMITIVE=MATMUL
$ cmake --build ./oneDNN/build --target install --config Release
Finally, build and install vLLM with the CPU target:
$ VLLM_TARGET_DEVICE=cpu python setup.py install
These steps will help you install and run vLLM on an x86 CPU platform with support for data types FP32 and BF16.
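To verify the CPU build, you can try a minimal offline generation, for example (a sketch; facebook/opt-125m is only an illustrative small model, any small Hugging Face model will do):

from vllm import LLM, SamplingParams

# Smoke test for a CPU-only build: load a small model and generate a few tokens
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=16)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)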
How is tensor parallelization implemented in the code? @RepoPilotAssistant
Tensor parallelization in the code involves sending and receiving tensor dictionaries across different ranks in a distributed environment. It handles both CPU and GPU tensors, using different groups for communication: for CPU tensors it uses the metadata_group, while for GPU tensors it uses the group. The implementation also supports a send-allgather mechanism where only a slice of the tensor is sent initially, followed by an allgather operation. Additionally, it includes a barrier synchronization method that uses the CPU group to avoid issues with GPU tensors.
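As a rough sketch of that routing logic (simplified and illustrative, not vLLM's actual implementation; the group/metadata_group names mirror the ones mentioned above, and the metadata exchange and send-allgather path are omitted):

import torch.distributed as dist

def send_tensor_dict(tensor_dict, dst, group, metadata_group):
    # group: device process group (e.g. NCCL) used for GPU tensors
    # metadata_group: CPU process group (e.g. Gloo) used for CPU tensors
    for tensor in tensor_dict.values():
        if tensor.is_cuda:
            dist.send(tensor, dst=dst, group=group)
        else:
            dist.send(tensor, dst=dst, group=metadata_group)

def barrier(metadata_group):
    # Synchronizing on the CPU group avoids involving GPU tensors at all
    dist.barrier(group=metadata_group)

In the real code, shape and dtype metadata would also be exchanged (over the CPU group) before the tensor payloads, so the receiver can allocate buffers first.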
What will happen if I set the tensor parallel size to 3? @RepoPilotAssistant
How to install vLLM with CPU? @RepoPilotAssistant