openvinotoolkit / openvino_contrib

Repository for OpenVINO's extra modules
Apache License 2.0

PagedAttention operation #867

Closed: ilya-lavrenov closed this 5 months ago

ilya-lavrenov commented 9 months ago

Introduces an initial version of the PagedAttention custom operation.
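For readers new to PagedAttention, the idea (as introduced in vLLM) is to store the KV cache in fixed-size physical blocks that need not be contiguous, with a per-sequence block table mapping logical token positions to those blocks. The pure-Python sketch below illustrates only that idea for a single attention head with no batching. It is not the interface of the OpenVINO operation added in this PR, and the names `PagedKVCache`, `append` and `attend` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


class PagedKVCache:
    """Toy paged KV cache: fixed-size physical blocks plus a block table per sequence."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        # Physical storage: block id -> keys/values stored in that block.
        self.key_blocks: List[List[Vector]] = [[] for _ in range(num_blocks)]
        self.value_blocks: List[List[Vector]] = [[] for _ in range(num_blocks)]
        # Logical view: sequence id -> ordered list of physical block ids.
        self.block_tables: Dict[int, List[int]] = {}
        self.lengths: Dict[int, int] = {}

    def append(self, seq_id: int, key: Vector, value: Vector) -> None:
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            # Last block is full (or the sequence has none yet): grab any free block.
            if not self.free_blocks:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free_blocks.pop(0))
        block = table[-1]
        self.key_blocks[block].append(key)
        self.value_blocks[block].append(value)
        self.lengths[seq_id] = length + 1

    def attend(self, seq_id: int, query: Vector) -> Vector:
        # Gather keys/values by walking the block table in logical order.
        keys: List[Vector] = []
        values: List[Vector] = []
        for block in self.block_tables[seq_id]:
            keys.extend(self.key_blocks[block])
            values.extend(self.value_blocks[block])
        # Scaled dot-product attention with a numerically stable softmax.
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(values[0])
        return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
```

Because blocks are allocated on demand from a shared free list, sequences of different lengths waste at most one partially filled block each, which is the memory-efficiency argument behind the paged layout.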

How to build:

cmake -DCUSTOM_OPERATIONS="paged_attention" <modules/custom_operations>
cmake --build . --parallel

How to install vLLM:

cd vllm
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
export VLLM_BUILD_CPU_ONLY="1"
export VLLM_BUILD_CPU_OPS="1"
pip install -e .
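If you prefer to script the installation, the Python sketch below reproduces the three `export` lines and the `pip install -e .` call. The helper names are hypothetical; the sketch assumes the `vllm` checkout from the `cd vllm` step and runs pip through `sys.executable`, so the package is installed into the interpreter running the script (which may differ from whatever `pip` resolves to in your shell).

```python
import os
import subprocess
import sys
from typing import Dict, List


def vllm_cpu_install_env(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with the CPU-only build variables from the steps above."""
    env = dict(base)
    env["PIP_EXTRA_INDEX_URL"] = "https://download.pytorch.org/whl/cpu"
    env["VLLM_BUILD_CPU_ONLY"] = "1"
    env["VLLM_BUILD_CPU_OPS"] = "1"
    return env


def vllm_install_command() -> List[str]:
    """Editable install of the current directory, like `pip install -e .`."""
    return [sys.executable, "-m", "pip", "install", "-e", "."]


def install_vllm(repo_dir: str = "vllm") -> None:
    """Run the editable install inside `repo_dir` (hypothetical default path)."""
    subprocess.run(
        vllm_install_command(),
        cwd=repo_dir,
        env=vllm_cpu_install_env(dict(os.environ)),
        check=True,  # raise CalledProcessError if pip fails
    )
```

Calling `install_vllm()` from the directory that contains the `vllm` checkout is then equivalent to the shell steps.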

Then run the sample:

VLLM_OPENVINO=1 python3 examples/offline_inference.py

To use models with PagedAttention that were converted by optimum-intel, run:

VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py
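The two run commands differ only in which environment variable is set. A small Python launcher can choose between them; this is a sketch with hypothetical helper names, it assumes it is run from the root of the `vllm` checkout by default, and it sets exactly one flag, as the commands above do.

```python
import os
import subprocess
import sys
from typing import Dict, List


def sample_env(base: Dict[str, str], use_optimum: bool = False) -> Dict[str, str]:
    """Copy `base` and set VLLM_OPENVINO=1, or VLLM_OPENVINO_OPTIMUM=1 for optimum-intel models."""
    env = dict(base)
    flag = "VLLM_OPENVINO_OPTIMUM" if use_optimum else "VLLM_OPENVINO"
    env[flag] = "1"
    return env


def sample_command() -> List[str]:
    """Equivalent of `python3 examples/offline_inference.py`, using the current interpreter."""
    return [sys.executable, "examples/offline_inference.py"]


def run_sample(repo_dir: str = ".", use_optimum: bool = False) -> None:
    """Launch the offline inference sample from `repo_dir` (hypothetical default)."""
    subprocess.run(
        sample_command(),
        cwd=repo_dir,
        env=sample_env(dict(os.environ), use_optimum),
        check=True,
    )
```

For example, `run_sample(use_optimum=True)` mirrors the second command.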