PagedAttention operation

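Introduces an initial version of the PagedAttention operation. To build it, configure the custom operations module with only this operation enabled (replace <modules/custom_operations> with the path to that directory):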

cmake -DCUSTOM_OPERATIONS="paged_attention" <modules/custom_operations>
cmake --build . --parallel

vLLM fork to use: https://github.com/slyalin/vllm/tree/openvino
OpenVINO branch to use: https://github.com/slyalin/openvino/tree/pytorch_module_extension
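
As a minimal sketch of fetching both branches, the snippet below assumes that the two URLs above refer to branches named `openvino` and `pytorch_module_extension` in the `slyalin/vllm` and `slyalin/openvino` repositories. The `clone_branch` helper and the `RUN` variable are hypothetical names introduced for this illustration; by default the helper only prints the `git clone` command so it can be reviewed first.

```shell
# Hypothetical helper: clone a single branch of a repository.
# $1 = repository URL, $2 = branch name.
# Prints the command by default; set RUN=1 to actually clone.
clone_branch() {
    if [ "${RUN:-0}" = "1" ]; then
        git clone --branch "$2" "$1"
    else
        echo "git clone --branch $2 $1"
    fi
}

# Assumption: branch names are taken from the /tree/<branch> part of the URLs.
clone_branch https://github.com/slyalin/vllm.git openvino
clone_branch https://github.com/slyalin/openvino.git pytorch_module_extension
```

Running the snippet with `RUN=1` set in the environment performs the clones instead of printing them.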

How to install vLLM:

cd vllm
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
export VLLM_BUILD_CPU_ONLY="1"
export VLLM_BUILD_CPU_OPS="1"
pip install -e .

Then run the sample:

VLLM_OPENVINO=1 python3 examples/offline_inference.py

To use models with PagedAttention that were converted by optimum-intel, run:

VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py
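
The two ways of running the sample differ only in the environment variable that is set. A small sketch of a wrapper that makes the choice explicit is shown below; the `sample_cmd` function and the mode labels `native` and `optimum` are hypothetical names invented for this example, and only the two command lines themselves come from this description.

```shell
# Hypothetical helper: print the sample command for a given mode.
# "native"  -> PagedAttention path enabled with VLLM_OPENVINO=1
# "optimum" -> models converted by optimum-intel, VLLM_OPENVINO_OPTIMUM=1
sample_cmd() {
    case "$1" in
        native)  echo 'VLLM_OPENVINO=1 python3 examples/offline_inference.py' ;;
        optimum) echo 'VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py' ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

sample_cmd native
sample_cmd optimum
```

From inside the vllm directory, `eval "$(sample_cmd optimum)"` would run the printed command.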

Pull request #867 in openvinotoolkit/openvino_contrib.