Introduced initial version of PagedAttention.

How to build the PagedAttention custom operation:
```
cmake -DCUSTOM_OPERATIONS="paged_attention" <modules/custom_operations>
cmake --build . --parallel
```
How to install vLLM:
```
cd vllm
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
export VLLM_BUILD_CPU_ONLY="1"
export VLLM_BUILD_CPU_OPS="1"
pip install -e .
```
And run the sample:
```
VLLM_OPENVINO=1 python3 examples/offline_inference.py
```
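For reference, the sample roughly boils down to the standard vLLM offline-inference flow sketched below; the model id and prompts are illustrative placeholders, not values taken from this PR.

```python
# Condensed sketch of vLLM's examples/offline_inference.py.
# Model id and prompts are illustrative placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# With VLLM_OPENVINO=1 exported (see the command above), this build is
# intended to run through the OpenVINO-backed path added in this PR.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```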
If you want to use models with PagedAttention converted from optimum-intel, use:
```
VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py
```
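The conversion step itself is not spelled out here; as a rough illustration, a generic optimum-intel export to OpenVINO IR looks like the sketch below. The model id and output directory are placeholders, and any PagedAttention-specific conversion options are not covered by this description.

```python
# Hedged sketch of a generic optimum-intel export to OpenVINO IR.
# Placeholders: model_id, save_dir. PagedAttention-specific options
# (if any) are not shown here.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "facebook/opt-125m"   # placeholder model
save_dir = "opt-125m-openvino"   # placeholder output directory

# export=True converts the original checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```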