intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

vLLM can't use oneCCL on host #11743

Open · biyuehuang opened this issue 1 month ago

biyuehuang commented 1 month ago

Ubuntu 22.04, kernel 5.15.0, with 4 × Arc A770 on a Xeon(R) w9-3495X. `clinfo` reports Driver Version 24.22.29735.27.

Script:

source /opt/intel/oneapi/2024.0/oneapi-vars.sh --force
source /opt/intel/1ccl-wks/setvars.sh --force  # use oneCCL

export MODEL="/opt/Meta-Llama-3-8B-Instruct"

export CCL_WORKER_COUNT=2 # 2 workers, presumably one per A770
export FI_PROVIDER=shm
export CCL_ATL_TRANSPORT=ofi
export CCL_ZE_IPC_EXCHANGE=sockets
export CCL_ATL_SHM=1
export SYCL_CACHE_PERSISTENT=1
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
#export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so:${LD_PRELOAD}
export ZE_AFFINITY_MASK=0,1

for n in $(seq 8 2 20); do
    echo "Model= $MODEL RATE= 0.7 N= $n..."
    python3 ./benchmark_throughput.py \
        --backend vllm \
        --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json \
        --model $MODEL \
        --num-prompts 100 \
        --seed 42 \
        --trust-remote-code \
        --enforce-eager \
        --dtype float16 \
        --device xpu \
        --load-in-low-bit sym_int4 \
        --gpu-memory-utilization 0.7 \
        --max-num-seqs $n \
        --tensor-parallel-size 2  # tensor parallelism across the 2 A770s
done
sleep 10
exit 0
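For reference, the oneCCL-related exports in the script above can be expressed programmatically. A minimal sketch (the helper name and structure are my own, not part of ipex-llm or vLLM):

```python
import os

# Hypothetical helper (not part of ipex-llm): build the oneCCL/SYCL
# environment used in the script above for an N-GPU tensor-parallel run.
def ccl_env(num_gpus: int) -> dict[str, str]:
    return {
        "CCL_WORKER_COUNT": str(num_gpus),      # one CCL worker per GPU
        "FI_PROVIDER": "shm",                   # shared-memory libfabric provider
        "CCL_ATL_TRANSPORT": "ofi",
        "CCL_ZE_IPC_EXCHANGE": "sockets",
        "CCL_ATL_SHM": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        # Expose GPUs 0..N-1 to Level Zero, matching --tensor-parallel-size.
        "ZE_AFFINITY_MASK": ",".join(str(i) for i in range(num_gpus)),
    }

if __name__ == "__main__":
    os.environ.update(ccl_env(2))
```

Keeping `ZE_AFFINITY_MASK` and `--tensor-parallel-size` derived from the same GPU count avoids the two drifting apart when changing the number of cards.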
(ipex-vllm) test@adc-a770:~$ pip list
Package                       Version               Editable project location
----------------------------- --------------------- -------------------------
accelerate                    0.23.0
aiohttp                       3.9.5
aiosignal                     1.3.1
annotated-types               0.7.0
antlr4-python3-runtime        4.9.3
anyio                         4.4.0
attrs                         23.2.0
bigdl-core-xe-21              2.5.0b20240805
bigdl-core-xe-addons-21       2.5.0b20240805
bigdl-core-xe-batch-21        2.5.0b20240805
certifi                       2024.7.4
charset-normalizer            3.3.2
click                         8.1.7
cloudpickle                   3.0.0
cmake                         3.30.0
deepspeed                     0.14.1+ed8aed57
diskcache                     5.6.3
dnspython                     2.6.1
einops                        0.8.0
email_validator               2.2.0
fastapi                       0.111.1
fastapi-cli                   0.0.4
filelock                      3.15.4
frozenlist                    1.4.1
fsspec                        2024.6.1
h11                           0.14.0
hjson                         3.1.0
httpcore                      1.0.5
httptools                     0.6.1
httpx                         0.27.0
huggingface-hub               0.24.0
idna                          3.7
intel-cmplr-lib-ur            2024.2.0
intel_extension_for_deepspeed 0.9.4+0eb734b
intel-extension-for-pytorch   2.1.10+xpu
intel-openmp                  2024.2.0
interegular                   0.3.3
ipex-llm                      2.1.0b20240805
Jinja2                        3.1.4
joblib                        1.4.2
jsonschema                    4.23.0
jsonschema-specifications     2023.12.1
lark                          1.1.9
llvmlite                      0.43.0
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
mdurl                         0.1.2
mkl                           2024.0.0
mpi4py                        3.1.6
mpmath                        1.3.0
msgpack                       1.0.8
multidict                     6.0.5
nest-asyncio                  1.6.0
networkx                      3.3
ninja                         1.11.1.1
Nuitka                        2.4.4
numba                         0.60.0
numpy                         1.26.4
omegaconf                     2.3.0
oneccl-bind-pt                2.1.300+xpu
ordered-set                   4.1.0
outlines                      0.0.34
packaging                     24.1
pandas                        2.2.2
pillow                        10.4.0
pip                           24.0
prometheus_client             0.20.0
protobuf                      5.27.2
psutil                        6.0.0
py-cpuinfo                    9.0.0
pyarrow                       17.0.0
pydantic                      2.8.2
pydantic_core                 2.20.1
Pygments                      2.18.0
pynvml                        11.5.0
python-dateutil               2.9.0.post0
python-dotenv                 1.0.1
python-multipart              0.0.9
pytz                          2024.1
PyYAML                        6.0.1
ray                           2.32.0
referencing                   0.35.1
regex                         2024.5.15
requests                      2.32.3
rich                          13.7.1
rpds-py                       0.19.0
safetensors                   0.4.3
scipy                         1.14.0
sentencepiece                 0.2.0
setuptools                    69.5.1
shellingham                   1.5.4
six                           1.16.0
sniffio                       1.3.1
starlette                     0.37.2
sympy                         1.13.1
tabulate                      0.9.0
tbb                           2021.13.0
tiktoken                      0.7.0
tokenizers                    0.15.2
torch                         2.1.0a0+cxx11.abi
torchaudio                    2.1.0.post2+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
tqdm                          4.66.4
transformers                  4.38.2
transformers-stream-generator 0.0.5
triton                        2.1.0
typer                         0.12.3
typing_extensions             4.12.2
tzdata                        2024.1
urllib3                       2.2.2
uvicorn                       0.30.3
uvloop                        0.19.0
vllm                          0.3.3+xpu0.0.1        /opt/WD/Code/vllm-zoo
watchfiles                    0.22.0
websockets                    12.0
wheel                         0.43.0
xformers                      0.0.27
yarl                          1.9.4
zstandard                     0.23.0

[screenshot attached]

sudo xpu-smi dump -m 1,2,18,22,26,31,34

[screenshot attached]
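`xpu-smi dump` emits one CSV row per sample. A small parser sketch for pulling per-device peak GPU utilization out of that stream; the sample rows and column names below are illustrative assumptions, not output captured from this machine:

```python
import csv
import io

def max_utilization_per_device(dump_csv: str) -> dict[str, float]:
    """Return the peak 'GPU Utilization (%)' seen for each DeviceId.

    Assumes a CSV layout with a header row, as xpu-smi dump produces;
    the exact column names here are illustrative.
    """
    peaks: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(dump_csv), skipinitialspace=True):
        dev = row["DeviceId"]
        util = float(row["GPU Utilization (%)"])
        peaks[dev] = max(peaks.get(dev, 0.0), util)
    return peaks

# Illustrative sample, not real output from this system.
sample = """Timestamp, DeviceId, GPU Utilization (%)
06:14:46.000, 0, 87.5
06:14:46.000, 1, 90.1
06:14:47.000, 0, 12.0
06:14:47.000, 1, 95.4
"""
```

A per-device peak near zero on one of the two cards selected by `ZE_AFFINITY_MASK` would be consistent with the collective backend failing to span both GPUs.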

kevin-t-tang commented 1 month ago

oneAPI: l_BaseKit_p_2024.0.1.46_offline.sh

conda env: ipex-vllm https://github.com/intel-analytics/ipex-llm/blob/66fe2ee46465306e241296b2d3440f6ba31b7305/docs/mddocs/Quickstart/vLLM_quickstart.md

glorysdj commented 1 month ago

This is a known issue. Users have successfully run IPEX-LLM vLLM in Docker.

moutainriver commented 1 month ago

I'd like to dig into this issue a bit deeper from the CCG side. I can take this JIRA offline.