intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

vLLM can't use oneCCL on host #11743

Open · biyuehuang opened this issue 1 month ago

biyuehuang commented 1 month ago

Ubuntu 22.04, kernel 5.15.0, with 4 × Arc A770 on a Xeon(R) w9-3495X. `clinfo` reports Driver Version 24.22.29735.27.

Script:

source /opt/intel/oneapi/2024.0/oneapi-vars.sh --force
source /opt/intel/1ccl-wks/setvars.sh --force  # use oneCCL

export MODEL="/opt/Meta-Llama-3-8B-Instruct"

export CCL_WORKER_COUNT=2 # 2 workers, presumably one per A770
export FI_PROVIDER=shm
export CCL_ATL_TRANSPORT=ofi
export CCL_ZE_IPC_EXCHANGE=sockets
export CCL_ATL_SHM=1
export SYCL_CACHE_PERSISTENT=1
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
#export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so:${LD_PRELOAD}
export ZE_AFFINITY_MASK=0,1

for n in $(seq 8 2 20); do
    echo "Model= $MODEL RATE= 0.7 N= $n..."
    python3 ./benchmark_throughput.py \
        --backend vllm \
        --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json \
        --model $MODEL \
        --num-prompts 100 \
        --seed 42 \
        --trust-remote-code \
        --enforce-eager \
        --dtype float16 \
        --device xpu \
        --load-in-low-bit sym_int4 \
        --gpu-memory-utilization 0.7 \
        --max-num-seqs $n \
        --tensor-parallel-size 2  # tensor parallelism across the 2 A770s
done
sleep 10
exit 0
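For reference, the oneCCL-related exports in the script above can be expressed programmatically. A minimal sketch (the helper name and structure are my own, not part of ipex-llm or vLLM):

```python
import os

# Hypothetical helper (not part of ipex-llm): build the oneCCL/SYCL
# environment used in the script above for an N-GPU tensor-parallel run.
def ccl_env(num_gpus: int) -> dict[str, str]:
    return {
        "CCL_WORKER_COUNT": str(num_gpus),      # one CCL worker per GPU
        "FI_PROVIDER": "shm",                   # shared-memory libfabric provider
        "CCL_ATL_TRANSPORT": "ofi",
        "CCL_ZE_IPC_EXCHANGE": "sockets",
        "CCL_ATL_SHM": "1",
        "SYCL_CACHE_PERSISTENT": "1",
        "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
        # Expose GPUs 0..N-1 to Level Zero, matching --tensor-parallel-size.
        "ZE_AFFINITY_MASK": ",".join(str(i) for i in range(num_gpus)),
    }

if __name__ == "__main__":
    os.environ.update(ccl_env(2))
```

Keeping `ZE_AFFINITY_MASK` and `--tensor-parallel-size` derived from the same GPU count avoids the two drifting apart when changing the number of cards.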
(ipex-vllm) test@adc-a770:~$ pip list
Package                       Version               Editable project location
----------------------------- --------------------- -------------------------
accelerate                    0.23.0
aiohttp                       3.9.5
aiosignal                     1.3.1
annotated-types               0.7.0
antlr4-python3-runtime        4.9.3
anyio                         4.4.0
attrs                         23.2.0
bigdl-core-xe-21              2.5.0b20240805
bigdl-core-xe-addons-21       2.5.0b20240805
bigdl-core-xe-batch-21        2.5.0b20240805
certifi                       2024.7.4
charset-normalizer            3.3.2
click                         8.1.7
cloudpickle                   3.0.0
cmake                         3.30.0
deepspeed                     0.14.1+ed8aed57
diskcache                     5.6.3
dnspython                     2.6.1
einops                        0.8.0
email_validator               2.2.0
fastapi                       0.111.1
fastapi-cli                   0.0.4
filelock                      3.15.4
frozenlist                    1.4.1
fsspec                        2024.6.1
h11                           0.14.0
hjson                         3.1.0
httpcore                      1.0.5
httptools                     0.6.1
httpx                         0.27.0
huggingface-hub               0.24.0
idna                          3.7
intel-cmplr-lib-ur            2024.2.0
intel_extension_for_deepspeed 0.9.4+0eb734b
intel-extension-for-pytorch   2.1.10+xpu
intel-openmp                  2024.2.0
interegular                   0.3.3
ipex-llm                      2.1.0b20240805
Jinja2                        3.1.4
joblib                        1.4.2
jsonschema                    4.23.0
jsonschema-specifications     2023.12.1
lark                          1.1.9
llvmlite                      0.43.0
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
mdurl                         0.1.2
mkl                           2024.0.0
mpi4py                        3.1.6
mpmath                        1.3.0
msgpack                       1.0.8
multidict                     6.0.5
nest-asyncio                  1.6.0
networkx                      3.3
ninja                         1.11.1.1
Nuitka                        2.4.4
numba                         0.60.0
numpy                         1.26.4
omegaconf                     2.3.0
oneccl-bind-pt                2.1.300+xpu
ordered-set                   4.1.0
outlines                      0.0.34
packaging                     24.1
pandas                        2.2.2
pillow                        10.4.0
pip                           24.0
prometheus_client             0.20.0
protobuf                      5.27.2
psutil                        6.0.0
py-cpuinfo                    9.0.0
pyarrow                       17.0.0
pydantic                      2.8.2
pydantic_core                 2.20.1
Pygments                      2.18.0
pynvml                        11.5.0
python-dateutil               2.9.0.post0
python-dotenv                 1.0.1
python-multipart              0.0.9
pytz                          2024.1
PyYAML                        6.0.1
ray                           2.32.0
referencing                   0.35.1
regex                         2024.5.15
requests                      2.32.3
rich                          13.7.1
rpds-py                       0.19.0
safetensors                   0.4.3
scipy                         1.14.0
sentencepiece                 0.2.0
setuptools                    69.5.1
shellingham                   1.5.4
six                           1.16.0
sniffio                       1.3.1
starlette                     0.37.2
sympy                         1.13.1
tabulate                      0.9.0
tbb                           2021.13.0
tiktoken                      0.7.0
tokenizers                    0.15.2
torch                         2.1.0a0+cxx11.abi
torchaudio                    2.1.0.post2+cxx11.abi
torchvision                   0.16.0a0+cxx11.abi
tqdm                          4.66.4
transformers                  4.38.2
transformers-stream-generator 0.0.5
triton                        2.1.0
typer                         0.12.3
typing_extensions             4.12.2
tzdata                        2024.1
urllib3                       2.2.2
uvicorn                       0.30.3
uvloop                        0.19.0
vllm                          0.3.3+xpu0.0.1        /opt/WD/Code/vllm-zoo
watchfiles                    0.22.0
websockets                    12.0
wheel                         0.43.0
xformers                      0.0.27
yarl                          1.9.4
zstandard                     0.23.0

[screenshot attached]

sudo xpu-smi dump -m 1,2,18,22,26,31,34

[screenshot attached]
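`xpu-smi dump` emits one CSV row per sample. A small parser sketch for pulling per-device peak GPU utilization out of that stream; the sample rows and column names below are illustrative assumptions, not output captured from this machine:

```python
import csv
import io

def max_utilization_per_device(dump_csv: str) -> dict[str, float]:
    """Return the peak 'GPU Utilization (%)' seen for each DeviceId.

    Assumes a CSV layout with a header row, as xpu-smi dump produces;
    the exact column names here are illustrative.
    """
    peaks: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(dump_csv), skipinitialspace=True):
        dev = row["DeviceId"]
        util = float(row["GPU Utilization (%)"])
        peaks[dev] = max(peaks.get(dev, 0.0), util)
    return peaks

# Illustrative sample, not real output from this system.
sample = """Timestamp, DeviceId, GPU Utilization (%)
06:14:46.000, 0, 87.5
06:14:46.000, 1, 90.1
06:14:47.000, 0, 12.0
06:14:47.000, 1, 95.4
"""
```

A per-device peak near zero on one of the two cards selected by `ZE_AFFINITY_MASK` would be consistent with the collective backend failing to span both GPUs.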

kevin-t-tang commented 1 month ago

oneAPI: l_BaseKit_p_2024.0.1.46_offline.sh

conda env: ipex-vllm https://github.com/intel-analytics/ipex-llm/blob/66fe2ee46465306e241296b2d3440f6ba31b7305/docs/mddocs/Quickstart/vLLM_quickstart.md

glorysdj commented 1 month ago

This is a known issue. Users have successfully run IPEX-LLM vLLM in Docker.

moutainriver commented 1 month ago

I'd like to dig into this issue a bit deeper from the CCG side. I can take this JIRA offline.