intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

Need to modify the DeepSpeed test scripts to support Xeon CPU #11455

Open oldmikeyang opened 4 months ago

oldmikeyang commented 4 months ago

python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh
python/llm/example/GPU/Deepspeed-AutoTP/run_vicuna_33b_arc_2_card.sh
python/llm/dev/benchmark/all-in-one/run-deepspeed-arc.sh

Currently, the following code enables SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS only on Intel Core CPUs. On Intel Xeon CPUs, SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS should also be enabled to improve performance.

# Enable Level Zero immediate command lists only when a Core (client) CPU is detected
if grep -q "Core" /proc/cpuinfo; then
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
fi
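
A minimal sketch of the change this issue asks for, assuming Xeon CPUs should use the same setting as Core CPUs (the reply below reports otherwise); the grep pattern is illustrative, not taken from the project's scripts:

# Hypothetical: treat Xeon the same as Core when setting immediate command lists
if grep -q -E "Core|Xeon" /proc/cpuinfo; then
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
fi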
plusbang commented 4 months ago

According to our previous benchmark experiments, export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 performs better on Xeon CPU + multi-GPU setups. Since the default value of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS is already 0, I think there is no need to modify the script.
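
For reference, a hypothetical variant that would make the per-platform values explicit, assuming the benchmark result above (2 on Core, 0 on Xeon); the Xeon branch is redundant since 0 is the default, which is why the existing script needs no change:

# Hypothetical sketch, not the project's script: set the variable per platform
if grep -q "Core" /proc/cpuinfo; then
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2   # immediate command lists help on Core
elif grep -q "Xeon" /proc/cpuinfo; then
    export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0   # already the default; explicit only for clarity
fi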