intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

2-GPU setup for Llama2-7b not working: XPU-SMI shows device 0 @ 99% and device 1 @ 0% during execution -- resolved #10538

Closed · gbertulf closed this issue 4 months ago

gbertulf commented 5 months ago

I followed the steps in this GitHub guide -- https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/README.md -- and attempted to verify 2-GPU inference runs with the following token combinations:

1) Initial run using the default script with sym_int4 and 32 tokens

[screenshot: xpu-smi output for the initial run]

Note: This run used only one GPU, as world_size is 1 here.

2) Also tried sym_int4, sym_int8, fp8, and fp16 at token sizes 2048x128 and 2048x256. The general observation using xpu-smi is shown below:

[screenshot: xpu-smi dumps for device 0 and device 1]

Notes: 1) The top portion is the xpu-smi dump for device 0 and the bottom portion is the dump for device 1. 2) Notice that device 0 shows 99% GPU utilization while device 1 shows close to 0% utilization. (A sketch of the monitoring commands follows.)
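For context, per-device dumps like the ones above can be collected with xpu-smi's `dump` subcommand. This is a minimal sketch; the metric IDs are an assumption (on the versions I have seen, 0 is GPU utilization, 1 is power, and 2 is frequency -- check `xpu-smi dump --help` for the exact mapping on your install):

```bash
# Stream utilization/power/frequency for device 0; run the second command
# in another terminal to watch device 1 side by side.
xpu-smi dump -d 0 -m 0,1,2
xpu-smi dump -d 1 -m 0,1,2
```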

Kindly advise if there is any intermediate step needed to get the expected two GPU processes running.

Please note that I am running the inference on an NF5468-M6 system with 8x Intel Flex 170 GPUs. Full system spec details are available here: https://wiki.ith.intel.com/display/MediaWiki/Flex-170x8+%28Inspur+-+ICX%29+Qualification
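For anyone reproducing this, here is a minimal sketch of a two-rank AutoTP launch, assuming the `deepspeed_autotp.py` entry point from the linked README (the exact script name, flags, and environment variables may differ between BigDL/ipex-llm versions):

```bash
# Minimal 2-GPU AutoTP launch sketch; assumes oneAPI is installed at the
# default path and deepspeed_autotp.py exists as in the README.
source /opt/intel/oneapi/setvars.sh   # oneAPI runtime environment

export MASTER_ADDR=127.0.0.1
export CCL_ZE_IPC_EXCHANGE=sockets    # oneCCL transport commonly used on XPU

# world_size is derived from the number of ranks the launcher starts:
# -np 2 -> world_size of 2 -> the model is sharded across two GPUs.
mpirun -np 2 \
  python deepspeed_autotp.py \
    --repo-id-or-model-path meta-llama/Llama-2-7b-chat-hf \
    --low-bit sym_int4
```

If the value feeding `-np` is 1 (or empty), world_size stays at 1 and only one device does any work, which matches the first run above.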

yangw1234 commented 5 months ago

Synced with @gbertulf offline; this problem was caused by a bash syntax error in the startup script. After the fix, both devices are busy, but the following error still occurs: [screenshot: error output]
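The actual script and error were not posted, so as a purely hypothetical illustration: a single-character bash mistake like the one below can leave the rank count empty, so the launcher silently falls back to one process (world_size 1, one busy GPU):

```bash
# Hypothetical example of a bash syntax pitfall -- not the actual script.
NUM_GPUS = 2    # WRONG: spaces around '=' make bash try to run "NUM_GPUS"
                # as a command; the variable stays unset and, without
                # `set -e`, the script keeps going.

# An empty NUM_GPUS falls back to the default of 1 rank -> world_size 1.
mpirun -np "${NUM_GPUS:-1}" python deepspeed_autotp.py --low-bit sym_int4

NUM_GPUS=2      # correct: no spaces around '='
```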

gbertulf commented 4 months ago

The issue is resolved. Closing this ticket. Thank you, team, for your help.