intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

[Flex170/Whisper] Low GPU usage when running Whisper via ipex-llm on Flex170 GPU #11468

Open · Yanli2190 opened 4 weeks ago

Yanli2190 commented 4 weeks ago

Summary: GPU utilization is low when running Whisper via ipex-llm on a Flex 170 GPU.

Steps:

  1. Install ipex-llm following the steps below (the 20240629 build is used):

     ```bash
     conda create -n ipex_llm python=3.9
     source activate ipex_llm
     conda install -c conda-forge -y libstdcxx-ng=12
     conda install -c conda-forge -y gperftools=2.10 jemalloc==5.2.1
     pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
     pip install datasets soundfile librosa
     ```

  2. Run Whisper via ipex-llm following run.sh (rename run.txt to run.sh, and rename run_whipser_base_perf_dataset.txt to run_whipser_base_perf_dataset.py); a minimal sketch of this kind of script is shown after this list. Attachments: run.txt, run_whipser_base_perf_dataset.txt

  3. Monitor GPU usage via xpu-smi:

     *(two screenshots of xpu-smi output attached)*
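
For context, the attached run_whipser_base_perf_dataset.py is not inlined above. A minimal sketch of running whisper-base through ipex-llm on an Intel GPU follows; the model ID, dataset, and 4-bit loading here are illustrative assumptions, not necessarily what the attached script does:

```python
import torch
from datasets import load_dataset
from transformers import WhisperProcessor
from ipex_llm.transformers import AutoModelForSpeechSeq2Seq

# Load whisper-base with ipex-llm low-bit optimization and move it to the Intel GPU.
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-base",
                                                  load_in_4bit=True)
model = model.to("xpu")
model.config.forced_decoder_ids = None

processor = WhisperProcessor.from_pretrained("openai/whisper-base")

# One short LibriSpeech sample as a smoke test (placeholder dataset).
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean",
                  split="validation")
sample = ds[0]["audio"]
input_features = processor(sample["array"],
                           sampling_rate=sample["sampling_rate"],
                           return_tensors="pt").input_features.to("xpu")

# Transcribe the sample and decode the predicted token IDs back to text.
with torch.inference_mode():
    predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids.cpu(), skip_special_tokens=True))
```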
lzivan commented 3 weeks ago

Hi @Yanli2190, we will try to reproduce your problem.

lzivan commented 3 weeks ago

Hi @Yanli2190, we used your code and tried to reproduce the issue on our Flex machine. The result is similar to yours, with a maximum GPU utilization of 84%.

Yanli2190 commented 3 weeks ago

GPU power reported by xpu-smi is a more accurate indicator of load. The Flex 170's maximum power is 150 W. When running Whisper, the GPU power reported by xpu-smi is 60 W; when running LLaMA, it is ~128 W, almost the maximum GPU SoC power, which means the GPU is fully utilized by LLaMA but not by Whisper.
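
For anyone reproducing this, a possible way to log utilization and power together over a run (assuming xpu-smi's `dump` subcommand; the metric IDs and flags should be checked against `xpu-smi dump --help` on your install):

```python
import subprocess

# Sample GPU utilization (%) and GPU power (W) for device 0, once per second,
# for 60 samples. Metric IDs 0 and 1 are assumed per xpu-smi's documentation.
subprocess.run([
    "xpu-smi", "dump",
    "-d", "0",    # device ID of the Flex 170
    "-m", "0,1",  # 0 = GPU utilization (%), 1 = GPU power (W)
    "-i", "1",    # sampling interval in seconds
    "-n", "60",   # number of samples
])
```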