intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

How to profile an IPEX-LLM application to tune performance from top to bottom #11485

Open lucshi opened 4 days ago

lucshi commented 4 days ago

I'd like to use VTune to profile an IPEX-LLM application, focusing on the GPU (e.g., the performance/all-in-one benchmark), to get a full picture of the bottlenecks. My questions are:

  1. Is there a general guide to using VTune to profile an IPEX-LLM application?
  2. Which OS should I choose to get the most profiling detail? Does Windows have limitations such as layer-level profiling not being available?
  3. How can I map the source code to the bottlenecks found in the profiling results?
leonardozcm commented 4 days ago

hi @lucshi

> I'd like to use VTune to profile an IPEX-LLM application, focusing on the GPU (e.g., the performance/all-in-one benchmark), to get a full picture of the bottlenecks. My questions are:

> 1. Is there a general guide to using VTune to profile an IPEX-LLM application?

You may refer to this doc: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/gpu-analysis-with-vtunetm-profiler.html and the VTune GPU profiling cookbook: https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2024-2/profiling-dpc-application.html
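
For a quick start from the command line (assuming VTune is installed and the oneAPI environment has been sourced, e.g. via `setvars.sh`), you can launch collection with `vtune -collect gpu-hotspots -result-dir r001 -- python run.py`, or use `-collect gpu-offload` first to characterize host-vs-device activity; here `run.py` stands in for your IPEX-LLM script or the all-in-one benchmark entry point.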

> 2. Which OS should I choose to get the most profiling detail? Does Windows have limitations such as layer-level profiling not being available?

Both Linux and Windows are OK. If you want layer-level profiling, how about the PyTorch profiler?
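
As a minimal sketch of what that could look like (assuming a recent PyTorch/IPEX-LLM stack where `torch.profiler` exposes `ProfilerActivity.XPU`; the model path and prompt below are placeholders):

```python
# Layer/op-level profiling sketch using the PyTorch profiler.
# Assumptions: a PyTorch build with XPU profiling support
# (ProfilerActivity.XPU) and ipex-llm installed.
import torch
from torch.profiler import profile, record_function, ProfilerActivity
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU],
             record_shapes=True) as prof:
    with record_function("generate"):
        model.generate(**inputs, max_new_tokens=32)

# Per-operator breakdown, sorted by time spent on the XPU device;
# on older stacks you may need sort_by="self_cpu_time_total" instead.
print(prof.key_averages().table(sort_by="self_xpu_time_total", row_limit=20))
```

You can also dump a timeline with `prof.export_chrome_trace("trace.json")` and inspect it in a Chrome trace viewer.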

> 3. How can I map the source code to the bottlenecks found in the profiling results?

In theory, you can do this by following the source-view guide; give it a try: https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/viewing-source.html
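
Note that for a Python workload on GPU, VTune's source view will typically resolve hotspots to the underlying SYCL/oneDNN kernels rather than to Python lines (debug symbols permitting), so pairing it with the torch profiler output above is a practical way to map time back to model layers. You can open a collected result in the GUI with `vtune-gui r001` (where `r001` is the placeholder result directory from the collection command above).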