intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

How to profile an IPEX-LLM application to tune performance from top to bottom #11485

Open lucshi opened 4 days ago

lucshi commented 4 days ago

I'd like to use VTune to profile an IPEX-LLM application, focusing on the GPU (e.g., the performance/all-in-one benchmark), to get a full picture of the bottlenecks. My questions are:

  1. Is there a general guide to using VTune to profile an IPEX-LLM application?
  2. Which OS should I choose to get the most profiling detail? Does Windows have limitations such as layer-level profiling not being available?
  3. How can I map the source code to the bottlenecks found in the profiling results?
leonardozcm commented 4 days ago

hi @lucshi

> I'd like to use VTune to profile an IPEX-LLM application, focusing on the GPU (e.g., the performance/all-in-one benchmark), to get a full picture of the bottlenecks. My questions are:

> 1. Is there a general guide to using VTune to profile an IPEX-LLM application?

You may refer to this doc: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/gpu-analysis-with-vtunetm-profiler.html and the VTune GPU profiling cookbook: https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2024-2/profiling-dpc-application.html
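
For a quick start from the command line (assuming VTune is installed and the oneAPI environment has been sourced, e.g. via `setvars.sh`), you can launch collection with `vtune -collect gpu-hotspots -result-dir r001 -- python run.py`, or use `-collect gpu-offload` first to characterize host-vs-device activity; here `run.py` stands in for your IPEX-LLM script or the all-in-one benchmark entry point.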

> 2. Which OS should I choose to get the most profiling detail? Does Windows have limitations such as layer-level profiling not being available?

Both Linux and Windows are OK. If you want layer-level profiling, how about the PyTorch profiler?
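
As a minimal sketch of what that could look like (assuming a recent PyTorch/IPEX-LLM stack where `torch.profiler` exposes `ProfilerActivity.XPU`; the model path and prompt below are placeholders):

```python
# Layer/op-level profiling sketch using the PyTorch profiler.
# Assumptions: a PyTorch build with XPU profiling support
# (ProfilerActivity.XPU) and ipex-llm installed.
import torch
from torch.profiler import profile, record_function, ProfilerActivity
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is AI?", return_tensors="pt").to("xpu")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU],
             record_shapes=True) as prof:
    with record_function("generate"):
        model.generate(**inputs, max_new_tokens=32)

# Per-operator breakdown, sorted by time spent on the XPU device;
# on older stacks you may need sort_by="self_cpu_time_total" instead.
print(prof.key_averages().table(sort_by="self_xpu_time_total", row_limit=20))
```

You can also dump a timeline with `prof.export_chrome_trace("trace.json")` and inspect it in a Chrome trace viewer.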

> 3. How can I map the source code to the bottlenecks found in the profiling results?

In theory, you can do this by following the source-view guide; give it a try: https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/viewing-source.html
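
Note that for a Python workload on GPU, VTune's source view will typically resolve hotspots to the underlying SYCL/oneDNN kernels rather than to Python lines (debug symbols permitting), so pairing it with the torch profiler output above is a practical way to map time back to model layers. You can open a collected result in the GUI with `vtune-gui r001` (where `r001` is the placeholder result directory from the collection command above).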