intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.49k stars 1.24k forks source link

ONEDNN_VERBOSE does not work #10146

Open HLneoh opened 7 months ago

HLneoh commented 7 months ago

Is there a method to trace the execution of oneDNN API calls while executing inference with BigDL-LLM?

vpirogov commented 7 months ago

oneDNN has verbose mode exactly for this purpose. Unless it's disabled at build time you can run your inference workload with export DNNL_VERBOSE=1 to get the trace.

HLneoh commented 7 months ago

Yes, I have executed the inference using DNNL_VERBOSE=1 python inference.py but did not observe any trace of execution. I am wonder whether BigDL uses a different method to get the trace.

Zephyr596 commented 7 months ago

I have executed the inference and observe the trace of execution in Intel GPU. First you should ensure you have prepared your machine following this guide I used chatglm3 to inference . When I activate conda env, I run these commands: Enable OneAPI and other settings for GPU support

source bigdl-llm-init -g

But in default, ONEDNN_VERBOSE has not been set by bigdl-llm-init, so I set it manually.

export ONEDNN_VERBOSE=all # https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html#:~:text=enables%20all%20above%20flags

Then I run inference

python ./generate.py --repo-id-or-model-path /mnt/disk1/models/chatglm3-6b

The trace of execution:

2024-02-19 13:27:06,588 - root - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:17<00:00,  2.47s/it]
2024-02-19 13:27:23,943 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
onednn_verbose,info,oneDNN v3.3.0 (commit 887fb044ccd6308ed1780a3863c2c6f5772c94b3)
onednn_verbose,info,cpu,runtime:threadpool,nthr:16
onednn_verbose,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,info,gpu,runtime:DPC++
onednn_verbose,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) A770 Graphics,driver_version:1.3.27191,binary_kernels:enabled
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,info,experimental features are enabled
onednn_verbose,info,use batch_normalization stats one pass is enabled
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,info,gpu,gemm,consider:32x32,4x8x1,score:22411504.572694
onednn_verbose,info,gpu,gemm,consider:32x32,4x8x1,score:22411504.572694
onednn_verbose,info,gpu,gemm,consider:16x16,8x4x1,score:9911940.480222
onednn_verbose,info,gpu,gemm,consider:32x8,2x8x2,score:6450092.134270
onednn_verbose,info,gpu,gemm,consider:32x16,2x4x8,score:8817306.292282
onednn_verbose,info,gpu,gemm,consider:16x8,2x4x4,score:4109264.110664
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x4,score:3994569.622959
onednn_verbose,info,gpu,gemm,consider:16x8,4x8x1,score:6587470.454232
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x1,score:5483570.658622
onednn_verbose,info,gpu,gemm,consider:16x1,4x1x1,score:9841449.488000
onednn_verbose,info,gpu,gemm,consider:8x1,4x1x4,score:14258794.173626
onednn_verbose,info,gpu,gemm,consider:8x1,2x1x16,score:31887704.135040
onednn_verbose,info,gpu,gemm,consider:32x32,4x8x1,score:22411504.572694
onednn_verbose,info,gpu,gemm,consider:32x32,4x8x1,score:22411504.572694
onednn_verbose,info,gpu,gemm,consider:16x16,8x4x1,score:9911940.480222
onednn_verbose,info,gpu,gemm,consider:32x8,2x8x2,score:6450092.134270
onednn_verbose,info,gpu,gemm,consider:32x16,2x4x8,score:8817306.292282
onednn_verbose,info,gpu,gemm,consider:16x8,2x4x4,score:4109264.110664
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x4,score:3994569.622959
onednn_verbose,info,gpu,gemm,consider:16x8,4x8x1,score:6587470.454232
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x1,score:5483570.658622
onednn_verbose,info,gpu,gemm,consider:16x1,4x1x1,score:9841449.488000
onednn_verbose,info,gpu,gemm,consider:8x1,4x1x4,score:14258794.173626
onednn_verbose,info,gpu,gemm,consider:8x1,2x1x16,score:31887704.135040
onednn_verbose,info,gpu,E gemm SSS TNN 16 4 aB4x2 aB4x2 aB wg 2x8x4 kr kc4 cab4 ks8 nse di bo bk0 sm sn sr dm
onednn_verbose,primitive,create_nested:cache_miss,gpu,gemm,jit:gemm:any,undef,src_a_f32::blocked:bac::f0 src_b_f32::blocked:acb::f0 dst_f32::blocked:abc::f0,,,32x18x128:32x128x18,18.0811
onednn_verbose,primitive,create:cache_miss,gpu,matmul,jit:gemm:any,undef,src_f32::blocked:bac::f0 wei_f32::blocked:acb::f0 dst_f32::blocked:abc::f0,attr-scratchpad:user ,,32x18x128:32x128x18,21.8518
onednn_verbose,primitive,exec:check,primitive,unused primitive execution argument (80),src/common/primitive_exec_types.cpp:74
onednn_verbose,primitive,exec,gpu,matmul,jit:gemm:any,undef,src_f32::blocked:bac::f0 wei_f32::blocked:acb::f0 dst_f32::blocked:abc::f0,attr-scratchpad:user ,,32x18x128:32x128x18,0.292969
onednn_verbose,info,gpu,gemm,consider:16x16,8x4x1,score:3680416.698880
onednn_verbose,info,gpu,gemm,consider:32x8,4x4x2,score:3700671.028160
onednn_verbose,info,gpu,gemm,consider:16x8,4x4x4,score:4982393.981202
onednn_verbose,info,gpu,gemm,consider:16x8,2x4x4,score:4863343.393024
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x4,score:7501812.513331
onednn_verbose,info,gpu,gemm,consider:16x8,8x4x1,score:3617596.905280
onednn_verbose,info,gpu,gemm,consider:16x8,4x8x1,score:3996614.456960
onednn_verbose,info,gpu,gemm,consider:16x8,4x4x1,score:3666412.015360
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x1,score:3797270.341120
onednn_verbose,info,gpu,gemm,consider:32x1,4x1x1,score:3854629.275264
onednn_verbose,info,gpu,gemm,consider:16x1,4x1x8,score:21370218.623360
onednn_verbose,info,gpu,gemm,consider:8x1,2x1x16,score:89634940.039680
onednn_verbose,info,gpu,gemm,consider:16x16,8x4x1,score:3680416.698880
onednn_verbose,info,gpu,gemm,consider:32x8,4x4x2,score:3700671.028160
onednn_verbose,info,gpu,gemm,consider:16x8,4x4x4,score:4982393.981202
onednn_verbose,info,gpu,gemm,consider:16x8,2x4x4,score:4863343.393024
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x4,score:7501812.513331
onednn_verbose,info,gpu,gemm,consider:16x8,8x4x1,score:3617596.905280
onednn_verbose,info,gpu,gemm,consider:16x8,4x8x1,score:3996614.456960
onednn_verbose,info,gpu,gemm,consider:16x8,4x4x1,score:3666412.015360
onednn_verbose,info,gpu,gemm,consider:16x4,2x8x1,score:3797270.341120
onednn_verbose,info,gpu,gemm,consider:32x1,4x1x1,score:3854629.275264
onednn_verbose,info,gpu,gemm,consider:16x1,4x1x8,score:21370218.623360
onednn_verbose,info,gpu,gemm,consider:8x1,2x1x16,score:89634940.039680
...