Closed by varun-sundar-rabindranath 3 months ago
The Y-label in the `--plot-metric pct_cuda_time` graph appears to be wrong above, but in the code it seems to be set correctly. I assume this has been fixed?
Hey Lucas. Yes, I noticed that and fixed it. Sorry, should have mentioned it somewhere.
Migrated all changes including all of the layer-by-layer profiling code to https://github.com/neuralmagic/vllm/pull/3
Update visualize trace utility.
Remove the `ignore_sampler` arg, and instead add a `fold_json_node` arg. This argument collapses the specified JSON tree so the plot has less clutter.

Usage:
python3 neuralmagic/tools/profiler/visualize_trace.py --json-trace profiler_fp8_trace.json --output-directory ./kernel --level kernel
This command produces 2 output files: `kernel/prefill.png` and `kernel/decode_steps.png`, which are stacked-bar graph plots. In these plots the operations are grouped together by high-level concepts such as `gemms`, `attention`, `rms-norm`, etc.

python3 neuralmagic/tools/profiler/visualize_trace.py --json-trace profiler_fp8_trace.json --output-directory ./module --level module --plot-metric pct_cuda_time
This command also produces 2 output files: `module/prefill.png` and `module/decode_steps.png`, which are stacked-bar graph plots. In these plots the bars sum to 100 because the requested plot metric is `pct_cuda_time`.
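
For readers unfamiliar with the two ideas above, here is a minimal sketch of what folding a JSON node and normalizing to `pct_cuda_time` could look like. It is not the code in `visualize_trace.py`. The trace layout (`name`, `cuda_time_us`, `children`) and the helper names `fold_node`, `leaf_times`, and `to_pct` are assumptions made for illustration, and the real profiler JSON may be structured differently.

```python
# Hypothetical trace layout: every node has a "name", a "cuda_time_us"
# and a list of "children". This is an assumption, not the real schema.


def fold_node(node, fold_name):
    """Return a copy of the tree where each node named `fold_name`
    loses its children, so it shows up as a single plot entry."""
    if node["name"] == fold_name:
        children = []
    else:
        children = [fold_node(child, fold_name) for child in node["children"]]
    return {
        "name": node["name"],
        "cuda_time_us": node["cuda_time_us"],
        "children": children,
    }


def leaf_times(node):
    """Sum CUDA time per leaf name; leaves are what a bar is built from."""
    if not node["children"]:
        return {node["name"]: node["cuda_time_us"]}
    times = {}
    for child in node["children"]:
        for name, t in leaf_times(child).items():
            times[name] = times.get(name, 0.0) + t
    return times


def to_pct(times):
    """Convert absolute times to percentages of the total."""
    total = sum(times.values())
    if total == 0:
        return {name: 0.0 for name in times}
    return {name: 100.0 * t / total for name, t in times.items()}


# Made-up numbers, for illustration only.
trace = {
    "name": "model", "cuda_time_us": 100.0, "children": [
        {"name": "Sampler", "cuda_time_us": 20.0, "children": [
            {"name": "softmax", "cuda_time_us": 12.0, "children": []},
            {"name": "topk", "cuda_time_us": 8.0, "children": []},
        ]},
        {"name": "gemm", "cuda_time_us": 60.0, "children": []},
        {"name": "attention", "cuda_time_us": 20.0, "children": []},
    ],
}

folded = fold_node(trace, "Sampler")   # "Sampler" becomes one entry
pct = to_pct(leaf_times(folded))       # segments of one stacked bar
print(pct)
```

In this sketch, folding `Sampler` replaces its two child kernels with one segment, and the percentages of the remaining segments add up to 100, which is the property the `pct_cuda_time` plots rely on.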