Closed: FindHao closed this issue 2 weeks ago
Let's differentiate this from the current `tflops()` metric: the ncu report gives hardware flops, while the current tflops is analytic flops (calculated from the math).
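To make the distinction concrete, here is a minimal sketch of what "analytic flops" means for a matmul: the count comes from the operator's math (2·M·N·K multiply-adds), not from any hardware counter. The function name and shapes are illustrative, not Tritonbench's actual code.

```python
# Analytic TFLOPS for a matmul, derived from the math rather than from
# hardware counters. Shapes and latency are illustrative values.
def matmul_analytic_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    return flops / (latency_ms * 1e-3) / 1e12

# e.g. a 4096^3 matmul that took 1 ms
print(matmul_analytic_tflops(4096, 4096, 4096, 1.0))
```

The hardware number from NCU can differ from this, e.g. when the kernel does extra work (padding, recomputation) that the analytic formula does not count.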
yeah, for sure. how about `--metrics hardware_tflops`?
how about we use a shorter name, `ncu_tflops`?
sure. will add this feature later.
@xuzhao9 What about using Triton's Proton profiler metadata metric scope instead? see Triton and cuBLAS matmul kernel
@antferdom Yes, we plan to support the Proton profiler. However, the flops number defined there is "analytic flops", which is different from the "hardware flops" reported by NCU. Tritonbench relies on each operator author to add analytic flops, e.g., by adding a `tflops()` function decorated with `@register_metric()`.
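As a rough sketch of that registration pattern (the decorator and registry below are illustrative, not Tritonbench's real implementation), an operator author would register an analytic metric roughly like this:

```python
# Minimal sketch of a metric-registration pattern in the spirit of
# Tritonbench's @register_metric(); names here are hypothetical.
REGISTERED_METRICS = {}

def register_metric():
    def decorator(fn):
        REGISTERED_METRICS[fn.__name__] = fn  # record the metric by name
        return fn
    return decorator

@register_metric()
def tflops(m, n, k, latency_ms):
    # analytic flops for a matmul: 2 * M * N * K
    return 2 * m * n * k / (latency_ms * 1e-3) / 1e12

print("tflops" in REGISTERED_METRICS)
```

The benchmark harness can then look up every registered metric and evaluate it against the measured latency.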
> @xuzhao9 What about using Triton's Proton profiler metadata metric scope instead? see Triton and cuBLAS matmul kernel
We had some discussions here: https://github.com/pytorch/pytorch/pull/136169. I think we are open to doing it the Proton way too if anyone wants to help. I've got an ncu version locally and will push it later.
@xuzhao9 Thanks for the clarification about the target FLOPs number: "analytic flops" (e.g. a user-defined formula, as in Proton) vs. NCU GPU hardware counters for precise flops counting. I'm also currently using ncu to automatically profile Torch Inductor Triton GPU kernels, based on the official Torch documentation. `@register_metric()` is similar to Proton's `metric` in scope and `metadata_fn`.

@FindHao I have a simple prototype for automatically annotating using Proton's `scope` contextmanager, as discussed in the issue. Before going further with Proton, I will wait for your ncu runner example.
The key part to obtain flops is done here: https://github.com/pytorch-labs/tritonbench/blob/main/tritonbench/components/ncu/analyzer.py#L86. The remaining parts are aggregating the per-kernel results and adding the metric to the output; will do that later.
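The aggregation step mentioned above could look roughly like the sketch below: sum the counted flops and the kernel durations across all kernels in a run, then report a single hardware-TFLOPS number. The per-kernel `(flops, duration_ns)` records would come from the ncu analyzer; the values here are made up for illustration, and the function is an assumption, not the actual implementation.

```python
# Hypothetical aggregation over per-kernel NCU results: total flops over
# total GPU time, reported as one hardware-TFLOPS figure.
def aggregate_hw_tflops(kernels):
    total_flops = sum(f for f, _ in kernels)      # flops counted by ncu
    total_ns = sum(d for _, d in kernels)         # kernel durations in ns
    return total_flops / (total_ns * 1e-9) / 1e12

# two kernels: (flops, duration in ns)
kernels = [(1.0e9, 20_000), (3.0e9, 40_000)]
print(aggregate_hw_tflops(kernels))
```

A design choice to settle during aggregation is whether to divide by the sum of kernel durations (as above) or by end-to-end wall time, which includes gaps between kernels and gives a lower number.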