Open FindHao opened 1 day ago
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Some results are not correct. Working on the fix.
This PR adds an nsys report analyzer that provides three metrics. nsys_gpu_kernel_sum is the sum of the GPU kernel execution time across GPUs, nsys_nvtx_range_duration is the total execution time of the operator, and nsys_launch_overhead is their difference, which indicates the launch overhead. This is one way to measure execution time mentioned in https://github.com/pytorch-labs/tritonbench/issues/50

Fix https://github.com/pytorch-labs/tritonbench/issues/67
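The relationship between the three metrics can be sketched as follows. This is only an illustration of the arithmetic described above, not the analyzer code in this PR: the helper name `derive_nsys_metrics` and its parameters are hypothetical, and the inputs are assumed to be durations already extracted from an nsys report, all in the same time unit.

```python
from typing import Dict, Iterable


def derive_nsys_metrics(
    nvtx_range_duration: int,
    gpu_kernel_durations: Iterable[int],
) -> Dict[str, int]:
    """Hypothetical helper: combine durations into the three metrics.

    nvtx_range_duration: total duration of the operator's NVTX range.
    gpu_kernel_durations: per-kernel GPU execution times inside that range.
    Both are assumed to use the same time unit.
    """
    # Sum of all GPU kernel execution times.
    gpu_kernel_sum = sum(gpu_kernel_durations)
    return {
        "nsys_gpu_kernel_sum": gpu_kernel_sum,
        "nsys_nvtx_range_duration": nvtx_range_duration,
        # Time in the range that is not covered by kernel execution.
        "nsys_launch_overhead": nvtx_range_duration - gpu_kernel_sum,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(derive_nsys_metrics(1000, [300, 250, 150]))
```

Under this definition the overhead is whatever part of the NVTX range is not accounted for by kernel execution, so it also absorbs any other host-side time spent inside the range.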
Test Plan: