pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License

Add peak memory usage and footprint measurement #34

Closed FindHao closed 3 weeks ago

FindHao commented 3 weeks ago

Fixes https://github.com/pytorch-labs/tritonbench/issues/28. Adds `gpu_peak_mem`, `mem_footprint`, and `cpu_peak_mem` to `--metrics`.

Test Plan:

% python run.py --op fused_linear_cross_entropy --num-inputs 4 --metrics latency,gpu_peak_mem,mem_footprint,cpu_peak_mem
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [03:13<00:00, 48.38s/it]
  x_val    LMHeadCE-gpu_peak_mem    LMHeadCE-cpu_peak_mem    LMHeadCE-latency    LigerLMHeadCE-gpu_peak_mem    LigerLMHeadCE-mem_footprint    LigerLMHeadCE-cpu_peak_mem    LigerLMHeadCE-latency    inductor_fused_linear_cross_entropy-gpu_peak_mem    inductor_fused_linear_cross_entropy-mem_footprint    inductor_fused_linear_cross_entropy-cpu_peak_mem    inductor_fused_linear_cross_entropy-latency
-------  -----------------------  -----------------------  ------------------  ----------------------------  -----------------------------  ----------------------------  -----------------------  --------------------------------------------------  ---------------------------------------------------  --------------------------------------------------  ---------------------------------------------
      0              8.506082304                  85.0289             145.928                       6.66886                        1.27549                       93.7701                  537.321                                             6.40477                                              1.32809                                             96.5104                                        166.69
      1             12.775916544                  96.5116             285.919                       7.00013                        1.8251                        96.5116                 1007.9                                               8.57329                                              1.4902                                              96.5722                                        290.965
      2             21.315585024                  96.5722             579.465                       7.66267                        2.78175                       97.0613                 6807.01                                             12.9103                                               1.65105                                             97.1784                                        583.152
      3                 CUDA OOM                                                                    8.98984                                                      97.6245                 3416.81                                             21.5844                                                                                                   97.7367                                       1219.02
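For context, peak-memory metrics like these are typically collected with `torch.cuda.max_memory_allocated` on the GPU side and `resource.getrusage` on the CPU side. A minimal sketch of that approach (the helper names are hypothetical, not tritonbench's actual API, and the GPU path is skipped when CUDA is unavailable):

```python
import resource

try:
    import torch  # optional: only needed for the GPU path
except ImportError:
    torch = None

def cpu_peak_mem_gb():
    # Peak resident set size of this process.
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6

def run_with_peak_mem(fn):
    """Run fn and return (result, gpu_peak_gb, cpu_peak_gb).

    gpu_peak_gb is None when CUDA is unavailable.
    """
    gpu_peak = None
    if torch is not None and torch.cuda.is_available():
        # Reset the high-water mark so we measure only this call.
        torch.cuda.reset_peak_memory_stats()
        result = fn()
        torch.cuda.synchronize()
        gpu_peak = torch.cuda.max_memory_allocated() / 1e9
    else:
        result = fn()
    return result, gpu_peak, cpu_peak_mem_gb()

result, gpu_peak, cpu_peak = run_with_peak_mem(lambda: sum(range(1_000_000)))
print(f"cpu_peak_mem = {cpu_peak:.3f} GB, gpu_peak_mem = {gpu_peak}")
```

Resetting the peak stats before the benchmarked call is what makes the measurement per-operator rather than cumulative across the whole run.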
facebook-github-bot commented 3 weeks ago

@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


facebook-github-bot commented 3 weeks ago

@FindHao merged this pull request in pytorch-labs/tritonbench@4cd607b0db19ce5f5070955194a29451f2e9ad00.