pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License

Format benchmark function names and change x_val to corresponding input shapes #35

Closed FindHao closed 3 weeks ago

FindHao commented 3 weeks ago

Fix https://github.com/pytorch-labs/tritonbench/issues/31

Test Plan:

% python run.py --op fused_linear_cross_entropy --num-inputs 1 --metrics latency
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.02s/it]
    (B*T, H)    torch_lm_head_ce-latency    liger_lm_head_ce-latency    inductor_fused_linear_cross_entropy-latency
------------  --------------------------  --------------------------  ---------------------------------------------
(4096, 4096)                     145.728                     526.446                                        144.567
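The idea behind the change can be sketched as follows: rather than labeling each benchmark row with an opaque `x_val` index, render the corresponding input shape as the row label. This is a minimal illustration, not the actual tritonbench code; `format_shape` is a hypothetical helper name.

```python
# Hypothetical sketch: turn an input shape tuple into a readable row label,
# as shown in the (B*T, H) column of the table above.
def format_shape(shape):
    """Render an input shape tuple as a row label, e.g. "(4096, 4096)"."""
    return "(" + ", ".join(str(dim) for dim in shape) + ")"

# Example: a fused_linear_cross_entropy input with B*T = 4096 and H = 4096.
print(format_shape((4096, 4096)))  # -> (4096, 4096)
```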
facebook-github-bot commented 3 weeks ago

@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

FindHao commented 3 weeks ago

LGTM. It is surprising that liger is so much slower than the baseline; should we report this number to the liger repo?

Oh, for this kernel it is expected, since that kernel optimizes for memory usage rather than latency.

facebook-github-bot commented 3 weeks ago

@FindHao merged this pull request in pytorch-labs/tritonbench@dcefed3a7bfacb7564334a063b5b81444b0815db.