pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
BSD 3-Clause "New" or "Revised" License

Add custom ops fused_linear_cross_entropy,geglu,cross_entropy #21

Closed FindHao closed 3 weeks ago

FindHao commented 3 weeks ago

Cloned from PR https://github.com/pytorch-labs/tritonbench/pull/13 because of a merge bot issue. Originally migrated from https://github.com/pytorch/benchmark/pull/2507.

Add the custom ops fused_linear_cross_entropy, geglu, and cross_entropy from Liger Kernel.

Test Plan:

% python run.py --op fused_linear_cross_entropy,geglu,cross_entropy --num-inputs 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:12<00:00, 12.67s/it]
  x_val    LMHeadCE-latency    LigerLMHeadCE-latency    inductor_fused_linear_cross_entropy-latency
-------  ------------------  -----------------------  ---------------------------------------------
      0             139.673                  533.048                                        143.575
  0%|                                                                                                                                                          | 0/1 [00:00<?, ?it/s]
/scratch/yhao/pta/pytorch/torch/_inductor/compile_fx.py:182: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.65s/it]
  x_val    LlamaMLP-latency    LigerGEGLUMLP-latency    InductorLlamaMLP-latency
-------  ------------------  -----------------------  --------------------------
      0             69.2102                  69.6707                     69.0965
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.54s/it]
  x_val    CrossEntropyLoss-latency    LigerCrossEntropyLoss-latency    InductorCrossEntropyLoss-latency
-------  --------------------------  -------------------------------  ----------------------------------
      0                        0.84                          0.40736                            0.180896
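
To make the three tables above easier to compare, here is a minimal sketch that turns the reported latencies into ratios against the first column of each table. It is not part of this PR or of tritonbench: the `results` dictionary and the `speedups` helper are hypothetical names, the numbers are copied by hand from the output above, and treating the first column as the baseline is an assumption.

```python
# Latencies copied by hand from the run.py output above (x_val = 0),
# in whatever unit run.py reports. Not generated by tritonbench.
results = {
    "fused_linear_cross_entropy": {
        "LMHeadCE": 139.673,
        "LigerLMHeadCE": 533.048,
        "inductor_fused_linear_cross_entropy": 143.575,
    },
    "geglu": {
        "LlamaMLP": 69.2102,
        "LigerGEGLUMLP": 69.6707,
        "InductorLlamaMLP": 69.0965,
    },
    "cross_entropy": {
        "CrossEntropyLoss": 0.84,
        "LigerCrossEntropyLoss": 0.40736,
        "InductorCrossEntropyLoss": 0.180896,
    },
}


def speedups(latencies):
    """Return baseline_latency / latency for each backend.

    Assumption: the first entry in the dict is the baseline.
    A value above 1.0 means the backend is faster than the baseline.
    """
    names = list(latencies)
    baseline = latencies[names[0]]
    return {name: baseline / latencies[name] for name in names}


for op, latencies in results.items():
    for name, ratio in speedups(latencies).items():
        print(f"{op:28s} {name:38s} {ratio:6.2f}x")
```

Under that assumption, the inductor cross_entropy variant comes out at about 4.6x the baseline, the three geglu variants are within about 1% of each other, and LigerLMHeadCE is about 0.26x (slower than the baseline). This is a single input (`--num-inputs 1`), so the ratios are only a rough indication.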

facebook-github-bot commented 3 weeks ago

@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 3 weeks ago

@FindHao merged this pull request in pytorch-labs/tritonbench@72ddac91274e783ecdf5a12b30747e92d2f95b7b.