FindHao closed this pull request 3 weeks ago.
Add layer_norm from the Liger kernel library, fix a bug in the embedding backward pass, and disable Liger kernels in internal CI.
Test Plan:
python run.py --op layer_norm --num-inputs 4 --metrics latency
100%|███████████████████████████████████████████████████| 4/4 [00:15<00:00, 3.81s/it]

  x_val    torch_layer_norm-latency    triton_layer_norm-latency    torch_compile_layer_norm-latency    liger_layer_norm-latency
-------  --------------------------  ---------------------------  ----------------------------------  --------------------------
   1024                    0.028896                     0.024512                             0.02448                    0.023808
   1536                    0.038688                     0.034144                             0.05584                    0.033536
   2048                    0.048704                     0.043424                             0.059424                   0.043104
   2560                    0.058112                     0.05472                              0.083712                   0.054176
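For reference, all four benchmarked backends compute the same operation: normalize each input row to zero mean and unit variance, then apply a learned elementwise scale and shift. Below is a minimal pure-Python sketch of that math (this is an illustrative reference, not the Triton or Liger kernel added by this PR; the function name and signature are hypothetical):

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # Reference layer norm over a single 1-D row:
    #   y_i = (x_i - mean) / sqrt(var + eps) * gamma_i + beta_i
    n = len(x)
    mean = sum(x) / n
    # Population (biased) variance, matching torch.nn.LayerNorm semantics.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]

out = layer_norm([1.0, 2.0, 3.0, 4.0], gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 4) for v in out])  # normalized values, symmetric around 0
```

The fused kernels being compared compute exactly this per row, but in a single GPU pass; the latency table above measures that per-call cost as the row width (`x_val`) grows.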
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@FindHao merged this pull request in pytorch-labs/tritonbench@66a7cc96eff83ea98e027cda7683e08b0cb7c437.