LiyangLingIntel opened 4 months ago
Refer to benchmark/scripts to extract kernels from PyTorch E2E models. The next task is to analyze the extracted kernels and set the scope of the microbenchmark cases.
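As a rough sketch of the analysis step, the extracted kernel sources could be grouped by which `tl.*` operations they call, so the microbenchmark scope can be picked from the most common patterns (reductions, atomic adds, elementwise ops). The function names and the dict-of-sources input are assumptions for illustration, not part of the existing scripts:

```python
import re
from collections import Counter

def categorize_kernels(sources):
    """Map each kernel name to the sorted set of tl.* ops its source calls.

    `sources` is assumed to be {kernel_name: source_text}, e.g. read from
    the extracted .py files under benchmark/inductor_kernels.
    """
    op_pattern = re.compile(r"\btl\.([A-Za-z_]+)\(")
    return {name: sorted(set(op_pattern.findall(src)))
            for name, src in sources.items()}

def op_frequency(categories):
    """Count how many kernels use each op, to rank benchmark candidates."""
    counts = Counter()
    for ops in categories.values():
        counts.update(ops)
    return counts
```

For example, a softmax-like kernel calling `tl.load`, `tl.exp`, and `tl.sum` would be categorized as `["exp", "load", "sum"]`, and `op_frequency` then shows which ops dominate across the 130+ kernels.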
We need to provide a robust comparison method to confirm that Triton's performance is good, e.g. comparing E2E/kernel performance on PVC + IPEX against an NV platform.
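A minimal sketch of such a cross-platform comparison, assuming per-kernel timings (in ms) have already been collected on each platform; the kernel names and measurement source are hypothetical:

```python
def compare_platforms(xpu_times_ms, baseline_times_ms):
    """Return the baseline/XPU speedup ratio per kernel.

    A ratio > 1.0 means the XPU (e.g. PVC + IPEX) run was faster than
    the baseline (e.g. an NV platform) for that kernel.
    """
    ratios = {}
    for name, xpu_t in xpu_times_ms.items():
        base_t = baseline_times_ms.get(name)
        if base_t is None or xpu_t <= 0:
            continue  # skip kernels without a valid baseline measurement
        ratios[name] = base_t / xpu_t
    return ratios
```

For instance, a kernel taking 2.0 ms on XPU and 4.0 ms on the baseline yields a ratio of 2.0.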
Yes. Given the fused Triton kernels from PyTorch E2E models, to set up the benchmark we should decide
When the above points are resolved, we can consider adding this non-GEMM benchmark to https://github.com/intel/intel-xpu-backend-for-triton/issues/879 and tracking it in CI or nightly runs.
The purpose of this microbenchmark is to catch performance regressions.
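One way to turn that into an automated gate is to compare each run's timings against a stored baseline and flag kernels that slowed down beyond a tolerance. This is a sketch under assumed inputs (timings collected elsewhere, a baseline kept in the repo); the 10% threshold is an illustrative choice, not a project decision:

```python
def find_regressions(current_ms, baseline_ms, tolerance=0.10):
    """Return kernels whose runtime grew more than `tolerance` vs baseline.

    `current_ms` and `baseline_ms` map kernel name -> runtime in ms.
    The returned dict maps each regressed kernel to its fractional slowdown.
    """
    regressions = {}
    for name, base_t in baseline_ms.items():
        cur_t = current_ms.get(name)
        if cur_t is None:
            continue  # kernel not measured in this run
        slowdown = (cur_t - base_t) / base_t
        if slowdown > tolerance:
            regressions[name] = slowdown
    return regressions
```

A CI or nightly job could fail (or comment on the PR) whenever this returns a non-empty dict.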
Split this ticket into two: base (existing kernels - softmax) and additions (generalized / standalone reduction and atomic add kernels).
Two issues have been filed to track these two steps:
I propose treating this as an umbrella issue; sub-issues will be created to track the details.
We need a microbenchmark to check performance regularly and guarantee there is no large regression after changes. Currently we already have 130+ Triton non-GEMM kernels extracted from PyTorch E2E models: https://github.com/intel/intel-xpu-backend-for-triton/tree/liyang/micro-benchmark/benchmark/inductor_kernels
There are several points that need to be decided: