intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

[Benchmarks][Upstream PyTorch 2.5] `Triton` and `XeTLA` softmax performance degrades in comparison with `torch 2.1` / `ipex 2.1` test proxies #2106

Open ESI-SYD opened 1 week ago

ESI-SYD commented 1 week ago
  1. The Triton/XeTLA ratio stays the same except for attention, where the absolute XeTLA attention numbers degraded.
  2. Both the Triton and XeTLA softmax cases degraded, so the Triton/XeTLA ratio is unchanged.

details: https://github.com/intel/intel-xpu-backend-for-triton/pull/1905#issuecomment-2320701513

vlad-penkin commented 1 week ago

@ESI-SYD what is the root cause of this issue? Can you pinpoint it to a particular torch operation?

@anmyachev, to proceed further with analysis/triaging, please create a minimal reproducer for the Triton kernel path.
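A minimal reproducer of the kind requested might look like the sketch below. This is an illustrative standalone softmax kernel in the style of the Triton tutorials, not the actual benchmark code; the shapes, block size, and `device="xpu"` placement are assumptions.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    # One program instance handles one row of the (contiguous) input.
    row = tl.program_id(0)
    offs = tl.arange(0, BLOCK_SIZE)
    mask = offs < n_cols
    x = tl.load(in_ptr + row * n_cols + offs, mask=mask, other=-float("inf"))
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - tl.max(x, axis=0)
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offs, out, mask=mask)

# Illustrative shape; assumes the XPU backend is installed and available.
x = torch.randn(4096, 1024, device="xpu")
y = torch.empty_like(x)
BLOCK = triton.next_power_of_2(x.shape[1])
softmax_kernel[(x.shape[0],)](y, x, x.shape[1], BLOCK_SIZE=BLOCK)
torch.testing.assert_close(y, torch.softmax(x, dim=1))
```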

ESI-SYD commented 1 week ago

> @ESI-SYD what is the root cause of this issue? Can you pinpoint it to a particular torch operation?

There are two main changes to the benchmark timing method after applying the draft:

  1. Kernels are submitted without synchronization: https://github.com/intel/intel-xpu-backend-for-triton/blob/llvm-target/python/triton/testing.py#L214

  2. Elapsed time is taken from timestamps between two barriers, which is not accurate; chengjun previously gave a detailed explanation. See the sketch after this list.
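For illustration, here is a hedged sketch contrasting the two timing strategies. The `torch.xpu` event and synchronize calls mirror the `torch.cuda` API available in upstream PyTorch 2.5; the workload, iteration count, and helper names are assumptions, not the benchmark's actual code.

```python
import time
import torch

def time_with_host_sync(fn, n_iters=100):
    # Old-style timing (torch 2.1 / ipex 2.1 proxies): a device sync after
    # every launch, so each iteration measures exactly one kernel execution.
    fn()  # warmup
    torch.xpu.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
        torch.xpu.synchronize()
    return (time.perf_counter() - start) / n_iters * 1e3  # milliseconds

def time_with_event_timestamps(fn, n_iters=100):
    # New-style timing: kernels are submitted without intermediate syncs and
    # elapsed time comes from device event timestamps recorded around the
    # whole batch, so barrier/queue overhead is folded into the measurement.
    fn()  # warmup
    torch.xpu.synchronize()
    start_evt = torch.xpu.Event(enable_timing=True)
    end_evt = torch.xpu.Event(enable_timing=True)
    start_evt.record()
    for _ in range(n_iters):
        fn()  # no sync between submissions
    end_evt.record()
    torch.xpu.synchronize()
    return start_evt.elapsed_time(end_evt) / n_iters  # milliseconds

x = torch.randn(4096, 1024, device="xpu")  # illustrative shape
print(time_with_host_sync(lambda: torch.softmax(x, dim=1)))
print(time_with_event_timestamps(lambda: torch.softmax(x, dim=1)))
```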

anmyachev commented 6 days ago

https://github.com/intel/intel-xpu-backend-for-triton/pull/2149#issuecomment-2337632244