Slow core tests with PyTorch and Agama Rolling

intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs

MIT License


Slow core tests with PyTorch and Agama Rolling #1627

Closed pbchekin closed 3 months ago

pbchekin commented 4 months ago

It takes 44 minutes to run core tests with PyTorch and Agama rolling. It is much slower than:

13 minutes with IPEX and Agama Rolling
15 minutes with IPEX and Agama LTS
23 minutes with PyTorch and Agama LTS

Also, there are 186 failed tests vs. 10 with PyTorch and Agama LTS.
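For a sense of scale, the timings above can be turned into slowdown ratios. This is only a quick arithmetic sketch over the numbers reported in this comment; the dictionary keys and the `slowdown` helper are illustrative names, not part of any CI tooling.

```python
# Wall-clock times in minutes, copied from the comparison above.
baseline_minutes = {
    "IPEX + Agama Rolling": 13,
    "IPEX + Agama LTS": 15,
    "PyTorch + Agama LTS": 23,
}
slow_run_minutes = 44  # PyTorch + Agama Rolling


def slowdown(baseline, slow=slow_run_minutes):
    """Return how many times longer the slow run takes than the baseline."""
    return round(slow / baseline, 2)


ratios = {name: slowdown(minutes) for name, minutes in baseline_minutes.items()}

for name, ratio in ratios.items():
    print(f"PyTorch + Agama Rolling is {ratio}x slower than {name}")
```

So the PyTorch and Agama Rolling run takes roughly 2x to 3.4x as long as the other three configurations.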

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9945550801/job/27474062109

etiotto commented 4 months ago

I suspect the tests run slowly because PyTorch doesn't pass Triton the flags that indicate the presence of certain HW features (DPAS available, 2D block reads/writes available). When those flags aren't passed, the Triton compiler lowers tl.dot operations to scalar loops instead of using the XMX engine.

The solution IMO is to get PyTorch to pass the flags like IPEX does.
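The suspected mechanism can be sketched roughly as follows. This is a simplified illustration, not the actual Triton, PyTorch, or IPEX code: `DeviceFeatures`, `features_from_properties`, `choose_dot_lowering`, and the `has_dpas` / `has_2d_block_io` keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceFeatures:
    # Hypothetical flag names; the real property names may differ.
    has_dpas: bool = False
    has_2d_block_io: bool = False


def features_from_properties(props):
    """Build feature flags from whatever the framework reports.

    Keys that the framework does not pass default to False, which models
    the situation suspected above.
    """
    return DeviceFeatures(
        has_dpas=bool(props.get("has_dpas", False)),
        has_2d_block_io=bool(props.get("has_2d_block_io", False)),
    )


def choose_dot_lowering(features):
    """Pick a lowering strategy for a dot operation (illustrative only)."""
    if features.has_dpas:
        return "dpas"  # fast path: use the matrix engine
    return "scalar_loop"  # slow fallback when the flag is missing


# A framework that reports the flags gets the fast path ...
with_flags = choose_dot_lowering(features_from_properties({"has_dpas": True}))
# ... while one that passes nothing silently falls back to the slow path.
without_flags = choose_dot_lowering(features_from_properties({}))
print(with_flags, without_flags)
```

The point of the sketch is that a missing flag does not raise an error; it just selects the slow fallback, which would match tests that still run but take much longer.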

vlad-penkin commented 3 months ago

As per @etiotto, the root cause of this issue is #1576.

anmyachev commented 3 months ago

The current time for PyTorch and Agama Rolling is around 16 min: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. @pbchekin is that enough to close this issue?

pbchekin commented 3 months ago

> The current time for PyTorch and Agama Rolling is around 16 min: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. @pbchekin is that enough to close this issue?

Yep, much better, closing.
