intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Slow core tests with PyTorch and Agama Rolling #1627

Closed pbchekin closed 3 months ago

pbchekin commented 4 months ago

It takes 44 minutes to run core tests with PyTorch and Agama rolling, which is much slower than with PyTorch and Agama LTS.

There are also 186 failed tests vs. 10 with PyTorch and Agama LTS.

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9945550801/job/27474062109

etiotto commented 4 months ago

I suspect the reason for the slowdown in running tests is that PyTorch doesn't pass to Triton the flags that indicate the presence of certain HW features (DPAS available, 2D block reads/writes available). When those flags aren't passed, the Triton compiler lowers tl.dot operations to scalar loops instead of using the XMX engine.

The solution IMO is to get PyTorch to pass the flags like IPEX does.
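The mechanism described above can be sketched as follows. This is a hypothetical illustration, not the actual Triton or IPEX API: the function and flag names (`query_device_capabilities`, `has_dpas`, `has_2d_block_io`) are invented to show the idea that the frontend must forward hardware-capability flags to the compiler, which otherwise conservatively falls back to a scalar lowering for tl.dot.

```python
# Hypothetical sketch; names are illustrative, not the real Triton/IPEX API.

def query_device_capabilities(frontend_forwards_flags: bool) -> dict:
    """Mimic what IPEX does: query the driver and report HW features."""
    if frontend_forwards_flags:
        # IPEX-style path: capabilities are detected and forwarded.
        return {"has_dpas": True, "has_2d_block_io": True}
    # PyTorch path in this issue: no flags forwarded, so the compiler
    # must assume the features are absent.
    return {"has_dpas": False, "has_2d_block_io": False}

def select_dot_lowering(opts: dict) -> str:
    """Choose how tl.dot is lowered based on the forwarded flags."""
    if opts.get("has_dpas") and opts.get("has_2d_block_io"):
        return "xmx"  # use the XMX/DPAS matrix engine
    return "scalar-loops"  # slow fallback seen in the 44-minute runs

print(select_dot_lowering(query_device_capabilities(True)))   # xmx
print(select_dot_lowering(query_device_capabilities(False)))  # scalar-loops
```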

vlad-penkin commented 3 months ago

As per @etiotto, the root cause of the issue is:

anmyachev commented 3 months ago

The current time for PyTorch and Agama rolling is around ~16 min: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. @pbchekin is it enough to close this issue?

pbchekin commented 3 months ago

> The current time for PyTorch and Agama rolling is around ~16 min: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. @pbchekin is it enough to close this issue?

Yep, much better, closing.