Closed pbchekin closed 3 months ago
I suspect the reason for the slowdown in running tests is that PyTorch doesn't pass to Triton the flags that indicate the presence of certain HW features (DPAS available, 2D block reads/writes available). When those flags aren't passed, the Triton compiler lowers tl.dot operations to scalar loops instead of using the XMX engine.
The solution IMO is to get PyTorch to pass the flags like IPEX does.
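To illustrate the idea, here is a rough sketch of how missing capability flags could end up selecting the slow path. This is not the actual Triton or PyTorch code: `DeviceCaps`, `caps_from_properties`, `pick_dot_lowering`, and the property key names are all hypothetical, made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCaps:
    """Hypothetical capability flags a framework would hand to the compiler."""
    has_dpas: bool = False         # assumed name: DPAS (XMX engine) available
    has_2d_block_io: bool = False  # assumed name: 2D block reads/writes available


def caps_from_properties(props: dict) -> DeviceCaps:
    # Missing keys default to False, which models a caller that never passes the flags.
    return DeviceCaps(
        has_dpas=bool(props.get("has_dpas", False)),
        has_2d_block_io=bool(props.get("has_2d_block_io", False)),
    )


def pick_dot_lowering(caps: DeviceCaps) -> str:
    # Illustrative decision only; the real compiler logic is more involved.
    if caps.has_dpas:
        return "dpas"
    return "scalar_loop"


# A caller that passes no flags gets the slow fallback.
print(pick_dot_lowering(caps_from_properties({})))
# A caller that reports DPAS gets the fast path.
print(pick_dot_lowering(caps_from_properties({"has_dpas": True})))
```

The point of the sketch is only that a conservative default (flag absent means feature absent) silently turns into a performance regression when the caller forgets to forward the flags.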
As per @etiotto the issue root cause is:
The current run time with PyTorch and Agama rolling is about 16 min: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. @pbchekin is that enough to close this issue?
Yep, much better, closing.
It takes 44 minutes to run core tests with PyTorch and Agama rolling. It is much slower than:
Also, there are 186 failed tests vs. 10 with PyTorch and Agama LTS.
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9945550801/job/27474062109