Stonepia opened this issue 1 week ago
I would like to explain my concern about why we don't set grf_mode=auto in the Triton config from the Inductor side.

The reason is that, on the PyTorch Inductor side, we are currently trying to keep the same config as CUDA/HIP, so that we avoid possible misalignments between backends. Large GRF mode for compiling kernels is an XPU-only optimization, so we would like to hide that complexity from users who are familiar with CUDA.
By the way, I am not very familiar with the differences between the various grf_mode settings, so if there are any concerns, please point them out and let's have a discussion.
PR https://github.com/intel/intel-xpu-backend-for-triton/pull/1654 introduced enabling large GRF mode automatically.
Could we gate the `cout` output in these lines behind a debug-only flag, so that normal users can safely ignore it? In my opinion, those messages should be treated as warnings.

https://github.com/intel/intel-xpu-backend-for-triton/blob/614efe26adeac8e28fe27c6bbfa7840bbc43ec90/third_party/intel/backend/driver.c#L188-L201