Closed victor-eds closed 1 week ago
Allow multiple warps in the non-sliced dimension as long as each warp has `n*sub_group_size` contiguous elements in that dimension.
Second step for https://github.com/intel/intel-xpu-backend-for-triton/issues/2562. We need to evaluate whether this is enough work on the pass or whether the pass needs to be extended further.
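
The condition on the non-sliced dimension can be illustrated with a small standalone check. This is only a sketch: the function name `non_sliced_dim_is_supported` and its parameters are hypothetical, and it assumes the non-sliced dimension is split evenly across warps so that each warp owns one contiguous chunk. The actual pass operates on layout attributes rather than plain integers.

```python
def non_sliced_dim_is_supported(dim_size: int, warps_in_dim: int,
                                sub_group_size: int) -> bool:
    """Hypothetical check: does each warp own n*sub_group_size contiguous
    elements (n >= 1) in the non-sliced dimension?

    Assumes the dimension is divided evenly among the warps.
    """
    if dim_size <= 0 or warps_in_dim <= 0 or sub_group_size <= 0:
        return False
    # The dimension must split evenly across the warps.
    if dim_size % warps_in_dim != 0:
        return False
    elements_per_warp = dim_size // warps_in_dim
    # Each warp's chunk must be a positive multiple of the sub-group size.
    return elements_per_warp % sub_group_size == 0


# 64 elements over 2 warps gives 32 per warp, i.e. n = 2 for sub_group_size = 16.
print(non_sliced_dim_is_supported(64, 2, 16))
# 64 elements over 8 warps gives 8 per warp, less than one sub-group.
print(non_sliced_dim_is_supported(64, 8, 16))
```

Under these assumptions, the single-warp case is just `warps_in_dim = 1`, so the relaxed rule is a generalization of it rather than a separate path.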