Shoreshen opened this issue 3 months ago (status: Open)
\mathcal{L}(T)[i_d] = L[(i_d + k_d*T.shape[d]) % L.shape[d]] \forall k_d such as i_d + k_d*T.shape[d] < L.shape[d]
Yes, the formula is not very accurate. Maybe it can be fixed by changing the condition to:
\forall k_d such as k_d = 0 or i_d + k_d*T.shape[d] < L.shape[d]
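To make the difference between the two conditions concrete, here is a minimal Python sketch that enumerates the accepted `k_d` per dimension, using the shapes from the `T[0,5]` example in this issue (`T.shape = (2, 8)`, `L.shape = (4, 4)`). The helper names `accepted_k` and `layout_indices` are hypothetical and are not part of Triton. The sketch also assumes `k_d` ranges over non-negative integers, which the comment does not state explicitly.

```python
def accepted_k(i_d, t_size, l_size, allow_k0=False):
    """Return every k_d >= 0 accepted for one dimension d.

    allow_k0=False: i_d + k_d * t_size < l_size               (condition as written)
    allow_k0=True:  k_d == 0 or i_d + k_d * t_size < l_size   (proposed fix)
    """
    ks = []
    # With t_size >= 1, any k >= l_size already fails the bound,
    # so scanning k in [0, l_size) is enough.
    for k in range(max(l_size, 1)):
        in_bounds = i_d + k * t_size < l_size
        if in_bounds or (allow_k0 and k == 0):
            ks.append(k)
    return ks


def layout_indices(i_d, t_size, l_size, allow_k0=False):
    """Indices into L along dimension d that hold tensor index i_d."""
    return [(i_d + k * t_size) % l_size
            for k in accepted_k(i_d, t_size, l_size, allow_k0)]


if __name__ == "__main__":
    T_shape, L_shape, idx = (2, 8), (4, 4), (0, 5)
    for d in range(2):
        for allow_k0 in (False, True):
            print(d, allow_k0,
                  accepted_k(idx[d], T_shape[d], L_shape[d], allow_k0),
                  layout_indices(idx[d], T_shape[d], L_shape[d], allow_k0))
    # d=0: k_0 in [0, 1] under both conditions, giving L indices [0, 2]
    # d=1: no k_1 under the condition as written; k_1 = 0 under the
    #      proposed fix, giving L index (5 + 0 * 8) % 4 = 1
```

Under the condition as written, dimension 1 accepts no `k_1` at all, so `T[0,5]` would not be assigned to any position of `L`. With the `k_d = 0` clause added, it is assigned through the modulo wrap.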
In the definition of `class DistributedEncoding` in file `include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td`, there is an illustrative comment containing the formula quoted above.

Based on my understanding:
`rank` here refers to the number of tensor and CTA dimensions (the L dimension??), so it requires that the dimension of the tensor be smaller than or equal to the dimension of the CTA.

For `T[0,5]`, we have `T.shape[0]=2`, `T.shape[1]=8`, `L.shape[0]=4`, `L.shape[1]=4`, then:

`k_0=0`: we have `0 + 0 * 2 = 0 < 4`, accept

`k_0=1`: we have `0 + 1 * 2 = 2 < 4`, accept

`k_0=2`: we have `0 + 2 * 2 = 4 = 4`, reject, as are all further `k_0`

`k_1=0`: we have `5 + 0 * 8 = 5 > 4`, reject, as are all further `k_1`

`T[0,5]`