Open KeePLeArNinggG opened 2 years ago
Hi. The following figure explains how this kernel arranges the threadIdx and blockIdx dimensions.
If (blockIdx.y<<5) is what confuses you, it computes the horizontal offset of a threadblock: shifting left by 5 multiplies by 32. This works because we always set blockDim.x=32, and in the code you are looking at, the coarsening factor is 1.
If the mixture of x and y in the same offset confuses you, it is because within a threadblock I bind the horizontal axis to dim-x and the vertical axis to dim-y, but among threadblocks (i.e. in the grid) I bind the horizontal axis to dim-y and the vertical axis to dim-x. The intra-threadblock mapping cannot be changed because we want coalesced access to each dense matrix row. The inter-threadblock mapping, however, may be changed without hurting correctness, but in my experiments it affects performance a little.
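To make the two mappings concrete, here is a minimal host-side sketch in plain C++ (not the kernel itself). It only reproduces the index arithmetic quoted in this thread. The names `Idx3`, `Coord`, `kernel_mapping`, `expected_mapping` and `dense_offset` are hypothetical, and the row-major layout with leading dimension `ld` is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for CUDA's built-in blockDim/blockIdx/threadIdx (x and y only).
struct Idx3 { int x, y; };

// Row index (rid) and column index (cid) into the dense matrix.
struct Coord { int rid, cid; };

// Mapping quoted from the kernel in this thread:
//   rid = blockDim.y*blockIdx.x + threadIdx.y;
//   cid = (blockIdx.y<<5) + threadIdx.x;
// Blocks: vertical axis on grid dim-x, horizontal axis on grid dim-y.
Coord kernel_mapping(Idx3 blockDim, Idx3 blockIdx, Idx3 threadIdx) {
    return { blockDim.y * blockIdx.x + threadIdx.y,
             (blockIdx.y << 5) + threadIdx.x };   // << 5 is * 32, i.e. blockDim.x
}

// Mapping the askers expected (blockIdx.x horizontal, blockIdx.y vertical).
Coord expected_mapping(Idx3 blockDim, Idx3 blockIdx, Idx3 threadIdx) {
    return { blockDim.y * blockIdx.y + threadIdx.y,
             (blockIdx.x << 5) + threadIdx.x };
}

// ASSUMPTION: the dense matrix is row-major with leading dimension ld.
long dense_offset(Coord c, int ld) {
    return static_cast<long>(c.rid) * ld + c.cid;
}
```

For example, with blockDim = (32, 4), blockIdx = (2, 3) and threadIdx = (5, 1), `kernel_mapping` gives rid = 9 and cid = 101. `expected_mapping` reaches the same element when the block is launched as blockIdx = (3, 2), so the two inter-threadblock mappings differ only in which grid dimension walks which axis. In both, neighbouring values of threadIdx.x land on neighbouring columns of the same row, which is the coalescing property that the intra-threadblock mapping has to preserve.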
@hgyhungry: Are you assuming B is stored in transposed form?
If yes, that would explain why the X axis is mapped to vertical for you and the Y axis to horizontal. I ran into the same discrepancy as @KeePLeArNinggG. We are visualizing A (the sparse matrix) and B (the dense matrix) in their non-transposed state, which is why we both expected:
line 105: int cid = (blockIdx.x<<5)+threadIdx.x;
line 106: int rid = blockDim.y*blockIdx.y+threadIdx.y;
But if B is stored in transposed form, then each warp is placed along the vertical axis and the thread block moves horizontally in B. Is this how you are visualizing it?
I've been thinking about this (rid = blockDim.y*blockIdx.x+threadIdx.y; cid = (blockIdx.y<<5)+threadIdx.x;) for a long time, but I still can't understand why it is written this way. As I have learned, usually rid = blockDim.x*blockIdx.x+threadIdx.x; cid = blockIdx.y*blockDim.y+threadIdx.y; I hope you can explain why this is the case. Thank you very much.