yugaoTT closed this 5 days ago
To summarize what was discussed in the meeting: if the MOP is changed to this:
ckernel::ckernel_template tmp(MOP_OUTER_LOOP, MOP_INNER_LOOP, TT_OP_PACR(ADDR_MOD_1, ZERO_OUTPUT_FLAG, PACK_SEL(PACKCNT), 0, MEGAROW, 0, 0));
tmp.set_end_op(TT_OP_INCADCZW(p_setadc::PAC, 0, 0, 1, 0)); // w cnt points to the next tile
tmp.program(instrn_buffer);
And the x_dim is set to the correct value in the init:
uint pack_x_dim = 8;
TT_SETADCXX(p_setadc::PAC, pack_x_dim-1, 0x0);
@yugaoTT please let me know
Seems to be working now with a single tile. One more change is needed: the L1 write offset has to be updated, otherwise there will be gaps between the 4 packer writes.
Next thing to try: merge the outer loop over num_rows (8) into the MOP inner loop, so that we don't have any loops outside the MOP.
The submodule changes are pushed here: https://github.com/tenstorrent/tt-llk-wh-b0/pull/35 https://github.com/tenstorrent/tt-llk-gs/pull/20
Please close this issue once the metal PR is also pushed
Currently, pack_untilize packs out non-tile-size sticks with gaps in between. For example, when packing out 16B sticks (8 datums in bfp16 format), there are 64B - 16B = 48B gaps between consecutive sticks. We need a new pack_untilize that packs out sticks contiguously, so that there are no gaps in between.
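To make the gap arithmetic above concrete, here is a minimal host-side sketch. It is not LLK code and does not touch the packer. The helper names (`stick_bytes`, `gap_bytes`, `stick_offsets`) are hypothetical, and the 64B stride and 2 bytes per datum are assumptions read off the 16B / 48B numbers in the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: each stick currently starts on a 64B boundary (from "64B - 16B").
constexpr std::uint32_t kStrideBytes = 64;

// Size of one stick in bytes, e.g. 8 datums * 2 bytes = 16B.
constexpr std::uint32_t stick_bytes(std::uint32_t datums, std::uint32_t bytes_per_datum) {
    return datums * bytes_per_datum;
}

// Unused bytes left after a stick when sticks are placed kStrideBytes apart.
constexpr std::uint32_t gap_bytes(std::uint32_t stick) {
    return kStrideBytes - stick;
}

// Byte offset of each stick when consecutive sticks are step_bytes apart.
// step_bytes == kStrideBytes models the current (gapped) layout;
// step_bytes == stick size models the requested contiguous layout.
inline std::vector<std::uint32_t> stick_offsets(std::uint32_t num_sticks, std::uint32_t step_bytes) {
    std::vector<std::uint32_t> offsets;
    offsets.reserve(num_sticks);
    for (std::uint32_t i = 0; i < num_sticks; ++i) {
        offsets.push_back(i * step_bytes);
    }
    return offsets;
}
```

Under these assumptions, `stick_offsets(4, kStrideBytes)` gives the gapped layout (0, 64, 128, 192), while `stick_offsets(4, stick_bytes(8, 2))` gives the contiguous layout being asked for (0, 16, 32, 48).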