tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

TT-metal infra rounds up page size and this is hidden from the user #7614

Open tt-nshanker opened 2 months ago

tt-nshanker commented 2 months ago

@shwetankTT @tarafdarTT @yugaoTT @abhullar-tt @mywoodstock @arakhmati @jvasilje @davorchap

Please see - https://github.com/tenstorrent/tt-metal/blob/94490875229f70e42fd4004f459afca9484960cd/tt_metal/detail/util.hpp#L30C74-L30C91

This function is called during buffer allocation to determine the size per bank. The page size in bytes is rounded up to a multiple of 32 bytes, but this rounding is not reflected in the tensor object: the shape and shard shape attributes of the Tensor class do not contain this padding information.
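For reference, the rounding is the usual round-up-to-a-multiple pattern; a minimal sketch (the name and signature here are illustrative, not the exact ones in util.hpp):

```cpp
#include <cstdint>

// Minimal sketch of the round-up behavior; illustrative, not the exact
// tt_metal/detail/util.hpp code.
constexpr uint32_t round_up(uint32_t value, uint32_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}

static_assert(round_up(80, 32) == 96);  // e.g. an 80-byte page becomes 96
```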

Currently, in the op code of group norm and eltwise unary, we have to round up the shard width to match the padding done in the infra in order to get the actual shard width. I don't think we should be rounding the buffer size of a pre-allocated sharded buffer in the op code: if the amount of rounding in the infra code ever changes, the op code will break. The padded buffer size should be part of the Tensor.

We need 32-byte alignment for DRAM-L1 accesses but only 16-byte alignment for L1-L1 accesses. In the SD model, we set the shard width of the group norm op's input tensor to 80 bytes because we do L1-L1 accesses only, and the next eltwise unary op uses the same sharding config. Since the sharded tensor is in RM layout, the shard width is the page size. After the function above rounds the page size up, the actual shard width is 96 bytes, but the shard width attribute is not updated and still reads 80 bytes.
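Concretely, the workaround each sharded op currently has to carry looks something like this (a hypothetical helper; the real group norm / eltwise unary code differs):

```cpp
#include <cstdint>

// Hypothetical sketch of the per-op workaround: re-derive the padded shard
// width instead of trusting the Tensor's shard shape attribute.
constexpr uint32_t ALLOCATOR_ALIGNMENT = 32;  // assumed constant; hard-coding
                                              // it is what makes ops fragile

constexpr uint32_t actual_shard_width_bytes(uint32_t reported_width_bytes) {
    return ((reported_width_bytes + ALLOCATOR_ALIGNMENT - 1)
            / ALLOCATOR_ALIGNMENT) * ALLOCATOR_ALIGNMENT;
}

// The op sees 96 bytes even though the shard shape attribute still says 80.
static_assert(actual_shard_width_bytes(80) == 96);
```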

jliangTT commented 2 months ago

what does this block @tt-nshanker ?

tt-nshanker commented 2 months ago

> what does this block @tt-nshanker ?

It doesn't block any work currently. For now, we have added a workaround in the relevant op code (more detail in the description), and until this issue is fixed, op developers will have to keep adding the workaround to any new sharded ops. The issue is related to limitations in the TT-metal Tensor and Shape objects. There are plans being developed to rework parts of the TT-metal Tensor and Shape, and I believe this issue can be addressed in that rework.

jliangTT commented 2 months ago

yes, i have heard about this. That is a much bigger piece of work under planning. For now, i will lower this to P2 to make way for other P1s.