
Interleaved to sharded does not work correctly on BH when input sticks are 16 in ROW_MAJOR_LAYOUT #12184

Open · mywoodstock opened this issue 3 weeks ago

mywoodstock commented 3 weeks ago

Isolated the issue to `padded_offset_bytes = align(input_unit_size, input.buffer()->alignment());` in `interleaved_to_sharded`.
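
For context, `align` here rounds a size up to the next multiple of the alignment. A minimal sketch of that behavior (plain Python; the `align` below mirrors the usual round-up idiom and is not the actual tt-metal implementation):

```python
def align(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the next multiple of alignment."""
    return ((size_bytes + alignment - 1) // alignment) * alignment

# A 16-element BFLOAT16 stick is 32 bytes; BH DRAM alignment is 64 bytes,
# so the padded write offset becomes 64 -- double the actual stick size.
assert align(32, 64) == 64
```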

When the input width is 16 elements (BFLOAT16, so a 32-byte stick) in ROW_MAJOR_LAYOUT, the generated sharded output has rows of zeros alternating with the data rows, e.g.:

```
0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
```

In this use case, the input unit size is 32 bytes, while DRAM alignment on BH is 64 bytes, so `padded_offset_bytes` is 64, and that is the increment applied to the write pointer. Each 32-byte stick is therefore written at a 64-byte stride, leaving 32 bytes of zero padding after every stick — exactly one stick-width of zeros, which shows up as the alternating zero rows above.
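
A standalone simulation of that write loop (plain Python with NumPy, not the device kernel) reproduces the symptom: writing 32-byte sticks at a 64-byte stride and then viewing the buffer as densely packed 32-byte rows yields the zero rows shown above. All names here are illustrative:

```python
import numpy as np

STICK_ELEMS = 16            # input stick width in elements
ELEM_BYTES = 2              # BFLOAT16 element size
stick_bytes = STICK_ELEMS * ELEM_BYTES   # 32 bytes per stick
padded_offset_bytes = 64    # align(32, 64): BH DRAM alignment

num_sticks = 4
buf = np.zeros(num_sticks * padded_offset_bytes, dtype=np.uint8)

# Write each stick, advancing the write pointer by the padded offset,
# mimicking the interleaved_to_sharded write loop.
for i in range(num_sticks):
    # Raw uint16 values stand in for BFLOAT16 bit patterns;
    # only the byte layout matters here.
    stick = np.arange(1 + i, 1 + i + STICK_ELEMS, dtype=np.uint16).view(np.uint8)
    wr = i * padded_offset_bytes
    buf[wr:wr + stick_bytes] = stick

# A reader that assumes densely packed 32-byte rows sees zero rows
# alternating with the data rows:
print(buf.view(np.uint16).reshape(-1, STICK_ELEMS))
```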

Branch: `asarje/debug-halo-bh`
Test: `pytest "tests/ttnn/unit_tests/operations/test_maxpool2d.py::test_run_max_pool[dtype=DataType.BFLOAT16-dilation=(1, 1)-stride=(2, 2)-padding=(1, 1)-kernel_size=(3, 3)-act_shape=[1, 16, 1056, 160]-device_params={'l1_small_size': 24576}]"`

mywoodstock commented 1 day ago

@sjameelTT @tarafdarTT Hey guys, any update on the ETA for this?

ntarafdar commented 1 day ago

Mid next week
