
Interleaved to sharded does not work correctly on BH when input sticks are 16 in ROW_MAJOR_LAYOUT #12184

Open · mywoodstock opened this issue 3 weeks ago

mywoodstock commented 3 weeks ago

Isolated the issue to `padded_offset_bytes = align(input_unit_size, input.buffer()->alignment());` in `interleaved_to_sharded`.
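
For context, `align` here rounds a size up to the next multiple of the alignment. A minimal sketch of that behavior (plain Python; the `align` below mirrors the usual round-up idiom and is not the actual tt-metal implementation):

```python
def align(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the next multiple of alignment."""
    return ((size_bytes + alignment - 1) // alignment) * alignment

# A 16-element BFLOAT16 stick is 32 bytes; BH DRAM alignment is 64 bytes,
# so the padded write offset becomes 64 -- double the actual stick size.
assert align(32, 64) == 64
```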

When the input width is 16 elements (BFLOAT16, so a 32-byte stick) in ROW_MAJOR_LAYOUT, the generated sharded output has rows of zeros alternating with the data rows, e.g.:

```
0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
```

In this use case, the input unit size is 32 bytes, while DRAM alignment on BH is 64 bytes, so `padded_offset_bytes` is 64, and that is the increment applied to the write pointer. Each 32-byte stick is therefore written at a 64-byte stride, leaving 32 bytes of zero padding after every stick — exactly one stick-width of zeros, which shows up as the alternating zero rows above.
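
A standalone simulation of that write loop (plain Python with NumPy, not the device kernel) reproduces the symptom: writing 32-byte sticks at a 64-byte stride and then viewing the buffer as densely packed 32-byte rows yields the zero rows shown above. All names here are illustrative:

```python
import numpy as np

STICK_ELEMS = 16            # input stick width in elements
ELEM_BYTES = 2              # BFLOAT16 element size
stick_bytes = STICK_ELEMS * ELEM_BYTES   # 32 bytes per stick
padded_offset_bytes = 64    # align(32, 64): BH DRAM alignment

num_sticks = 4
buf = np.zeros(num_sticks * padded_offset_bytes, dtype=np.uint8)

# Write each stick, advancing the write pointer by the padded offset,
# mimicking the interleaved_to_sharded write loop.
for i in range(num_sticks):
    # Raw uint16 values stand in for BFLOAT16 bit patterns;
    # only the byte layout matters here.
    stick = np.arange(1 + i, 1 + i + STICK_ELEMS, dtype=np.uint16).view(np.uint8)
    wr = i * padded_offset_bytes
    buf[wr:wr + stick_bytes] = stick

# A reader that assumes densely packed 32-byte rows sees zero rows
# alternating with the data rows:
print(buf.view(np.uint16).reshape(-1, STICK_ELEMS))
```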

Branch: `asarje/debug-halo-bh`
Test: `pytest "tests/ttnn/unit_tests/operations/test_maxpool2d.py::test_run_max_pool[dtype=DataType.BFLOAT16-dilation=(1, 1)-stride=(2, 2)-padding=(1, 1)-kernel_size=(3, 3)-act_shape=[1, 16, 1056, 160]-device_params={'l1_small_size': 24576}]"`

mywoodstock commented 1 day ago

@sjameelTT @tarafdarTT Hey guys, any update on the ETA for this?

ntarafdar commented 1 day ago

Mid next week
