tenstorrent / tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html

Exclude Padding from Shape Validation in Concat Operation #15308 #15329

Closed: shwetankTT closed this 18 hours ago

shwetankTT commented 4 days ago

Ticket

15308

Problem description

The concat operation validates the shapes of all input tensors to determine whether the operation can be performed. However, LegacyShape includes padding in the shape it reports, so the comparison is done on padded shapes. Padding should not be considered in shape validation for the concat operation.
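To illustrate the distinction, here is a minimal self-contained sketch (not the tt-metal source; the struct and function names are made up for the example). It models a tensor shape that carries both logical and padded extents and shows why a check that compares padded extents rejects inputs whose logical shapes agree:

```cpp
#include <array>
#include <cstdint>
#include <iostream>

struct PaddedShape {
    std::array<uint32_t, 4> logical;  // shape the user sees, e.g. {1, 1, 400, 256}
    std::array<uint32_t, 4> padded;   // shape including tile/alignment padding
};

// Padding-unaware check (roughly what the issue describes): compares padded dims.
bool can_concat_padded(const PaddedShape& a, const PaddedShape& b, size_t dim) {
    for (size_t i = 0; i < 4; ++i) {
        if (i != dim && a.padded[i] != b.padded[i]) return false;
    }
    return true;
}

// Proposed behaviour: compare logical dims only, ignoring padding.
bool can_concat_logical(const PaddedShape& a, const PaddedShape& b, size_t dim) {
    for (size_t i = 0; i < 4; ++i) {
        if (i != dim && a.logical[i] != b.logical[i]) return false;
    }
    return true;
}

int main() {
    PaddedShape t0{{1, 1, 400, 256}, {1, 1, 448, 256}};  // coarser alignment padding
    PaddedShape t1{{1, 1, 400, 256}, {1, 1, 416, 256}};  // plain tile padding
    std::cout << can_concat_padded(t0, t1, /*dim=*/3)  << "\n";  // 0: rejected
    std::cout << can_concat_logical(t0, t1, /*dim=*/3) << "\n";  // 1: accepted
}
```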

What's changed

CI Link: https://github.com/tenstorrent/tt-metal/actions/runs/11953331942


shwetankTT commented 4 days ago

@ntarafdar @sjameelTT @jaebaek and @yugi957 I have not tested this extensively, so please let me know if this PR does not make sense. I hit this validation error while optimizing the yolov4 model and decided to fix it.

jaykru-tt commented 3 days ago

Can you run models perf CI to ensure that nothing broke?

This doesn't seem to me like it should work in general with arbitrary padding for on-device concat; this op isn't padding-aware yet, as far as I know. If the inputs to concat have the same logical shapes but different padding, we would very likely produce incorrect results. On the other hand, we only have padding for tiled tensors right now, and the padding should be the same for tensors of the same logical shape, so this should actually be okay until that changes.

I think we will eventually have arbitrary padding including for RM tensors, so we would have to revisit this then. @sjameelTT any thoughts? I know you've looked at concat before.
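One conservative way to reconcile both points would be to validate logical shapes but keep an extra guard that the padded extents also agree on every non-concat dimension until the op is made padding-aware. This is only a speculative sketch, not code from this PR, and the names are hypothetical; note it would still reject the 448-vs-416 pair shown in the next comment:

```cpp
#include <array>
#include <cstdint>
#include <iostream>

struct PaddedShape {
    std::array<uint32_t, 4> logical;
    std::array<uint32_t, 4> padded;
};

// Hypothetical guard: accept inputs whose logical shapes match on non-concat dims,
// but, while the kernel copies padded data verbatim, also require their padded
// extents to match on those dims so no padded rows/cols leak into the output.
bool can_concat_until_padding_aware(const PaddedShape& a, const PaddedShape& b, size_t dim) {
    for (size_t i = 0; i < 4; ++i) {
        if (i == dim) continue;                          // concat dim may differ
        if (a.logical[i] != b.logical[i]) return false;  // real shape mismatch
        if (a.padded[i] != b.padded[i]) return false;    // guard: kernel is not padding-aware
    }
    return true;
}

int main() {
    PaddedShape a{{1, 1, 400, 256}, {1, 1, 448, 256}};
    PaddedShape b{{1, 1, 400, 256}, {1, 1, 416, 256}};
    // Rejected: logical shapes agree, but padded heights (448 vs 416) do not.
    std::cout << can_concat_until_padding_aware(a, b, /*dim=*/3) << "\n";  // 0
}
```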

jaykru-tt commented 3 days ago

@shwetankTT mentioned to me that he is obtaining tiled tensors with identical logical shapes but distinct padding out of a sharded-to-interleaved conversion (assuming a tilize is thrown in as well). This is even worse than what I was worrying about above.

input-tensor --> shape=Shape([1, 1, 400[448], 256]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::L1,shard_spec=std::nullopt)
input_tensor_2 --> shape=Shape([1, 1, 400[416], 256]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::DRAM,shard_spec=std::nullopt)
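For reference, the bracketed numbers are the padded extents of the height dimension. 416 is what plain 32-row tile alignment gives for a logical height of 400; 448 is consistent with a coarser alignment such as a 64-row shard height, though the log does not show where that value actually comes from. A quick check (the alignment values here are assumptions, not taken from the PR):

```cpp
#include <cstdint>
#include <iostream>

// Round n up to the nearest multiple of `align`.
constexpr uint32_t round_up(uint32_t n, uint32_t align) {
    return ((n + align - 1) / align) * align;
}

int main() {
    std::cout << round_up(400, 32) << "\n";  // 416: tile-height alignment
    std::cout << round_up(400, 64) << "\n";  // 448: e.g. a 64-row shard alignment (assumed)
}
```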
shwetankTT commented 18 hours ago

I am no longer seeing this issue on mainline; it appears to have been resolved there. Closing this PR accordingly.