tenstorrent / tt-metal

:metal: TT-NN operator library and TT-Metalium low-level kernel programming model.

[Bug Report] ttnn.concat - Tile padding along concatenated dim is not supported #13667

Open chandrasekaranpradeep opened 4 weeks ago

chandrasekaranpradeep commented 4 weeks ago

Describe the bug
The ttnn.concat op throws a "Tile padding along concatenated dim (3) not supported for concat yet (tensor: 0)" error when a tuple of two input tensors of the same shape (1, 32, 12, 50) is passed with concat dim = -1 (i.e. dim 3). The error is thrown while validating the input tensors of the concat op in the ttnn::operations::data_movement::ConcatDeviceOperation::validate function.

For more context, here is the exact error message:

E       RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/concat/device/concat_device_operation.cpp:43: !in_ref.get_shape().has_tile_padding(this->dim)
E       info:
E       Tile padding along concatenated dim (3) not supported for concat yet (tensor: 0).
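
For context, tensors in ttnn.TILE_LAYOUT have their last two dimensions padded up to multiples of the 32x32 tile, so the innermost dimension of 50 is stored padded to 64; that padding along dim 3 is what the has_tile_padding(dim) check above rejects. A rough sketch of the padded shape, assuming the standard 32x32 tile size:

import math

TILE = 32  # assumed tile edge length for TILE_LAYOUT
shape = (1, 32, 12, 50)

# The last two dims are rounded up to the next tile boundary.
padded = shape[:-2] + tuple(math.ceil(d / TILE) * TILE for d in shape[-2:])
print(padded)  # (1, 32, 32, 64): dim 3 grows from 50 to 64, i.e. it carries tile padding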

To Reproduce
Run the following test:

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

def test_concat_rotary_embedding(device):
    concat_dim = 3
    torch_input_tensor_a = torch.rand((1, 32, 12, 50), dtype=torch.float32)
    torch_input_tensor_b = torch.rand((1, 32, 12, 50), dtype=torch.float32)
    torch_output_tensor = torch.concat([torch_input_tensor_a, torch_input_tensor_b], dim=concat_dim)

    input_tensor_a = ttnn.from_torch(torch_input_tensor_a, layout=ttnn.TILE_LAYOUT, device=device)
    input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE_LAYOUT, device=device)

    output = ttnn.concat([input_tensor_a, input_tensor_b], dim=concat_dim)
    output = ttnn.to_torch(output)

    assert_with_pcc(torch_output_tensor, output, 0.9999)
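
As a possible interim workaround (not verified here, and assuming ttnn.concat handles ROW_MAJOR_LAYOUT inputs of this shape), the concat could be done in row-major layout, where no tile padding exists, and the result converted back to tile layout:

# Hedged workaround sketch: concatenate in ROW_MAJOR_LAYOUT to avoid tile padding,
# then convert the result back to TILE_LAYOUT.
input_tensor_a_rm = ttnn.to_layout(input_tensor_a, ttnn.ROW_MAJOR_LAYOUT)
input_tensor_b_rm = ttnn.to_layout(input_tensor_b, ttnn.ROW_MAJOR_LAYOUT)
output = ttnn.concat([input_tensor_a_rm, input_tensor_b_rm], dim=concat_dim)
output = ttnn.to_layout(output, ttnn.TILE_LAYOUT)
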
ntarafdar commented 1 week ago

@jaykru-tt can you see if this has been covered by the concat merge you did?

ntarafdar commented 1 week ago

@chandrasekaranpradeep I spoke to @jaykru-tt; this will be fixed by a PR he has already implemented, which is in the process of being merged. I will close this issue once it is merged in.

jvasilje commented 3 days ago

Did not hear back on the reason for the P0 label. Setting to P1.

jvasilje commented 1 day ago

@ntarafdar let's update the Forge team on the status of this issue. :)

During the bugs meeting this morning, the Forge team reported that this is their top priority, needed to unblock a single model they want to push through. It is not a generality and sweeps milestone item.

Setting back to P0, as per Forge request.

ntarafdar commented 1 day ago

The fix can be cherry-picked right now from Jay's branch. He is currently trying to push it through CI and is running into perf issues (allocator). https://github.com/tenstorrent/tt-metal/pull/14306

We need to investigate whether this is an infra issue or an issue in concat.

Let us know if cherry-picking will work in the meantime; Jay will be back next Wednesday to push it in. If not, we can assign someone else to this in the meantime.

tt-mpantic commented 8 hours ago

Hi Naif, thanks for the update. At the moment we are consuming only metal main, so we can't consume the custom branch you shared; we will wait for the fix to land on main. I understand that it takes some time to get this in, and that is fine. The important part is that this is being worked on with priority. Assigning someone else at this point seems inefficient. Wednesday sounds fine.