tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Concat: Fails in TILE Layout #10044

Open Sudharsan-V opened 1 month ago

Sudharsan-V commented 1 month ago

Describe the bug

The concat operation produces a result with a very low PCC when its inputs are in TILE layout, whereas the same operation performs as expected when the inputs are in ROW_MAJOR layout.
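For readers unfamiliar with the metric, PCC here refers to the Pearson correlation coefficient between the reference output and the device output. The sketch below shows the standard definition in plain Python. It is not the implementation of assert_with_pcc, whose internals are not shown in this issue; the function name pcc and the sample values are hypothetical and only illustrate why a value near 1 indicates a match and a value near 0 indicates unrelated data.

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration only; not the tt-metal test utility.
    """
    if len(expected) != len(actual):
        raise ValueError("inputs must have the same number of elements")
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_e * var_a)


# Made-up sample data: a reference, a slightly perturbed copy, and a reordering.
reference = [0.1, 0.4, 0.35, 0.8, 0.65]
close = [0.101, 0.399, 0.351, 0.799, 0.651]
shuffled = [0.8, 0.1, 0.65, 0.35, 0.4]

print(pcc(reference, close))     # small rounding-like error keeps PCC near 1
print(pcc(reference, shuffled))  # misplaced elements pull PCC far below 0.99
```

A PCC as low as the one reported for TILE layout suggests the output elements are misplaced or corrupted rather than merely rounded differently.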

To Reproduce

Steps to reproduce the behavior:

  1. Save the snippet below to a file.

    import torch
    import ttnn
    from tests.ttnn.utils_for_testing import assert_with_pcc


    def test_unit_permute(device, reset_seeds):
        # Reference result computed with PyTorch.
        tensor_a = torch.rand(1, 1, 768)
        tensor_b = torch.rand(1, 576, 768)
        torch_output = torch.cat((tensor_a, tensor_b), dim=1)

        # ROW_MAJOR layout: concat matches the reference.
        tensor_a_rm = ttnn.from_torch(tensor_a, device=device, dtype=ttnn.bfloat16, layout=ttnn.ROW_MAJOR_LAYOUT)
        tensor_b_rm = ttnn.from_torch(tensor_b, device=device, dtype=ttnn.bfloat16, layout=ttnn.ROW_MAJOR_LAYOUT)
        ttnn_output = ttnn.concat((tensor_a_rm, tensor_b_rm), dim=1)
        ttnn_output = ttnn.to_torch(ttnn_output)
        assert_with_pcc(torch_output, ttnn_output, 0.99)  # passes: PCC = 0.9999956481098589

        # TILE layout: same inputs, same concat, but the result does not match.
        tensor_a_tile = ttnn.from_torch(tensor_a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
        tensor_b_tile = ttnn.from_torch(tensor_b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
        ttnn_output = ttnn.concat((tensor_a_tile, tensor_b_tile), dim=1)
        ttnn_output = ttnn.to_torch(ttnn_output)
        assert_with_pcc(torch_output, ttnn_output, 0.99)  # fails: PCC = 0.00072778873165853
  2. Run the script using pytest <path/to/file>
  3. PCC drops when the input is in TILE layout.
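One thing that may be worth checking when narrowing this down is tile alignment along the concat dimension. The sketch below assumes that TILE layout arranges the last two dimensions in 32x32 tiles, and it only does shape arithmetic on the shapes from the reproduction; it does not call ttnn. The helper names round_up_to_tile and tile_report are hypothetical, and nothing in this thread confirms that alignment is the cause of the low PCC.

```python
TILE_DIM = 32  # assumption: TILE layout uses 32x32 tiles over the last two dims


def round_up_to_tile(n, tile=TILE_DIM):
    """Smallest multiple of `tile` that is >= n."""
    return -(-n // tile) * tile


def tile_report(shape):
    """Summarize how the last two dims of `shape` relate to the tile size.

    Hypothetical helper for illustration; it does not query the device.
    """
    height, width = shape[-2], shape[-1]
    return {
        "shape": shape,
        "padded_height": round_up_to_tile(height),
        "padded_width": round_up_to_tile(width),
        "aligned": height % TILE_DIM == 0 and width % TILE_DIM == 0,
    }


# Shapes taken from the reproduction: concat along dim=1 of a 3-D tensor,
# which is the second-to-last (height) dimension.
shape_a = (1, 1, 768)
shape_b = (1, 576, 768)
shape_out = (1, shape_a[1] + shape_b[1], 768)

for shape in (shape_a, shape_b, shape_out):
    print(tile_report(shape))
```

Under that assumption, the first input and the output are not tile-aligned along dim=1 (sizes 1 and 577), while the second input is (576). Repeating the test with tile-aligned sizes on dim=1 would show whether the failure is specific to unaligned shapes.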
vigneshkeerthivasanx commented 1 month ago

cc @saichandax @boris-drazic