tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

TTNN ADD OP failing on broadcasting #7872

Open jayasuryamaganuru opened 2 months ago

jayasuryamaganuru commented 2 months ago

Describe the bug

The TTNN add op does not perform broadcasting as expected. The intent is to broadcast a 2D tensor onto a 3D tensor, but the op fails with an unsupported-broadcast error.

To Reproduce

Steps to reproduce the behavior:

  1. Save the below snippet to a file

    import pytest
    import torch
    import ttnn
    from tests.ttnn.utils_for_testing import assert_with_pcc

    @pytest.mark.parametrize("h", [500])
    @pytest.mark.parametrize("w", [512])
    def test_expand_and_broadcast(device, h, w):
        # Reference result: PyTorch broadcasts the 2D tensor across dim 0.
        torch_input_tensor_a = torch.rand((8, h, w), dtype=torch.bfloat16)
        torch_input_tensor_b = torch.rand((h, w), dtype=torch.bfloat16)
        torch_output_tensor = torch.add(torch_input_tensor_a, torch_input_tensor_b)

        input_tensor_a = ttnn.from_torch(torch_input_tensor_a, layout=ttnn.TILE_LAYOUT, device=device)
        input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE_LAYOUT, device=device)
        output_tensor = ttnn.add(input_tensor_a, input_tensor_b)  # fails here: unsupported broadcast
        output_tensor = ttnn.to_torch(output_tensor)

        assert_with_pcc(torch_output_tensor, output_tensor, 0.9999)
  2. Run the test with pytest path/to/file
  3. Observe the unsupported-broadcast error in the terminal

Expected behavior

input_tensor_b should be broadcast and added to input_tensor_a across all 8 slices of the first dimension, producing an output_tensor of shape (8, h, w).
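For reference, this is the standard PyTorch broadcasting behavior the test compares against (a minimal host-side sketch using the same shapes as the repro):

    # PyTorch broadcasting: a trailing-aligned (h, w) tensor is treated as
    # (1, h, w) and repeated across the leading batch dimension.
    import torch

    a = torch.rand((8, 500, 512), dtype=torch.bfloat16)
    b = torch.rand((500, 512), dtype=torch.bfloat16)
    out = torch.add(a, b)
    assert out.shape == (8, 500, 512)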

Screenshots

(Screenshot of the error traceback attached, dated 2024-04-26.)


jliangTT commented 2 months ago

@umadevimcw, is this something your team can look at, since it is an eltwise op? I am not sure, though.

jliangTT commented 2 months ago

@jayasuryamaganuru, what model does this block?

jayasuryamaganuru commented 2 months ago

@jliangTT functional Whisper. Currently the model is supported only for batch_size=1; we hit this issue while extending it to batch_size > 1.

KalaivaniMCW commented 2 months ago

This fix will require both input tensors to be of the same rank for broadcast, i.e. the 2D tensor must be given an explicit leading dimension of 1:

    torch_input_tensor_a = torch.rand((8, h, w), dtype=torch.bfloat16)
    torch_input_tensor_b = torch.rand((1, h, w), dtype=torch.bfloat16)
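Until such a fix lands, a host-side workaround consistent with the above is to unsqueeze the 2D tensor to rank 3 before moving it to device. A sketch against the repro's variable names; the unsqueeze-based approach is a suggestion here, not a confirmed part of the fix:

    # Workaround sketch (assumption: rank-matched inputs broadcast correctly):
    # give the 2D tensor an explicit leading batch dim of 1 on the host.
    torch_input_tensor_b = torch.rand((h, w), dtype=torch.bfloat16).unsqueeze(0)  # shape (1, h, w)
    input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE_LAYOUT, device=device)
    output_tensor = ttnn.add(input_tensor_a, input_tensor_b)  # (1, h, w) broadcast against (8, h, w)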