tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html
Apache License 2.0

Runtime Error: Convolution output shape mismatch issue if we pass input on device #13995

Open punithsekar opened 1 month ago

punithsekar commented 1 month ago

Describe the bug

Convolution configuration (batch_size, output_channels, input_channels, input_height, input_width, filter_height, filter_width, stride_h, stride_w, pad_h, pad_w, use_1d_systolic_array, config_override, use_shallow_conv_variant, groups) = (1, 576, 96, 7, 7, 1, 1, 1, 1, 0, 0, True, None, False, 1).

When the convolution input tensor is kept on the device, the convolution returns an output of shape [1,1,64,576] instead of the expected [1,1,49[64],576] (49 logical rows, padded to 64). Because of this, we cannot reshape the output to compare its PCC against the torch reference tensor.
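The expected shape follows from the standard convolution output-size formula. The sketch below reproduces the logical 49 and padded 64 values for this configuration. It assumes that TT-NN pads the flattened N*H*W dimension up to a multiple of a 32-row tile. The helper `conv2d_output_shapes` and the `TILE_HEIGHT` constant are hypothetical, and the sketch ignores dilation.

```python
import math

# Assumption: tile layout pads the flattened N*H*W dimension to a multiple of 32.
TILE_HEIGHT = 32


def conv2d_output_shapes(batch_size, input_height, input_width,
                         filter_height, filter_width,
                         stride_h, stride_w, pad_h, pad_w, output_channels):
    """Hypothetical helper: return (logical, padded) shapes of a [1, 1, N*H*W, C] conv output."""
    # Standard output-size formula (dilation assumed to be 1).
    out_height = (input_height + 2 * pad_h - filter_height) // stride_h + 1
    out_width = (input_width + 2 * pad_w - filter_width) // stride_w + 1
    rows = batch_size * out_height * out_width
    # Round the row count up to the next whole tile.
    padded_rows = math.ceil(rows / TILE_HEIGHT) * TILE_HEIGHT
    return (1, 1, rows, output_channels), (1, 1, padded_rows, output_channels)


logical, padded = conv2d_output_shapes(1, 7, 7, 1, 1, 1, 1, 0, 0, 576)
print(logical)  # (1, 1, 49, 576)
print(padded)   # (1, 1, 64, 576)
```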

To Reproduce

Steps to reproduce the behavior:

  1. Check out the branch punith/conv_issue.
  2. Run: pytest tests/ttnn/unit_tests/operations/test_new_conv2d.py::test_conv_for_mobileNetV3_small_244x244

Expected behavior

The output should have shape [1,1,49[64],576] instead of [1,1,64,576].

Screenshots

tests/ttnn/unit_tests/operations/test_new_conv2d.py:180: in run_conv
    torch_output_tensor = torch_output_tensor.reshape(batch_size, out_height, out_width, torch_output_tensor.shape[-1])
ttnn/ttnn/operations/core.py:248: in __torch_function__
    return super().__torch_function__(func, types, func_args, func_kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'core.TorchTensor'>, func = <built-in method  of PyCapsule object at 0x7fd5a526f810>
types = (<class 'torch.Tensor'>,)
args = (TorchTensor([[[[  6.5938,  -1.6484,  10.8750,  ...,   2.7500,   8.0000,
                 -3.1875],
               [ 1....7109,  -0.7266,  -1.1641,  ...,  -0.5938,   0.8945,
                  0.2383]]]], dtype=torch.bfloat16), 1, 7, 7, 576)
kwargs = {}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        """
        This __torch_function__ implementation wraps subclasses such that
        methods called on subclasses return a subclass instance instead of
        a ``torch.Tensor`` instance.

        One corollary to this is that you need coverage for torch.Tensor
        methods if implementing __torch_function__ for subclasses.

        We recommend always calling ``super().__torch_function__`` as the base
        case when doing the above.

        While not mandatory, we recommend making `__torch_function__` a classmethod.
        """
        if kwargs is None:
            kwargs = {}

        if not all(issubclass(cls, t) for t in types):
            return NotImplemented

        with _C.DisableTorchFunctionSubclass():
>           ret = func(*args, **kwargs)
E           RuntimeError: shape '[1, 7, 7, 576]' is invalid for input of size 36864
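The RuntimeError is an element-count mismatch. The returned buffer holds 1 * 1 * 64 * 576 = 36864 values, but [1, 7, 7, 576] needs only 1 * 7 * 7 * 576 = 28224. The sketch below shows this with plain Python lists, along with one possible test-side workaround: drop the padding rows before reshaping. In torch this would be roughly `torch_output_tensor[:, :, : batch_size * out_height * out_width, :]` before the `reshape` call. The workaround only hides the symptom and does not fix the regression. `unpad_rows` and `reshape_nhwc` are hypothetical helpers that imitate torch's behavior; they do not call it. The sketch also assumes the padding rows sit at the end of the flattened row dimension.

```python
def unpad_rows(flat, padded_rows, channels, logical_rows):
    """Hypothetical helper: drop trailing tile-padding rows from a row-major
    [1, 1, padded_rows, channels] buffer (padding assumed to sit at the end)."""
    assert len(flat) == padded_rows * channels
    return flat[: logical_rows * channels]


def reshape_nhwc(flat, n, h, w, c):
    """Hypothetical helper: reshape a flat row-major buffer into nested [n][h][w][c] lists,
    failing like torch's reshape when the element counts differ."""
    if len(flat) != n * h * w * c:
        raise RuntimeError(
            f"shape '[{n}, {h}, {w}, {c}]' is invalid for input of size {len(flat)}"
        )
    it = iter(flat)
    return [[[[next(it) for _ in range(c)] for _ in range(w)] for _ in range(h)] for _ in range(n)]


channels = 576
padded_buffer = [0.0] * (64 * channels)  # 36864 elements, as in the traceback
try:
    reshape_nhwc(padded_buffer, 1, 7, 7, channels)
except RuntimeError as e:
    print(e)  # shape '[1, 7, 7, 576]' is invalid for input of size 36864

nhwc = reshape_nhwc(unpad_rows(padded_buffer, 64, channels, 49), 1, 7, 7, channels)
print(len(nhwc), len(nhwc[0]), len(nhwc[0][0]), len(nhwc[0][0][0]))  # 1 7 7 576
```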


punithsekar commented 1 month ago

fyi @saichandax

punithsekar commented 1 month ago

Commit https://github.com/tenstorrent/tt-metal/commit/4e1fef96b1fc5543455911fef6fcf7664f04d9b3 on main introduced this issue.

punithsekar commented 1 month ago

@ntarafdar, there has been a shape mismatch in the convolution output since commit https://github.com/tenstorrent/tt-metal/commit/4e1fef96b1fc5543455911fef6fcf7664f04d9b3 landed on main. Who should I assign this issue to?

CC: @dvartaniansTT

punithsekar commented 3 days ago

We hit the same issue in the PETR model as well. CC: @dvartaniansTT @mbahnasTT