Runtime Error: Convolution output shape mismatch issue if we pass input on device

punithsekar commented 1 month ago

Describe the bug

With the convolution configuration (batch_size, output_channels, input_channels, input_height, input_width, filter_height, filter_width, stride_h, stride_w, pad_h, pad_w, use_1d_systolic_array, config_override, use_shallow_conv_variant, groups) = (1, 576, 96, 7, 7, 1, 1, 1, 1, 0, 0, True, None, False, 1), the convolution should return an output of shape [1,1,49[64],576]. When the convolution input tensor is kept on the device, it returns shape [1,1,64,576] instead. As a result, we cannot reshape the output to [1, 7, 7, 576] to check the PCC against the torch tensor.

To Reproduce

Steps to reproduce the behavior:

Check out the branch punith/conv_issue.
Run the command: pytest tests/ttnn/unit_tests/operations/test_new_conv2d.py::test_conv_for_mobileNetV3_small_244x244

Expected behavior

The convolution output should have shape [1,1,49[64],576] instead of [1,1,64,576].
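
The expected shape can be worked out from the configuration above. The sketch below is an illustration, not code from the test: the helper names are hypothetical, and it assumes the standard convolution output-size formula and a tile height of 32 rows, which would explain the padded value 64 shown in brackets.

```python
def conv_out_dim(size, kernel, stride, pad):
    """Standard convolution output size along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return -(-value // multiple) * multiple


TILE_HEIGHT = 32  # assumption: rows are padded to a multiple of 32

# Values from the reported configuration: 7x7 input, 1x1 filter, stride 1, no padding.
out_h = conv_out_dim(size=7, kernel=1, stride=1, pad=0)
out_w = conv_out_dim(size=7, kernel=1, stride=1, pad=0)

logical_rows = out_h * out_w                       # flattened N*H*W rows for batch size 1
padded_rows = round_up(logical_rows, TILE_HEIGHT)  # rows after padding to a full tile

print(f"expected shape: [1, 1, {logical_rows}[{padded_rows}], 576]")
```

Under these assumptions the logical row count is 49 and the padded row count is 64, which matches the expected shape quoted above.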

Screenshots

tests/ttnn/unit_tests/operations/test_new_conv2d.py:180: in run_conv
    torch_output_tensor = torch_output_tensor.reshape(batch_size, out_height, out_width, torch_output_tensor.shape[-1])
ttnn/ttnn/operations/core.py:248: in __torch_function__
    return super().__torch_function__(func, types, func_args, func_kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'core.TorchTensor'>, func = <built-in method  of PyCapsule object at 0x7fd5a526f810>
types = (<class 'torch.Tensor'>,)
args = (TorchTensor([[[[  6.5938,  -1.6484,  10.8750,  ...,   2.7500,   8.0000,
                 -3.1875],
               [ 1....7109,  -0.7266,  -1.1641,  ...,  -0.5938,   0.8945,
                  0.2383]]]], dtype=torch.bfloat16), 1, 7, 7, 576)
kwargs = {}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        """
        This __torch_function__ implementation wraps subclasses such that
        methods called on subclasses return a subclass instance instead of
        a ``torch.Tensor`` instance.

        One corollary to this is that you need coverage for torch.Tensor
        methods if implementing __torch_function__ for subclasses.

        We recommend always calling ``super().__torch_function__`` as the base
        case when doing the above.

        While not mandatory, we recommend making `__torch_function__` a classmethod.
        """
        if kwargs is None:
            kwargs = {}

        if not all(issubclass(cls, t) for t in types):
            return NotImplemented

        with _C.DisableTorchFunctionSubclass():
>           ret = func(*args, **kwargs)
E           RuntimeError: shape '[1, 7, 7, 576]' is invalid for input of size 36864
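
The size in the error message matches the padded shape rather than the logical one. A small check using only the numbers reported above (the variable names are illustrative):

```python
import math

returned_shape = (1, 1, 64, 576)  # shape returned when the input stays on device
reshape_target = (1, 7, 7, 576)   # shape the test tries to reshape to

returned_elems = math.prod(returned_shape)  # element count of the returned tensor
target_elems = math.prod(reshape_target)    # element count the reshape requires

# Number of extra rows of 576 channels carried by the returned tensor.
extra_rows = (returned_elems - target_elems) // 576

print(returned_elems, target_elems, extra_rows)
```

The returned tensor holds 36864 elements, the same size named in the RuntimeError, while the reshape target needs 28224. The difference is 15 rows of 576 values, which is the gap between 64 and 49. One possible workaround, untested here and offered only as an assumption, would be to keep just the first 49 rows of the third dimension before reshaping.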

Please complete the following environment information:

Device: WH-n150

Additional context

This commit on main introduced the issue: https://github.com/tenstorrent/tt-metal/commit/4e1fef96b1fc5543455911fef6fcf7664f04d9b3
If the input passed to the convolution is not kept on the device, the test passes.
The issue occurs only for some input configurations.

punithsekar commented 1 month ago

fyi @saichandax

punithsekar commented 1 month ago

This commit on main introduced the issue: https://github.com/tenstorrent/tt-metal/commit/4e1fef96b1fc5543455911fef6fcf7664f04d9b3

punithsekar commented 1 month ago

@ntarafdar, the convolution output has a shape mismatch after this commit on main: https://github.com/tenstorrent/tt-metal/commit/4e1fef96b1fc5543455911fef6fcf7664f04d9b3. To whom can I assign this issue?

CC: @dvartaniansTT

punithsekar commented 3 days ago

We also hit this issue in the PETR model. CC: @dvartaniansTT @mbahnasTT

tenstorrent / tt-metal

Runtime Error: Convolution output shape mismatch issue if we pass input on device #13995