tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] invalid tan result #6735

Open hschoi4448 opened 7 months ago

hschoi4448 commented 7 months ago

Describe the bug: `ttnn.tan` returns a wildly incorrect value (on the order of 4.3e18 instead of ~1.62) for an input tensor filled with 45.0.

To Reproduce Steps to reproduce the behavior:

  1. Copy and paste the code below:

```Python
# SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.
# SPDX-License-Identifier: Apache-2.0

import torch
import pytest
import tt_lib
from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import data_gen_pt_tt, compare_results

import ttnn
from tests.tt_eager.python_api_testing.sweep_tests import (
    pytorch_ops,
    tt_lib_ops,
)
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
    pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
    tt_tensor = (
        tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16)
        .to(tt_lib.tensor.Layout.TILE)
        .to(device)
    )
    return pt_tensor, tt_tensor


@pytest.mark.parametrize(
    "input_shapes",
    ((torch.Size([1, 1, 32, 32])),),
)
def test1(input_shapes, device):
    val = float('45')
    in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=val)

    print("input_tensor", input_tensor)

    golden_tensor = pytorch_ops.tan(in_data)
    tt_output_tensor_on_device = ttnn.tan(input_tensor)

    print("tt_output_tensor_on_device", tt_output_tensor_on_device)
    print("golden_tensor", golden_tensor)
```

  2. Run with pytest. The output is:

```Python
input_tensor ttnn.Tensor([[[[45.00000, 45.00000,  ..., 45.00000, 45.00000],
               [45.00000, 45.00000,  ..., 45.00000, 45.00000],
               ...,
               [45.00000, 45.00000,  ..., 45.00000, 45.00000],
               [45.00000, 45.00000,  ..., 45.00000, 45.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device ttnn.Tensor([[[[4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               ...,
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
golden_tensor tensor([[[[1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          ...,
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172]]]],
       dtype=torch.bfloat16, grad_fn=<TanBackward0>)
```

Expected behavior: `ttnn.tan` should produce the same result as `torch.tan` (within bfloat16 precision).



umadevimcw commented 7 months ago

@jliangTT @hschoi4448 As mentioned in the documentation of `tan`, it supports the range -1.45 to 1.45.


For inputs outside this range, range reduction is required, which involves modulo operations (as shown below) that are not currently supported on GS and WH B0. Hence the accepted range is -1.45 to 1.45. Range reduction:

    # Reduce x to the interval (-pi/2, pi/2]; tan has period pi
    x = math.fmod(x, math.pi)
    if x > math.pi / 2:
        x -= math.pi
    elif x < -math.pi / 2:
        x += math.pi
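As an illustrative sketch (plain host-side Python; `tan_range_reduced` is a hypothetical name, not an existing API), range reduction lets a limited-range tan handle the failing input of 45.0:

```python
import math

def tan_range_reduced(x: float) -> float:
    """Hypothetical helper: reduce x into (-pi/2, pi/2], then apply tan.

    tan has period pi, so the reduced argument yields the same result.
    """
    x = math.fmod(x, math.pi)
    if x > math.pi / 2:
        x -= math.pi
    elif x < -math.pi / 2:
        x += math.pi
    return math.tan(x)

# 45 rad reduces to about 1.0177 rad, well inside the supported range
print(tan_range_reduced(45.0))  # ~1.6198, agreeing with math.tan(45.0)
```

Since tan has period pi, 45.0 reduces to roughly 1.0177 rad, which falls inside the accepted -1.45 to 1.45 range.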

@tt-aho @jliangTT and @eyonland Let me know if I can close this or proceed further on this.

eyonland commented 7 months ago

@jliangTT, I think this bug needs to be fixed within the kernel op.

jliangTT commented 7 months ago

Can you clarify what the bug is here? I am reading this and cannot extract it.

hschoi4448 commented 6 months ago
input_tensor ttnn.Tensor([[[[45.00000, 45.00000,  ..., 45.00000, 45.00000],
               [45.00000, 45.00000,  ..., 45.00000, 45.00000],
               ...,
               [45.00000, 45.00000,  ..., 45.00000, 45.00000],
               [45.00000, 45.00000,  ..., 45.00000, 45.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device ttnn.Tensor([[[[4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               ...,
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000],
               [4341470040785158144.00000, 4341470040785158144.00000,  ..., 4341470040785158144.00000, 4341470040785158144.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
golden_tensor tensor([[[[1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          ...,
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172],
          [1.6172, 1.6172, 1.6172,  ..., 1.6172, 1.6172, 1.6172]]]],
       dtype=torch.bfloat16, grad_fn=<TanBackward0>)

I believe that PyTorch tan and TT tan results should be the same. However, the results above show a significant difference between the two, which appears to be a bug. @jliangTT
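For reference, the golden value 1.6172 is simply `math.tan(45.0)` (~1.61978) rounded to bfloat16 precision. A minimal sketch in plain Python (the `to_bfloat16` helper is illustrative, not part of any library):

```python
import math
import struct

def to_bfloat16(x: float) -> float:
    """Illustrative helper: round a float to the nearest bfloat16 value
    (round-to-nearest-even on the low 16 bits of the float32 encoding)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]

exact = math.tan(45.0)     # ~1.6197751905438615
print(to_bfloat16(exact))  # 1.6171875, which PyTorch prints as 1.6172
```

So the ~4.34e18 device output is not a precision artifact of bfloat16; it is simply wrong.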

umadevimcw commented 3 months ago

@eyonland I am moving this to todo to see if we can address this issue with the help of the remainder op. We will try this and, if it does not work out, will move it back to blocker status.