tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] invalid isclose result #6672

Open hschoi4448 opened 6 months ago

hschoi4448 commented 6 months ago

Describe the bug

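The isclose function returns an incorrect result for NaN inputs: with equal_nan=False, the device output is all 1.0 (True), while the PyTorch golden tensor is all False.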

To Reproduce

  1. Copy and paste the code below

    # SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.

    # SPDX-License-Identifier: Apache-2.0

    import torch
    import pytest
    import tt_lib
    from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import data_gen_pt_tt, compare_results

    import ttnn
    from tests.tt_eager.python_api_testing.sweep_tests import pytorch_ops


    # Local override of the imported helper: fills both tensors with a constant `val`.
    def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
        pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
        tt_tensor = (
            tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16).to(tt_lib.tensor.Layout.TILE).to(device)
        )
        return pt_tensor, tt_tensor


    @pytest.mark.parametrize(
        "input_shapes",
        ((torch.Size([1, 1, 32, 32])),),
    )
    @pytest.mark.parametrize(
        "params",
        (
            [
                # (val1, val2, equal_nan)
                (float('nan'), float('nan'), False),
                (float('nan'), float('nan'), True),
            ]
        ),
    )
    def test_bw(input_shapes, params, device):
        val1, val2, equal_nan = params
        in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=val1)
        other_data, other_tensor = data_gen_pt_tt(input_shapes, device, True, val=val2)

        print("input_tensor", input_tensor)
        print("other_tensor", other_tensor)

        rtol = 0.01
        atol = 0.01

        golden_tensor = pytorch_ops.isclose(in_data, other_data, rtol=rtol, atol=atol, equal_nan=equal_nan)

        tt_output_tensor_on_device = tt_lib.tensor.isclose(input_tensor, other_tensor, rtol=rtol, atol=atol, equal_nan=equal_nan)

        output_tensor = ttnn.to_torch(tt_output_tensor_on_device).bool()

        print("tt_output_tensor_on_device", tt_output_tensor_on_device)
        print("golden_tensor", golden_tensor)
        print("output_tensor", output_tensor)
        golden_tensor = golden_tensor
        comp_pass = torch.allclose(output_tensor, golden_tensor)

        assert comp_pass

  2. Run with pytest
```Python
input_tensor ttnn.Tensor([[[[-nan    , -nan    ,  ..., -nan    , -nan    ],
               [-nan    , -nan    ,  ..., -nan    , -nan    ],
               ...,
               [-nan    , -nan    ,  ..., -nan    , -nan    ],
               [-nan    , -nan    ,  ..., -nan    , -nan    ]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
other_tensor ttnn.Tensor([[[[-nan    , -nan    ,  ..., -nan    , -nan    ],
               [-nan    , -nan    ,  ..., -nan    , -nan    ],
               ...,
               [-nan    , -nan    ,  ..., -nan    , -nan    ],
               [-nan    , -nan    ,  ..., -nan    , -nan    ]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               ...,
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_torch_tensor TorchTensor([[[[True, True, True,  ..., True, True, True],
               [True, True, True,  ..., True, True, True],
               [True, True, True,  ..., True, True, True],
               ...,
               [True, True, True,  ..., True, True, True],
               [True, True, True,  ..., True, True, True],
               [True, True, True,  ..., True, True, True]]]])
golden_tensor tensor([[[[False, False, False,  ..., False, False, False],
          [False, False, False,  ..., False, False, False],
          [False, False, False,  ..., False, False, False],
          ...,
          [False, False, False,  ..., False, False, False],
          [False, False, False,  ..., False, False, False],
          [False, False, False,  ..., False, False, False]]]])
```

VirdhatchaniKN commented 6 months ago

Hi @tt-aho, @jliangTT, @eyonland, @hschoi4448

isclose depends on the isnan operation, but due to hardware restrictions we are facing an issue related to NaN: Discussion. Kindly share your inputs and thoughts on how we could proceed.
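
To make the isnan dependency concrete, here is a minimal host-side sketch of the scalar semantics the golden result follows. It assumes the formula documented for torch.isclose, |input - other| <= atol + rtol * |other|. The name `isclose_ref` is hypothetical, and this is not the tt_lib implementation, nor does it address the hardware restriction itself.

```Python
import math


def isclose_ref(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Hypothetical scalar reference for isclose; illustration only."""
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    # NaN never compares close to anything unless equal_nan is set
    # and both operands are NaN. This is the step that needs isnan.
    if a_nan or b_nan:
        return equal_nan and a_nan and b_nan
    # Exactly equal values (including same-signed infinities) are close.
    if a == b:
        return True
    # An infinity is not close to any different value.
    if math.isinf(a) or math.isinf(b):
        return False
    # Finite case: absolute plus relative tolerance, scaled by |b|.
    return abs(a - b) <= atol + rtol * abs(b)


nan = float("nan")
print(isclose_ref(nan, nan, rtol=0.01, atol=0.01, equal_nan=False))
print(isclose_ref(nan, nan, rtol=0.01, atol=0.01, equal_nan=True))
```

Without a working NaN check, the first branch cannot be evaluated, so the NaN/NaN case with equal_nan=False cannot be told apart from the equal_nan=True case. That is consistent with the all-True device output in the report above.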