tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] invalid acos result #6723

Open hschoi4448 opened 5 months ago

hschoi4448 commented 5 months ago

Describe the bug

`tt_lib.tensor.acos` returns an invalid value for an input outside the domain [-1, 1]. For an input of 90 it returns about -7.0e19, whereas the PyTorch reference returns NaN.
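The mismatch can be stated as a check: the reference result for an out-of-domain input is NaN, while the device returns a large finite number. Below is a minimal sketch of a NaN-aware comparison, using the two values from the output further down in this report. `matches_golden` is a hypothetical helper written for illustration; it is not part of `tt_lib` or the repository's test utilities, and the tolerance is an assumption.

```python
import math


def matches_golden(actual, golden, rel_tol=1e-2):
    """Hypothetical helper: compare two scalars, treating NaN as equal to NaN."""
    if math.isnan(golden):
        return math.isnan(actual)
    return math.isclose(actual, golden, rel_tol=rel_tol)


# Values taken from the output in this report (input value 90).
golden = float("nan")                    # PyTorch reference result
device_value = -70039981404865953792.0   # value printed for tt_lib.tensor.acos

print(matches_golden(device_value, golden))    # False: finite number vs NaN
print(matches_golden(float("nan"), golden))    # True: NaN vs NaN
print(matches_golden(1.5703, math.acos(0.0)))  # True: in-domain value within tolerance
```

The reproduction test below only prints the tensors and makes no assertion, which is why pytest reports it as PASSED despite the mismatch.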

To Reproduce

  1. Copy and paste the code below.
    
    # SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.

    # SPDX-License-Identifier: Apache-2.0

    import torch
    import pytest
    import tt_lib
    from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import data_gen_pt_tt, compare_results

    import ttnn
    from tests.tt_eager.python_api_testing.sweep_tests import pytorch_ops


    # Local helper; it shadows the imported data_gen_pt_tt and fills the tensor with `val`.
    def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
        pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
        tt_tensor = (
            tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16).to(tt_lib.tensor.Layout.TILE).to(device)
        )
        return pt_tensor, tt_tensor


    @pytest.mark.parametrize(
        "input_shapes",
        (
            (torch.Size([1, 1, 32, 32])),
        ),
    )
    def test1(input_shapes, device):
        val = 90  # outside the acos domain [-1, 1]
        in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=val)

        print("input_tensor", input_tensor)

        golden_tensor = pytorch_ops.acos(in_data)
        tt_output_tensor_on_device = tt_lib.tensor.acos(input_tensor)

        print("tt_output_tensor_on_device", tt_output_tensor_on_device)
        print("golden_tensor", golden_tensor)

2. Run with pytest
```Python
input_tensor ttnn.Tensor([[[[90.00000, 90.00000,  ..., 90.00000, 90.00000],
               [90.00000, 90.00000,  ..., 90.00000, 90.00000],
               ...,
               [90.00000, 90.00000,  ..., 90.00000, 90.00000],
               [90.00000, 90.00000,  ..., 90.00000, 90.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device ttnn.Tensor([[[[-70039981404865953792.00000, -70039981404865953792.00000,  ..., -70039981404865953792.00000, -70039981404865953792.00000],
               [-70039981404865953792.00000, -70039981404865953792.00000,  ..., -70039981404865953792.00000, -70039981404865953792.00000],
               ...,
               [-70039981404865953792.00000, -70039981404865953792.00000,  ..., -70039981404865953792.00000, -70039981404865953792.00000],
               [-70039981404865953792.00000, -70039981404865953792.00000,  ..., -70039981404865953792.00000, -70039981404865953792.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
golden_tensor tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]]]], dtype=torch.bfloat16,
       grad_fn=<AcosBackward0>)
PASSED
```

Expected behavior

The output should match the PyTorch golden tensor, which is NaN for the out-of-domain input 90.


umadevimcw commented 2 weeks ago

@hschoi4448 A fix for this issue is available in PR https://github.com/tenstorrent/tt-metal/pull/11243. Due to hardware limitations, NaN/inf values are replaced with numbers. Kindly review it.