[Bug Report] invalid ldexp backward result #6533

Closed: hschoi4448 closed this issue 7 months ago

hschoi4448 commented 8 months ago

Describe the bug

The ldexp_bw function returns invalid gradient values. The PyTorch reference behavior is:

```python
import torch

input = torch.tensor([1.], device='cpu', requires_grad=True)
other = torch.tensor([0.], device='cpu', requires_grad=True)
output = torch.ldexp(input, other)

output.backward(torch.ones_like(output))

print(input.grad)  # tensor([1.])
print(other.grad)  # tensor([0.6931])
```

When the input is 1 and the other is 0, the correct gradient values are 1 and 0.6931, respectively. However, the TT ldexp_bw returns 0.69141 and 0.47656 as the gradient values.
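For reference, since ldexp(input, other) = input * 2**other, the analytic gradients are grad * 2**other with respect to input and grad * input * 2**other * ln(2) with respect to other. A quick numeric check of where 1 and 0.6931 come from (my own sketch, not part of the TT code):

```python
import math

# Values from the example above.
input_val, other_val, grad_val = 1.0, 0.0, 1.0

# d/d(input)  ldexp(input, other) = 2**other
grad_input = grad_val * 2 ** other_val                             # 1.0
# d/d(other)  ldexp(input, other) = input * 2**other * ln(2)
grad_other = grad_val * input_val * 2 ** other_val * math.log(2)   # 0.6931...

print(grad_input, grad_other)
```

The device implementation that produces the wrong values is the following: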

```cpp
std::vector<Tensor> _ldexp_bw(const Tensor& grad, const Tensor& input, const Tensor& other, const MemoryConfig& output_mem_config) {
    std::vector<Tensor> grad_tensor;
    Tensor tpow_o = mul_unary(exp(other), M_LN2);
    grad_tensor.emplace_back(tpow_o);
    Tensor result = mul(input, mul_unary(tpow_o, M_LN2));
    grad_tensor.emplace_back(result);
    return grad_tensor;
}
```

Furthermore, examining the implementation of ldexp_bw shows that it never uses grad at all.
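Plugging the example values into those expressions in plain Python (my own re-derivation of the C++ above, not part of the original report) makes both problems visible: grad never appears in either result, and exp(other) is used where 2**other (= exp(other * ln 2)) is needed:

```python
import math

input_val, other_val = 1.0, 0.0  # same example values as above

tpow_o = math.exp(other_val) * math.log(2)   # mul_unary(exp(other), M_LN2) -> 0.6931...
out0 = tpow_o                                # first returned tensor; device prints 0.69141
out1 = input_val * (tpow_o * math.log(2))    # second returned tensor -> ~0.4805 in fp64;
                                             # device prints 0.47656, presumably due to the
                                             # approximate exp kernel and bfloat16 rounding

print(out0, out1)
```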

Moreover, I have written a pytest that reproduces this case, and it still passes the compare_results check. This suggests that there may be an issue with the compare_results function as well.
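For illustration, a plain tolerance check on the reported values flags the mismatch immediately; this uses torch.allclose as a hypothetical stand-in for compare_results, not the utility's actual logic:

```python
import torch

# First gradient reported by the device vs. the PyTorch golden value.
tt_grad_input = torch.full((1, 1, 32, 32), 0.69141).bfloat16()
golden_grad_input = torch.ones((1, 1, 32, 32)).bfloat16()

# A gradient that is off by ~0.3 should not pass any reasonable tolerance.
print(torch.allclose(tt_grad_input.float(), golden_grad_input.float(), rtol=1e-2, atol=1e-2))  # False
```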

To Reproduce

Steps to reproduce the behavior:

  1. Check out branch
  2. Run 'pytest ./tests/tt_eager/python_api_testing/unit_testing/backward_ops/test_backward_ldexp.py'
    
```python
# SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.

# SPDX-License-Identifier: Apache-2.0

import torch
import pytest
import tt_lib
from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import data_gen_pt_tt, compare_results


def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
    pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
    tt_tensor = (
        tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16).to(tt_lib.tensor.Layout.TILE).to(device)
    )
    return pt_tensor, tt_tensor


@pytest.mark.parametrize(
    "input_shapes",
    ((torch.Size([1, 1, 32, 32])),),
)
def test_bw_ldexp(input_shapes, device):
    in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=1)
    other_data, other_tensor = data_gen_pt_tt(input_shapes, device, True, val=0)

    grad_data, grad_tensor = data_gen_pt_tt(input_shapes, device, False, 1)

    print("input_tensor", input_tensor)
    print("other_tensor", other_tensor)
    print("grad_tensor", grad_tensor)

    tt_output_tensor_on_device = tt_lib.tensor.ldexp_bw(grad_tensor, input_tensor, other_tensor)

    in_data.retain_grad()
    other_data.retain_grad()

    pyt_y = torch.ldexp(in_data, other_data)

    pyt_y.backward(gradient=grad_data)

    golden_tensor = [in_data.grad, other_data.grad]
    comp_pass = compare_results(tt_output_tensor_on_device, golden_tensor)

    print("tt_output_tensor_on_device", tt_output_tensor_on_device)
    print("golden_tensor", golden_tensor)
    assert comp_pass
```
3. See result
```Python

input_tensor ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               ...,
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
other_tensor ttnn.Tensor([[[[ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
grad_tensor ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               ...,
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000],
               [ 1.00000,  1.00000,  ...,  1.00000,  1.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device [ttnn.Tensor([[[[ 0.69141,  0.69141,  ...,  0.69141,  0.69141],
               [ 0.69141,  0.69141,  ...,  0.69141,  0.69141],
               ...,
               [ 0.69141,  0.69141,  ...,  0.69141,  0.69141],
               [ 0.69141,  0.69141,  ...,  0.69141,  0.69141]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE), ttnn.Tensor([[[[ 0.47656,  0.47656,  ...,  0.47656,  0.47656],
               [ 0.47656,  0.47656,  ...,  0.47656,  0.47656],
               ...,
               [ 0.47656,  0.47656,  ...,  0.47656,  0.47656],
               [ 0.47656,  0.47656,  ...,  0.47656,  0.47656]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)]
golden_tensor [tensor([[[[1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          ...,
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.],
          [1., 1., 1.,  ..., 1., 1., 1.]]]], dtype=torch.bfloat16), tensor([[[[0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914],
          [0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914],
          [0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914],
          ...,
          [0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914],
          [0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914],
          [0.6914, 0.6914, 0.6914,  ..., 0.6914, 0.6914, 0.6914]]]],
       dtype=torch.bfloat16)]
```

Expected behavior

I want ldexp_bw to return the correct gradient values.


umadevimcw commented 7 months ago

Merged to main