Open hschoi4448 opened 7 months ago
Hi @hschoi4448, @eyonland, @tt-aho, @jliangTT,
To fix this issue, we need to add some conditions in the ckernel file. For wh (wormhole_b0) it is located at tt_metal/hw/ckernels/wormhole_b0/metal/llk_api/llk_sfpu/ckernel_sfpu_log.h, but it is not available for gs (grayskull), so the changes need to be made in the submodules.
We tried adding the changes in the submodules, but on the PR creation page those changes are not shown.
Can I get some suggestions regarding this?
@Aswinmcw I'm not sure what issue you're seeing. Following the link you sent, I see your changes shown below.
You would need to push/merge your changes in the appropriate submodule repos and then update the submodule hashes appropriately
@tt-aho, I have changed 3 files in total: 1 in the tt-metal repo and 2 in submodules. On the changes page, those 2 files in the submodules are not showing; that's my doubt.
Submodule changes only show the hash changes, not the files changed. You would need to click/compare the changes on the submodule repo, but I think you need to push your changes to the actual remote submodule repos first, as I can't find the commits you're updating them to in those repos.
input_tensor ttnn.Tensor([[[[-1.00000, -1.00000, ..., -1.00000, -1.00000],
[-1.00000, -1.00000, ..., -1.00000, -1.00000],
...,
[-1.00000, -1.00000, ..., -1.00000, -1.00000],
[-1.00000, -1.00000, ..., -1.00000, -1.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
tt_output_tensor_on_device ttnn.Tensor([[[[-4.37500, -4.37500, ..., -4.37500, -4.37500],
[-4.37500, -4.37500, ..., -4.37500, -4.37500],
...,
[-4.37500, -4.37500, ..., -4.37500, -4.37500],
[-4.37500, -4.37500, ..., -4.37500, -4.37500]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::BFLOAT16, layout=Layout::TILE)
golden_tensor tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]]], dtype=torch.bfloat16,
grad_fn=<LogBackward0>)
It still has a problem. Please check again. @Aswinmcw @tt-aho @jliangTT @razorback3
hash: a059360c5f356621d37963009ccd5a8089a71432 arch: wormhole_b0
Hi @rtawfik01, the issue happens because negative values haven't been handled. Since the file is from third_party, we couldn't handle such issues from our end, and we verified locally after our change.
Is there a way to make the change and raise a PR for the same?
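Conceptually, the fix adds a guard so that log returns NaN for negative inputs. A hypothetical Python illustration of that guard follows (the real fix lives in the SFPU C++ of ckernel_sfpu_log.h, not in Python):

import math

def log_with_negative_guard(x: float) -> float:
    # Hypothetical illustration only; the actual kernel operates on
    # bfloat16 tiles inside ckernel_sfpu_log.h.
    if x < 0.0:
        return float("nan")   # log of a negative value is NaN
    if x == 0.0:
        return float("-inf")  # log(0) is -inf (math.log would raise here)
    return math.log(x)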
Merged, hence closed!
@ruthreshx Can you add the description of you observation here?
Hi @umadevimcw , @eyonland , @hschoi4448 ,
The formula is:
power_log = mul (log(x), scalar)
x = -1.5469, scalar = 0.5371
With the above fix:
Step 1: log(-1.5469) => NaN
Step 2: NaN * 0.5 returns 255211775190703847597530955573826158592.00000, but the expected result is NaN.
I noticed one more thing: if NaN is multiplied by any decimal value from 0.0 to 0.66, it returns the large value above.
The fix itself is correct, because log of a negative value should return NaN.
We need to create a separate ticket to handle such issues in the mul_sfpu kernel. In the meantime, we will skip the pow_fractional test until the issue has been handled.
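For reference, a minimal PyTorch sketch of the expected NaN propagation (this shows the golden behavior only, not the device kernel; the values are taken from the steps above):

import torch

x = torch.tensor(-1.5469, dtype=torch.bfloat16)
scalar = 0.5371

step1 = torch.log(x)    # log of a negative value -> NaN
step2 = step1 * scalar  # IEEE 754: NaN * anything -> NaN

print(step1)  # tensor(nan, dtype=torch.bfloat16)
print(step2)  # tensor(nan, dtype=torch.bfloat16)

Any result other than NaN in step 2 means the mul kernel is not propagating NaN correctly, which is what the separate mul_sfpu ticket covers.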
Hi @hschoi4448 , @eyonland , Please find the separate ticket for handling NaN issues in mul_sfpu. Ticket: https://github.com/tenstorrent/tt-metal/issues/12776
Current status: https://github.com/tenstorrent/tt-metal/issues/12776 Need inputs from LLK team
Describe the bug
The log function returns an invalid value.

To Reproduce
Steps to reproduce the behavior:
# SPDX-License-Identifier: Apache-2.0

import torch
import pytest
import tt_lib
from tests.tt_eager.python_api_testing.unit_testing.backward_ops.utility_funcs import data_gen_pt_tt, compare_results

import ttnn
from tests.tt_eager.python_api_testing.sweep_tests import pytorch_ops


def data_gen_pt_tt(input_shapes, device, required_grad=False, val=1):
    pt_tensor = (torch.ones(input_shapes, requires_grad=required_grad) * val).bfloat16()
    tt_tensor = (
        tt_lib.tensor.Tensor(pt_tensor, tt_lib.tensor.DataType.BFLOAT16).to(tt_lib.tensor.Layout.TILE).to(device)
    )
    return pt_tensor, tt_tensor


@pytest.mark.parametrize(
    "input_shapes",
    ((torch.Size([1, 1, 32, 32])),),
)
def test1(input_shapes, device):
    val = -1
    in_data, input_tensor = data_gen_pt_tt(input_shapes, device, True, val=val)
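The test body above appears to be truncated. A minimal continuation consistent with the tensors printed earlier in the thread might look like this (the ttnn.log call and the compare_results usage are assumptions, not the original test code):

    # Assumed continuation: run log on device and build the PyTorch golden
    tt_output_tensor_on_device = ttnn.log(input_tensor)
    golden_tensor = torch.log(in_data)
    # compare_results is imported above; a list-of-tensors signature is assumed
    comp_pass = compare_results([tt_output_tensor_on_device], [golden_tensor])
    assert comp_pass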
Expected behavior
log of a negative input should return NaN, matching the PyTorch golden output shown earlier in the thread.
Additional context
It seems like other log operations such as log2 and log10 may also have similar issues.
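A quick golden-side check (a PyTorch-only sketch, assuming the same -1 fill value as the test above) confirms all three ops should return NaN for negative inputs:

import torch

x = torch.full((1, 1, 32, 32), -1.0, dtype=torch.bfloat16)
print(torch.log(x).isnan().all())    # tensor(True)
print(torch.log2(x).isnan().all())   # tensor(True)
print(torch.log10(x).isnan().all())  # tensor(True)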