Creating a tensor with sharded memory config using ttnn.create_sharded_memory_config produces a tensor with low PCC compared to the input tensor when using tensor height and width as the shard shape and the number of shards exceeds the specified core size #15306
Describe the bug
Creating a tensor with a sharded memory config using ttnn.create_sharded_memory_config with the arguments:
1) use_height_and_width_as_shard_shape = True
2) input_shape = shape_of_the_input_tensor, where the last two dimensions are used as the shard shape and the remaining dimensions are collapsed into the number of shards
3) orientation = ttnn.ShardOrientation.ROW_MAJOR
produces a tensor with low PCC compared to the original input tensor: in every case when the PCC threshold is 1.0, and in most cases when the threshold is 0.999. When the sharded tensor is used as the input to ttnn.relu, PCC is low in every case.
The problem is observed on Wormhole_B0.
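As background, the "number of shards exceeds core size" condition described above can be sketched in plain Python. The helper names below are my own for illustration and are not ttnn APIs; they only model how the leading dimensions collapse into a shard count when the last two dimensions are taken as the shard shape:

```python
from math import prod

def num_shards(input_shape):
    # With use_height_and_width_as_shard_shape=True, the last two
    # dimensions form the shard shape; all leading dimensions are
    # collapsed (multiplied together) into the shard count.
    return prod(input_shape[:-2]) if len(input_shape) > 2 else 1

def exceeds_core_grid(input_shape, core_y, core_x):
    # The reported low-PCC cases arise when the number of shards is
    # larger than the number of cores in the grid (core_y * core_x).
    return num_shards(input_shape) > core_y * core_x

# Example: shape (2, 5, 32, 32) collapses to 10 shards, which fits
# on an 8x7 grid; shape (8, 8, 32, 32) collapses to 64 shards, which
# exceeds the 56 cores of that grid.
print(num_shards([2, 5, 32, 32]))               # 10
print(exceeds_core_grid([2, 5, 32, 32], 8, 7))  # False
print(exceeds_core_grid([8, 8, 32, 32], 8, 7))  # True
```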
To Reproduce
Steps to reproduce the behavior:
Check out branch amalbasaTT/unary_sharded-sweeps-2 (soon to be merged to main)
Copy the unit test below to test_relu_sharded.py:
import torch
import random
import ttnn
import itertools
import pytest
import traceback
import math
from loguru import logger
from functools import partial
from tests.sweep_framework.sweep_utils.utils import gen_shapes, get_device_grid_size, get_sharded_config
from tests.tt_eager.python_api_testing.sweep_tests.generation_funcs import gen_func_with_cast_tt, _gen_reshape_args_from_volume
from tests.ttnn.utils_for_testing import check_with_pcc
from models.utility_functions import torch_random
Y, X = get_device_grid_size()
DEVICE_GRID_SIZE = ttnn.CoreGrid(y=Y, x=X)
def gen_sharded_spec(gen_unsafe, num_shapes, num_core_samples):
    def is_unsafe(num_of_shards, core_y, core_x):
        return num_of_shards > (core_y * core_x)

    # ... (rest of the generator body omitted in this report)


test_sweep_args = list(gen_sharded_spec(True, 4, 4))


def run_relu_sharded_tests(input_shape, dtype, dlayout, mem_cfg, data_seed, device):
    torch.manual_seed(data_seed)
    # ... (rest of the test body omitted in this report)


@pytest.mark.parametrize("input_shape, dtype, dlayout, mem_cfg, data_seed", test_sweep_args)
def test_relu_sharded(input_shape, dtype, dlayout, mem_cfg, data_seed, device):
    run_relu_sharded_tests(input_shape, dtype, dlayout, mem_cfg, data_seed, device)
Run the test:
pytest path/to/test_relu_sharded.py
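For context on the pass/fail criterion: check_with_pcc compares the two tensors via the Pearson correlation coefficient. A minimal stand-alone version of that metric (my own sketch for illustration, not the actual ttnn helper) looks like:

```python
import math

def pcc(a, b):
    # Pearson correlation coefficient of two equal-length flat sequences.
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Identical tensors give PCC 1.0; any element mismatch pulls it below
# the thresholds (1.0 and 0.999) used in the report above.
print(pcc([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # 1.0
```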