tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html
Apache License 2.0

Using sharded config with use_height_and_width_as_shard_shape=True and specific setup leads to low PCC on multiple ops #15251

Open nemanjagrujic opened 2 weeks ago

nemanjagrujic commented 2 weeks ago

Using a sharded config with use_height_and_width_as_shard_shape=True and a specific setup leads to low PCC on multiple ops. The problem is observed on Wormhole (WH) cards. The setup that causes the low PCC:

input_shape [1, 25, 160, 32]
DataType.BFLOAT8_B
Layout.TILE
ShardStrategy.BLOCK
ShardOrientation.COL_MAJOR
tensor_hw_as_shard_shape True
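The setup above can be expressed in TT-NN roughly as follows. This is a minimal, hedged sketch: it assumes a Wormhole device is available, and the 5x5 core grid is illustrative (the actual grid used by the failing test may differ).

```python
import torch
import ttnn

# Assumption: device_id 0 is a Wormhole card.
device = ttnn.open_device(device_id=0)

# input_shape [1, 25, 160, 32] from the report.
torch_input = torch.randn(1, 25, 160, 32, dtype=torch.bfloat16)

# Block-sharded memory config with the reported settings.
# The core grid here is an assumption for illustration.
sharded_config = ttnn.create_sharded_memory_config(
    shape=torch_input.shape,
    core_grid=ttnn.CoreGrid(y=5, x=5),
    strategy=ttnn.ShardStrategy.BLOCK,
    orientation=ttnn.ShardOrientation.COL_MAJOR,
    use_height_and_width_as_shard_shape=True,
)

tt_input = ttnn.from_torch(
    torch_input,
    dtype=ttnn.bfloat8_b,   # DataType.BFLOAT8_B
    layout=ttnn.TILE_LAYOUT,  # Layout.TILE
    device=device,
    memory_config=sharded_config,
)

# One of the affected ops; compare ttnn.to_torch(tt_output)
# against torch.exp(torch_input.float()) to compute PCC.
tt_output = ttnn.exp(tt_input)
torch_output = ttnn.to_torch(tt_output)

ttnn.close_device(device)
```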

Ops found to be affected:

ttnn.exp
ttnn.cos
ttnn.sin
ttnn.abs
ttnn.isfinite
ttnn.isnan
ttnn.isinf
ttnn.isneginf

There are more shapes that cause the problem. For instance:

[1, 2, 1248, 32]
[1, 2, 1472, 32]
pytest  tests/ttnn/python_api_testing/non_working_unit_tests/wormhole/test_eltwise_usehw_sharded.py

We get PCC (for instance):

(False, '-0.00014244904930987092')
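For context, the PCC values quoted here are Pearson correlation coefficients between the golden (CPU) output and the device output, with the pass/fail flag coming from a threshold check. A minimal sketch of such a check, using NumPy (the function names and the 0.99 threshold are illustrative, not the repo's exact comp_pcc implementation):

```python
import numpy as np

def compute_pcc(golden: np.ndarray, actual: np.ndarray) -> float:
    """Pearson correlation coefficient between two flattened tensors."""
    g = golden.flatten().astype(np.float64)
    a = actual.flatten().astype(np.float64)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the PCC.
    return float(np.corrcoef(g, a)[0, 1])

def check_pcc(golden, actual, threshold=0.99):
    """Return (passed, pcc), mirroring the (False, '...') tuples in the logs."""
    pcc = compute_pcc(golden, actual)
    return pcc >= threshold, pcc

# Identical tensors give PCC ~1.0; uncorrelated noise gives PCC near 0,
# which is the failure mode reported above.
rng = np.random.default_rng(0)
golden = rng.normal(size=4096)
passed_same, pcc_same = check_pcc(golden, golden.copy())
passed_noise, pcc_noise = check_pcc(golden, rng.normal(size=4096))
```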
umadevimcw commented 1 week ago

@nemanjagrujic For this issue I tried the exp op: with BFLOAT8_B the PCC drops, while with BFLOAT16 it passes.

nemanjagrujic commented 1 week ago

@umadevimcw In the unit test tests/ttnn/python_api_testing/non_working_unit_tests/wormhole/test_eltwise_usehw_sharded.py you can find many combinations that fail. One of them is exp with bfloat16. For instance, exp with:

input_shape=(1, 25, 160, 32) dtype=DataType.BFLOAT16 dlayout=Layout.TILE sharding_strategy=ShardStrategy.BLOCK shard_orientation=ShardOrientation.COL_MAJOR hw_as_shard_shape=True

gives PCC=0.7135141513261102

umadevimcw commented 5 days ago

Refer to issue #15565