Closed · yieldthought closed this 3 months ago
The output `memory_config` is always forced to interleaved for scalar mul; this is related to https://github.com/tenstorrent/tt-metal/issues/7637. The default should be the `memory_config` of the input tensor, which is HEIGHT_SHARDED in this case. The following fix will get your test to pass and also have the output be sharded:
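The defaulting rule described above can be sketched in a few lines. This is a hypothetical illustration, not the tt-metal API: the names `resolve_output_memory_config`, `INTERLEAVED`, and `HEIGHT_SHARDED` are stand-ins for the real C++ types.

```python
# Hypothetical sketch of the memory_config defaulting rule (illustrative
# names, not the actual tt-metal API).

INTERLEAVED = "INTERLEAVED"
HEIGHT_SHARDED = "HEIGHT_SHARDED"

def resolve_output_memory_config(requested, input_memory_config):
    """Pick the output memory config for an eltwise op.

    Buggy behavior: unconditionally return INTERLEAVED.
    Fixed behavior: inherit the input tensor's memory config
    when the caller does not request one explicitly.
    """
    return requested if requested is not None else input_memory_config

# With a HEIGHT_SHARDED input and no explicit request, the output
# stays sharded instead of being forced to interleaved.
print(resolve_output_memory_config(None, HEIGHT_SHARDED))
# An explicit request still wins.
print(resolve_output_memory_config(INTERLEAVED, HEIGHT_SHARDED))
```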
There are two issues here.

I identified what was causing the interleaved PCC issue: for sharded inputs, we need to be mindful of the shard orientation when distributing output work. The input shards are row-major, but the work was being distributed col-major (which is the default).

This PR fixes that and also properly defaults the output memory config to the input memory config when nothing is specified: https://github.com/tenstorrent/tt-metal/pull/9939
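The orientation mismatch can be modeled in plain Python. This is a minimal sketch, not tt-metal code: shards are assigned to cores in row-major order, but if work is handed out in col-major order, most cores end up multiplying a shard that belongs to a different core, so the result is silently wrong on those cores.

```python
# Pure-Python model of the bug: input shards laid out row-major across the
# core grid, output work distributed col-major. (Illustrative only.)

def enumerate_cores(rows, cols, row_major):
    """Enumerate (row, col) core coordinates in the given orientation."""
    if row_major:
        return [(r, c) for r in range(rows) for c in range(cols)]
    return [(r, c) for c in range(cols) for r in range(rows)]

def simulate_scalar_mul(rows, cols, shards, scalar, work_row_major):
    """The i-th core in work order processes the i-th shard in input
    (row-major) order -- only correct when the two orders agree."""
    input_order = enumerate_cores(rows, cols, row_major=True)
    work_order = enumerate_cores(rows, cols, row_major=work_row_major)
    return {
        work_order[i]: shards[input_order[i]] * scalar
        for i in range(rows * cols)
    }

rows, cols = 2, 3
shards = {core: float(i)
          for i, core in enumerate(enumerate_cores(rows, cols, True))}
expected = {core: v * 2.0 for core, v in shards.items()}

good = simulate_scalar_mul(rows, cols, shards, 2.0, work_row_major=True)
bad = simulate_scalar_mul(rows, cols, shards, 2.0, work_row_major=False)

print(good == expected)  # matching orientations: correct everywhere
print(bad == expected)   # mismatched orientations: wrong on most cores
```

Note that the buggy variant is still correct on a few cores, such as `(0, 0)`, where the row-major and col-major enumerations happen to coincide, which is exactly why the bug surfaces as a partial PCC failure rather than an obviously broken output.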
Side note: I picked up your test case and added it as well. I used `ttnn.mul` instead of `*` because I wanted to pass in the output memory config for testing.
Painfully discovered and debugged during Grok-1 bringup. The `mul` operator (and the inline `*` operator) do not support sharded inputs. They silently produce bad results for some of the cores. Oh, and their output is interleaved instead of respecting the input memory config and remaining sharded.

Added a unit test for this (`test_multiply_with_scalar_sharded`) to /home/moconnor/main-metal/tests/ttnn/unit_tests/operations/test_mul.py, which fails on main. Available in branch `yieldthought/mul_sharded_scalar`.