tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Pytorch Sweeps tracing - ttnn.mul - Low PCC and other fails #14731

Open KalaivaniMCW opened 2 days ago

KalaivaniMCW commented 2 days ago

The following parameters failed during the PyTorch sweeps tracing tests for ttnn.mul.

File: tests/sweep_framework/sweeps/eltwise/binary/multiply/mul_tensor_pytorch2.py

PyTorch sweeps: tracker

| Parameter | Error |
| -- | -- |
| `{'input_shape': {'self': [4, 12, 64, 64], 'other': [12, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [4, 16, 64, 64], 'other': [16, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [64, 3, 64, 64], 'other': [3, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [64, 4, 64, 64], 'other': [4, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [16, 1], 'other': [1, 1, 32]}` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [16, 6, 64, 64], 'other': [6, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [16, 8, 64, 64], 'other': [8, 1, 1]},` | low PCC fail - unequal ranks? |
| `{'input_shape': {'self': [0], 'other': 0.5},` | round_up: multiple must not be 0 |
| `{'input_shape': {'self': [0], 'other': []},` | round_up: multiple must not be 0 |
| `{'input_shape': {'self': [0, 1], 'other': [0, 1]},` | round_up: multiple must not be 0 |
| `{'input_shape': {'self': [1], 'other': [1]},` | message list(expected_pytorch_result.shape)=[1] vs list(actual_pytorch_result.shape)=[1, 1] |
| `{'input_shape': {'self': [], 'other': [0, 1]},` | round_up: multiple must not be 0 |
| `{'input_shape': {'self': [], 'other': [1, 1, 768]}` | Layout issue for [ ] shape |
| `{'input_shape': {'self': [], 'other': [1, 24, 768]},` | same as above |
| `{'input_shape': {'self': [], 'other': [3234, 1]},` | same as above |
| `{'input_shape': {'self': [], 'other': [8732, 1]},` | same as above |

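
For reference, the output shapes a PyTorch-style broadcast would be expected to produce for the shape pairs reported above can be worked out with a few lines of plain Python. This is only a sketch of the standard broadcasting rule, not code from the sweep framework: `broadcast_shape` is a hypothetical helper, and treating the scalar `0.5` and the `[]` shape as rank-0 is an assumption.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the NumPy/PyTorch-style broadcast of two shapes (lists of ints)."""
    out = []
    # Align from the trailing dimension; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return out[::-1]


# A few of the failing pairs from the report (scalar `other` modelled as []).
pairs = [
    ([4, 12, 64, 64], [12, 1, 1]),
    ([16, 1], [1, 1, 32]),
    ([1], [1]),
    ([], [1, 1, 768]),
    ([0], []),
    ([0, 1], [0, 1]),
]
for self_shape, other_shape in pairs:
    print(self_shape, other_shape, "->", broadcast_shape(self_shape, other_shape))
```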
KalaivaniMCW commented 1 day ago

Of the 7 shapes that fail with low PCC:

        [[4, 12, 64, 64], [12, 1, 1]], 
        [[4, 16, 64, 64], [16, 1, 1]], 
        [[64, 3, 64, 64], [3, 1, 1]], 
        [[64, 4, 64, 64], [4, 1, 1]], 
        [[16, 1], [1, 1, 32]],
        [[16, 6, 64, 64], [6, 1, 1]],
        [[16, 8, 64, 64], [8, 1, 1]],

Six of them pass when the lower-rank tensor is unsqueezed to match the rank of the other, i.e. with the following shapes:

    [[4, 12, 64, 64], [1, 12, 1, 1]],
    [[4, 16, 64, 64], [1, 16, 1, 1]],
    [[64, 3, 64, 64], [1, 3, 1, 1]],
    [[64, 4, 64, 64], [1, 4, 1, 1]],
    [[16, 6, 64, 64], [1, 6, 1, 1]],
    [[16, 8, 64, 64], [1, 8, 1, 1]],

One shape, [[16, 1], [1, 1, 32]], gives a different error in the unit test, and it fails even when unsqueezed, i.e. from [[16, 1], [1, 1, 32]] to [[1, 16, 1], [1, 1, 32]].
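
The unsqueeze step described above can be sketched at the shape level in plain Python. The helper names `unsqueeze_to_rank` and `match_ranks` are hypothetical and exist only for illustration; with real tensors the equivalent would presumably be a reshape of the lower-rank operand to the padded shape before the multiply.

```python
def unsqueeze_to_rank(shape, rank):
    """Prepend size-1 dims until `shape` has `rank` dims."""
    if len(shape) > rank:
        raise ValueError(f"shape {shape} already has more than {rank} dims")
    return [1] * (rank - len(shape)) + list(shape)


def match_ranks(a, b):
    """Pad the lower-rank shape with leading 1s so both shapes have equal rank."""
    rank = max(len(a), len(b))
    return unsqueeze_to_rank(a, rank), unsqueeze_to_rank(b, rank)


# The seven low-PCC shape pairs from this comment.
pairs = [
    ([4, 12, 64, 64], [12, 1, 1]),
    ([4, 16, 64, 64], [16, 1, 1]),
    ([64, 3, 64, 64], [3, 1, 1]),
    ([64, 4, 64, 64], [4, 1, 1]),
    ([16, 1], [1, 1, 32]),
    ([16, 6, 64, 64], [6, 1, 1]),
    ([16, 8, 64, 64], [8, 1, 1]),
]
for a, b in pairs:
    print(match_ranks(a, b))
```

Padding with leading 1s does not change the broadcast result, so it only alters how the operands are presented to the op; the `[[16, 1], [1, 1, 32]]` case differs from the other six in that both operands need broadcasting, not just the second.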

On sweeps
![Image](https://github.com/user-attachments/assets/b85fe6bd-255f-4e5d-8019-a397a3887db7)

On unit tests

    E RuntimeError: TT_ASSERT @ ../ttnn/cpp/ttnn/operations/eltwise/binary/device/binary_device_operation.cpp:150: height_a > height_b and height_b == 1
    E info:
    E ttnn::operations::binary::BinaryDeviceOperation: height mismatch


![Image](https://github.com/user-attachments/assets/ea73407f-dd51-4e62-a265-229574d2b55e)
KalaivaniMCW commented 2 hours ago

PR #14803 fixes the low PCC for the 6 shapes mentioned above.
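
For context on the "low PCC" failures in this thread: PCC is taken here to mean the Pearson correlation coefficient between the flattened PyTorch reference output and the device output, which is an assumption about the check and not something stated in the issue. A minimal pure-Python sketch of that metric, with a hypothetical `pcc` helper, could look like this:

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length flat sequences."""
    if len(expected) != len(actual) or len(expected) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_e * var_a)


# A result that is a scaled copy of the reference correlates perfectly,
# while swapping two elements lowers the score.
print(pcc([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(pcc([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 4.0, 3.0]))
```

A test would then pass only if the score is above some threshold close to 1; the exact threshold used by the sweeps is not given in this thread.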