tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

misc/test_softmax.py::test_softmax PCC failure with float32 #14348

Open bbradelTT opened 3 weeks ago

bbradelTT commented 3 weeks ago

Command to run: pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py (this also runs the other tests in the file; full output shown below)

or pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax to run just this test.

PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax[True-bfloat16]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax[False-bfloat16]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_with_program_cache[True]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_with_program_cache[False]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_mix_precision[True-bfloat16]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_mix_precision[True-bfloat8_b]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_mix_precision[False-bfloat16]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax_mix_precision[False-bfloat8_b]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat16-in0_DRAM-causal-64]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat16-in0_DRAM-causal-384]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat16-in0_DRAM-no-causal-64]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat16-in0_DRAM-no-causal-384]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat8_b-in0_DRAM-causal-64]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat8_b-in0_DRAM-causal-384]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat8_b-in0_DRAM-no-causal-64]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[bfloat8_b-in0_DRAM-no-causal-384]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax[bfloat16-in0_DRAM]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax[bfloat8_b-in0_DRAM]
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax[True-float] - AssertionError: FAILED: 0.7438545260666998
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_softmax[False-float] - AssertionError: FAILED: 0.7438815564487562
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[float32-in0_DRAM-causal-64] - AssertionError: FAILED: 0.4560745921843578
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[float32-in0_DRAM-causal-384] - AssertionError: FAILED: 0.29388434435907
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[float32-in0_DRAM-no-causal-64] - AssertionError: FAILED: 0.4226715140500547
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_softmax.py::test_scale_mask_softmax_inplace[float32-in0_DRAM-no-causal-384] - AssertionError: FAILED: 0.2886772491603555
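
For readers unfamiliar with the metric: going by the issue title, the number after "FAILED:" is presumably the Pearson correlation coefficient (PCC) between the device output and a reference output. A value near 1.0 means the two agree, so values like 0.29 to 0.74 indicate a large mismatch. The sketch below shows how such a check can be computed. It is a minimal illustration in plain Python, not the helper the test actually uses; the names `softmax`, `pcc`, and `check_pcc` and the 0.99 threshold are assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats (reference computation)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def pcc(expected, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(expected)
    if n == 0 or n != len(actual):
        raise ValueError("inputs must be non-empty and have the same length")
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / math.sqrt(var_e * var_a)


def check_pcc(expected, actual, threshold=0.99):
    """Return (passed, value). The 0.99 threshold is a hypothetical choice."""
    value = pcc(expected, actual)
    return value >= threshold, value


# A reference result compared with itself passes; a scrambled result does not.
reference = softmax([0.5, -1.0, 2.0, 0.0])
passed_same, value_same = check_pcc(reference, reference)
passed_bad, value_bad = check_pcc(reference, list(reversed(reference)))
```

A test built on this idea would assert on the first element of the returned pair and put the second element in the assertion message, which would produce failure text of the shape shown in the output above.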

bbradelTT commented 1 week ago

Disabling fp32_dest_acc_en allows the tests to pass. This seems to be the same issue as https://github.com/tenstorrent/tt-metal/issues/14352

cc @abhullar-tt @ncvetkovicTT