tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Reduce BFP8 on BH not working #14804

Open ntarafdar opened 1 week ago

ntarafdar commented 1 week ago

tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py

To reproduce, remove the skip on the test test_sharded_reduce_h.

BFP16 works, but BFP8 does not.

bbradelTT commented 1 week ago

pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h

=============================================================== short test summary info ===============================================================
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype0-out_sharded-in0_sharded-8]
PASSED tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype0-out_sharded-in0_sharded-16]
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype1-out_sharded-in0_sharded-8] - assert False
FAILED tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype1-out_sharded-in0_sharded-16] - assert False
====================================================== 2 failed, 2 passed, 4 warnings in 58.23s =======================================================

bbradelTT commented 1 week ago

Looks like the output values are 0. I verified that the input is non-zero. Will need to investigate further.

tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype1-out_sharded-in0_sharded-8]
                  Metal | INFO     | Initializing device 0. Program cache is NOT enabled
                 Device | INFO     | For Blackhole hardcode AICLK to 800 MHz due to lack of ARC message support
                  Metal | INFO     | AI CLK for device 0 is:   800 MHz
2024-11-06 17:45:39.174 | WARNING  | tests.tt_eager.python_api_testing.sweep_tests.comparison_funcs:get_pcc:37 - One tensor is all zero
2024-11-06 17:45:39.174 | INFO     | tests.tt_eager.python_api_testing.unit_testing.misc.test_sharded:test_sharded_reduce_h:1901 - Max ATOL Delta: 4.625, Max RTOL Delta: inf, PCC: 0.0, PCC check failed
FAILED
                  Metal | INFO     | Closing device 0
                  Metal | INFO     | Disabling and clearing program cache on device 0

tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_sharded_reduce_h[dtype1-out_sharded-in0_sharded-16]
                  Metal | INFO     | Initializing device 0. Program cache is NOT enabled
                 Device | INFO     | For Blackhole hardcode AICLK to 800 MHz due to lack of ARC message support
                  Metal | INFO     | AI CLK for device 0 is:   800 MHz
2024-11-06 17:45:47.162 | WARNING  | tests.tt_eager.python_api_testing.sweep_tests.comparison_funcs:get_pcc:37 - One tensor is all zero
2024-11-06 17:45:47.163 | INFO     | tests.tt_eager.python_api_testing.unit_testing.misc.test_sharded:test_sharded_reduce_h:1901 - Max ATOL Delta: 4.71875, Max RTOL Delta: inf, PCC: 0.0, PCC check failed
FAILED
                  Metal | INFO     | Closing device 0
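
The numbers in the log above are what one would expect when the device output is entirely zero while the reference is not. The following is a minimal pure-Python sketch of that arithmetic, not the repository's `get_pcc` implementation: the helper names (`pcc`, `max_atol_delta`, `max_rtol_delta`), the choice to divide the relative error by the device output, and the sample values are all assumptions made for illustration.

```python
import math


def pcc(expected, actual):
    """Pearson correlation of two equal-length sequences.

    Assumption: return 0.0 (instead of NaN) when either input is constant,
    which mirrors the "PCC: 0.0" reported for an all-zero tensor.
    """
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0.0 or var_a == 0.0:
        return 0.0
    return cov / math.sqrt(var_e * var_a)


def max_atol_delta(expected, actual):
    """Largest absolute element-wise difference."""
    return max(abs(e - a) for e, a in zip(expected, actual))


def max_rtol_delta(expected, actual):
    """Largest relative difference, measured against the actual output.

    Assumption: a non-zero difference against a zero output counts as inf.
    """
    worst = 0.0
    for e, a in zip(expected, actual):
        diff = abs(e - a)
        if diff == 0.0:
            continue
        worst = max(worst, diff / abs(a) if a != 0.0 else math.inf)
    return worst


# Hypothetical reference values; the device output is modelled as all zeros.
expected = [4.625, 2.0, -1.5, 3.25]
actual = [0.0] * len(expected)

print("Max ATOL Delta:", max_atol_delta(expected, actual))
print("Max RTOL Delta:", max_rtol_delta(expected, actual))
print("PCC:", pcc(expected, actual))
```

Under these assumptions, an all-zero output makes the maximum absolute delta equal to the largest magnitude in the reference, the relative delta infinite, and the PCC 0.0. That would make the reported ATOL values (4.625 and 4.71875) a property of the reference data rather than a measure of how far off the reduction is.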