tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] Incorrect Output When Using Mixed uint8 and bfloat16 Formats in Compute Kernel #12963

Open ilkoo-lee opened 2 weeks ago

ilkoo-lee commented 2 weeks ago

Describe the bug

While investigating issue #11962, we discovered a new bug: the output is incorrect when a compute kernel mixes bfloat16 and uint8 data in the CBs and DST registers.

To Reproduce

Branch: ilkoo/uint8_dst_reg

Compute kernel: https://github.com/tenstorrent/tt-metal/blob/ilkoo/uint8_dst_reg/ttnn/cpp/ttnn/deprecated/tt_dnn/op_library/moreh_test2/kernels/moreh_test2.cpp

To reproduce the error, run the following command from the ilkoo/uint8_dst_reg branch:

pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_moreh_test2.py

This branch includes the changes from rd/stall_unpack_reconfig.

The bug does not occur when mixing float32 and uint8.

Expected behavior

The test should produce the expected output and report PASSED.

rdjogoTT commented 2 weeks ago

@ilkoo-lee @razorback3 After investigation and discussion with @ttmtrajkovic, this problem stems from the way the HW handles uint8/int8 values and places them into Dest. I've tested a workaround on my branch rd/uint16_mask that uses uint16 data instead of uint8. This workaround is suboptimal in terms of the number of bits transferred, but uint8 will require more work before it is suitable for this use case. Also, the SFPU kernel had to be rewritten without SFPI due to a problem with the conditional code, which I'll create a separate issue for.
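For readers unfamiliar with the trade-off, here is a minimal host-side sketch of what "uint16 data instead of uint8" means for the payload. It is not the code on rd/uint16_mask: the helper name `widen_u8_to_u16` is hypothetical, it uses only the Python standard library, and it assumes the values are simply repacked unchanged into 16-bit elements.

```python
from array import array


def widen_u8_to_u16(values):
    """Hypothetical helper: repack uint8 values as uint16, values unchanged."""
    # 'B' is an unsigned 8-bit element; values outside 0..255 raise OverflowError.
    src = array("B", values)
    # 'H' is an unsigned 16-bit element; go through a list so each value is re-encoded.
    return array("H", src.tolist())


mask_u8 = array("B", [0, 1, 255])
mask_u16 = widen_u8_to_u16(mask_u8)

# Same values, but each element now takes twice the storage.
print(mask_u16.tolist(), len(mask_u8.tobytes()), len(mask_u16.tobytes()))
```

Every element doubles in size while carrying the same information, which is the extra transfer cost the comment describes as suboptimal.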