tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

[Blackhole bringup] SFPU gtests mismatching #9142

Closed abhullar-tt closed 4 months ago

abhullar-tt commented 5 months ago

Repro steps:

  1. Checkout abhullar/bh-bringup
  2. Run ./build_metal.sh
  3. Run TT_METAL_SLOW_DISPATCH_MODE=1 ./build/test/tt_metal/unit_tests --gtest_filter=SingleCoreSfpuCompute/SingleCoreSingleDeviceSfpuParameterizedFixture.*
  4. Run TT_METAL_SLOW_DISPATCH_MODE=1 ./build/test/tt_metal/unit_tests --gtest_filter=SingleCoreSfpuCompute/SingleCoreSingleDeviceSfpuParameterizedApproxFixture.*
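
Steps 3 and 4 above can also be scripted. The following is a minimal Python sketch, assuming the tree was already built with ./build_metal.sh and that the script is launched from the repository root. The binary path, the environment variable, and the two gtest filters are taken from the steps above; the helper names (repro_commands, repro_env, run_repro) are hypothetical and not part of tt-metal.

```python
import os
import subprocess
from typing import Dict, List

# Binary path and filters copied from repro steps 3 and 4.
UNIT_TESTS = "./build/test/tt_metal/unit_tests"
FILTERS = [
    "SingleCoreSfpuCompute/SingleCoreSingleDeviceSfpuParameterizedFixture.*",
    "SingleCoreSfpuCompute/SingleCoreSingleDeviceSfpuParameterizedApproxFixture.*",
]


def repro_commands() -> List[List[str]]:
    """Build one argv list per gtest filter (hypothetical helper)."""
    return [[UNIT_TESTS, f"--gtest_filter={flt}"] for flt in FILTERS]


def repro_env() -> Dict[str, str]:
    """Copy the current environment and enable slow dispatch mode."""
    env = dict(os.environ)
    env["TT_METAL_SLOW_DISPATCH_MODE"] = "1"
    return env


def run_repro() -> List[int]:
    """Run both test invocations in order and return their exit codes.

    Assumes the unit_tests binary exists; raises FileNotFoundError otherwise.
    """
    env = repro_env()
    return [
        subprocess.run(cmd, env=env, check=False).returncode
        for cmd in repro_commands()
    ]
```

Calling run_repro() from the repository root runs both fixtures and returns their exit codes, so a non-zero entry indicates which of the two filters failed.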

FYI @rtawfik01

rtawfik01 commented 4 months ago

Hi, an update on this issue: with the SFPU operation removed and only a datacopy operation in place, the output results are always all 0s. So far in debug I see that the input buffers contain the correct data, and that math also produces the correct data in the destination buffer after a copy, so the issue seems to be either the packer or the readback from the output. I'll keep debugging to narrow it down.

rtawfik01 commented 4 months ago

OK, the first part of the issue was the packer writing all 0s. The first fix has been merged here: https://github.com/tenstorrent/tt-metal/pull/9181. The second issue is that the 24 datums of the tiles are still 0, but the rest of the tiles have correct output. I'll investigate this further; it still seems to be a packer issue.
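
When narrowing down a partial mismatch like the one described above, it can help to list exactly which datums in each tile came back as 0 where the golden data is non-zero. The following is a minimal Python sketch of such a check, not the method used in this investigation. It assumes the golden and output buffers are available as flat sequences of numbers in tile order and that a tile holds 32x32 datums; the function name zeroed_datum_indices and both assumptions are illustrative only.

```python
from typing import List, Sequence

# Assumption: one tile is 32x32 datums, stored contiguously.
TILE_DATUMS = 32 * 32


def zeroed_datum_indices(
    golden: Sequence[float],
    output: Sequence[float],
    tile_datums: int = TILE_DATUMS,
) -> List[List[int]]:
    """For each tile, list the within-tile indices where the output is 0
    but the golden value is not (hypothetical debugging helper)."""
    if len(golden) != len(output):
        raise ValueError("golden and output must have the same length")
    if len(golden) % tile_datums != 0:
        raise ValueError("buffer length must be a multiple of tile_datums")

    per_tile: List[List[int]] = []
    for start in range(0, len(golden), tile_datums):
        # Collect offsets inside this tile that were unexpectedly zeroed.
        bad = [
            i
            for i in range(tile_datums)
            if output[start + i] == 0 and golden[start + i] != 0
        ]
        per_tile.append(bad)
    return per_tile
```

If the same indices show up in every tile, that points at a systematic problem (for example in packing or in the test setup) rather than at data-dependent corruption.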

rtawfik01 commented 4 months ago

It was a test issue: some operations, such as log, were skipped for Wormhole, and the test parameters were not fixed for Wormhole/Blackhole devices. The fix is on the abhullar/bh-bringup branch.

rtawfik01 commented 4 months ago

Merged here: https://github.com/tenstorrent/tt-metal/pull/9674