tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Blackhole bringup] Hang in matmul tests #9145

Closed abhullar-tt closed 4 months ago

abhullar-tt commented 5 months ago

Packer is hanging on llk_push_tiles.

rtawfik01 commented 4 months ago

This hanging matmul test:

TT_METAL_SLOW_DISPATCH_MODE=1 ./build/test/tt_metal/unit_tests --gtest_filter=*MatmulLargeBlock*

runs 4 variations of the test:

  1. Tilize input, tilize output -> passes
  2. Row major input, tilize output -> hangs
  3. Tilize input, row major output -> mismatches
  4. Row major input, row major output -> hangs

The row major input tests try to run tilize on device. The fix for that is in https://github.com/tenstorrent/tt-metal/pull/9700 and will be pushed soon.

The row major output tests try to run untilize on device, which requires some more fixes on Blackhole. Those fixes will come soon.
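
Since two of the variations hang rather than fail, it can help to wrap the test binary in a time limit so a hang shows up as a distinct exit status instead of blocking the terminal. The following is a minimal sketch, not part of the tt-metal tooling: `run_with_timeout` is a hypothetical helper name, the 600-second limit in the usage comment is an arbitrary assumption, and it relies on GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tt-metal): run a command under a time
# limit and label the outcome as PASS, FAIL, or HANG.
# Assumes GNU coreutils `timeout`, which exits with 124 on timeout.
run_with_timeout() {
    local limit="$1"   # time limit in seconds
    shift              # the remaining arguments are the command to run
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG (no exit within ${limit}s): $*"
    elif [ "$status" -eq 0 ]; then
        echo "PASS: $*"
    else
        echo "FAIL (exit $status): $*"
    fi
    return "$status"
}

# Example usage (the 600 s limit is an assumed value, adjust as needed):
#   run_with_timeout 600 env TT_METAL_SLOW_DISPATCH_MODE=1 \
#       ./build/test/tt_metal/unit_tests --gtest_filter='*MatmulLargeBlock*'
```

The environment variable is passed through `env` so that it applies only to the test process, and the gtest filter is quoted so the shell does not try to expand the `*` characters as a glob.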

rtawfik01 commented 4 months ago

The following matmul tests all pass with PR: https://github.com/tenstorrent/tt-metal/pull/9967

TT_METAL_SLOW_DISPATCH_MODE=1 ./build/test/tt_metal/unit_tests --gtest_filter=*MatmulLargeBlock*
TT_METAL_WATCHER=5 TT_METAL_SLOW_DISPATCH_MODE=1 ./build/test/tt_metal/test_bmm

rtawfik01 commented 4 months ago

Merged here: https://github.com/tenstorrent/tt-metal/pull/9967