tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html
Apache License 2.0

Reduce scatter produces incorrect output in some cases #14479

Open SeanNijjar opened 1 month ago

SeanNijjar commented 1 month ago

This configuration is used in TG Llama and results in a PCC issue when running single-link, or a hang when running multi-chip:

    (1, 1, 32, 320),  # output tensor shape
    (32, 32),         # output shard shape
    ttnn.CoreRangeSet({ttnn.CoreRange(ttnn.CoreCoord(0, 0), ttnn.CoreCoord(4, 1))}),  # shard grid
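For anyone trying to reproduce this, a quick sanity check of the shard arithmetic may help. The sketch below is plain Python (no ttnn import) and only uses the numbers from the configuration above. It assumes the CoreRange corners are inclusive (x, y) coordinates and that the tensor is treated as 2D with the leading dims folded into height. The helper names are hypothetical, and the issue does not state which sharding scheme is used.

```python
from math import prod

# Values copied from the failing configuration.
output_shape = (1, 1, 32, 320)
shard_shape = (32, 32)                  # (height, width) of one shard
grid_start, grid_end = (0, 0), (4, 1)   # assumed inclusive (x, y) corners


def cores_in_range(start, end):
    """Count cores in an inclusive rectangular core range (hypothetical helper)."""
    return (end[0] - start[0] + 1) * (end[1] - start[1] + 1)


def shards_needed(shape, shard):
    """Shards needed to tile the tensor viewed as 2D (leading dims folded into height)."""
    height = prod(shape[:-1])
    width = shape[-1]
    rows = -(-height // shard[0])  # ceiling division
    cols = -(-width // shard[1])
    return rows, cols


num_cores = cores_in_range(grid_start, grid_end)
rows, cols = shards_needed(output_shape, shard_shape)
print(num_cores, rows, cols)
```

Under those assumptions the grid has 10 cores and the tensor splits into 1 x 10 shards, so the config is at least consistent with one 32x32 shard per core along the width.
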
SeanNijjar commented 1 month ago

Workaround available. Hit hang regressions. Fixed those, then hit new PCC issues that still need to be investigated, but my priorities are currently elsewhere: bringing up the EDM fabric for TG Llama.
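For context on what a "PCC issue" means when investigating this, here is a minimal sketch. It assumes PCC refers to the Pearson correlation coefficient between the device output and a golden reference, flattened to 1D. The function name, the sample data, and the 0.99 pass threshold are all illustrative, not taken from the actual test.

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(expected)
    if n == 0 or n != len(actual):
        raise ValueError("inputs must be non-empty and the same length")
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        # Correlation is undefined for constant inputs; fall back to exact equality.
        return 1.0 if list(expected) == list(actual) else 0.0
    return cov / math.sqrt(var_e * var_a)


golden = [1.0, 2.0, 3.0, 4.0]       # made-up reference values
matching = [1.0, 2.0, 3.0, 4.0]     # output that agrees with the reference
scrambled = [4.0, 1.0, 3.0, 2.0]    # output with values in the wrong places

THRESHOLD = 0.99  # hypothetical pass threshold
print(pcc(golden, matching) >= THRESHOLD)
print(pcc(golden, scrambled) >= THRESHOLD)
```

A reduce scatter that writes correct values to the wrong shard would show up like the scrambled case: the values look plausible, but the correlation with the reference drops well below the threshold.
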