tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

Sub-device mesh support #14849

Status: Open. mo-tenstorrent opened this issue 2 weeks ago.

mo-tenstorrent commented 2 weeks ago

With sub-device mesh, the rule that all worker cores of a device belong to a single op no longer applies.

This breaks post-processing of runs that use a sub-device mesh, because post-processing asserts whenever it sees worker cores of the same device belonging to two different ops within an op interval.
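The failing rule can be pictured as a minimal Python sketch of a strict per-device check. The `CoreMarker` record and the `check_single_op_per_device` function are hypothetical illustrations, not the tt-metal profiler's actual post-processing code. The sketch assumes each profiler marker carries a device ID, a worker-core coordinate, and an op ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreMarker:
    # Hypothetical record: one profiler marker reported by one worker core.
    device_id: int
    core: tuple  # (x, y) worker-core coordinate
    op_id: int


def check_single_op_per_device(markers):
    """Strict rule: within one op interval, every worker core of a device
    must report the same op ID. Raises AssertionError otherwise."""
    ops_per_device = defaultdict(set)
    for m in markers:
        ops_per_device[m.device_id].add(m.op_id)
    for device_id, ops in ops_per_device.items():
        assert len(ops) == 1, (
            f"device {device_id}: worker cores belong to ops {sorted(ops)}"
        )
```

With a sub-device mesh, two sub-devices on device 0 can run, say, op 7 and op 8 at the same time. Their markers then land in the same interval, and this check fails even though the run is valid.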

Now that the op ID is shared with the device, we can loosen that restriction, at least for sub-device mesh runs.
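One possible shape for the loosened rule is a minimal Python sketch that keys cores by `(device_id, op_id)` instead of by device alone. It relies on the op ID reported by the device. `CoreMarker` and `group_cores_by_op` are hypothetical names, not existing tt-metal code. The sketch also assumes that the markers passed in cover a single interval, and that a single core should still never report two different op IDs within that interval.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreMarker:
    # Hypothetical record: one profiler marker reported by one worker core.
    device_id: int
    core: tuple  # (x, y) worker-core coordinate
    op_id: int


def group_cores_by_op(markers):
    """Loosened rule: a device may run several ops at once (for example one
    per sub-device), so cores are grouped by (device_id, op_id). The only
    remaining check is that one core does not report two op IDs."""
    op_of_core = {}
    cores_per_op = defaultdict(set)
    for m in markers:
        # First op ID seen for this core; later markers must agree with it.
        prev = op_of_core.setdefault((m.device_id, m.core), m.op_id)
        assert prev == m.op_id, (
            f"core {m.core} on device {m.device_id} reports ops {prev} and {m.op_id}"
        )
        cores_per_op[(m.device_id, m.op_id)].add(m.core)
    return dict(cores_per_op)
```

For two ops running concurrently on device 0, this returns one core set per op instead of asserting. The per-core assertion keeps some of the original safety net.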

mo-tenstorrent commented 2 weeks ago

This can also affect device data aggregation: aggregating only per device might not be enough (see #8900).
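The following is one illustration of how per-device aggregation can fall short. It is not a description of #8900, whose details are not given here. The minimal Python sketch computes op durations keyed by `(device_id, op_id)`. The tuple layout `(device_id, op_id, start_cycle, end_cycle)` and the `op_durations` function are hypothetical assumptions made for illustration.

```python
def op_durations(markers):
    """markers: iterable of hypothetical (device_id, op_id, start_cycle,
    end_cycle) tuples, one per worker core. Returns
    {(device_id, op_id): cycles}, measured from the earliest start to the
    latest end over that op's cores."""
    first = {}
    last = {}
    for device_id, op_id, start, end in markers:
        key = (device_id, op_id)
        first[key] = min(start, first.get(key, start))
        last[key] = max(end, last.get(key, end))
    return {key: last[key] - first[key] for key in first}
```

Suppose op 7 runs from cycle 100 to 300 on device 0 while op 8 runs from 150 to 200 on another sub-device of the same device. A single per-device figure would report 200 cycles and hide op 8 entirely. Keying by op gives 200 and 50 cycles respectively.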

mo-tenstorrent commented 4 days ago

Bumped down to P2 because, with #14961 fixed, the breaking issue will be gone. We still need further testing to make sure there are no other bugs.